This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

A Machine-readable Server Identity and Purpose Descriptor, and mechanisms for delivering a low-entropy signal indicating user consent.

Mike O'Neill, February 2019

Contributors:

Web pages often contain many, sometimes hundreds, of "third-party" components that initiate transactions with servers other than those managed by the top-level website. These "third-party" servers can access storage in the user's device or browser, collect personal data, and link it to data from other sources. The user is usually completely unaware of this.

Unfortunately, there is no recognised standard way for web servers to declare this information, i.e. to deliver information that allows users to identify the entities, their purposes for data collection (if any), who they share data with, how long they keep it, and so on.

There is increasing legal pressure around the world for websites to at least declare their use of data collection procedures, explain how they intend to use the data, or what their legal basis is. In many jurisdictions users must be given the opportunity to give or withdraw their agreement to storage access or personal data collection, and offered the right to have any previously collected data deleted.

In addition, user agents have implemented procedures that in some circumstances block particular third-party elements or restrict their ability to access cookies. Some of these sub-resources may be managed by the same entity managing the top-level site, or have previously been given explicit consent by the user. A machine-readable mechanism to record and communicate this could be useful.

The following is a possible JSON encoding that can deliver the required machine-readable information, so that a user agent can make it accessible to the user in a standardised and easily digestible way and act on user-specified preferences.

The information would be obtained by sending a secure HTTP GET to the resource /.well-known/privacy-declaration relative to any origin. For example, the data declaration for the domain www.bigco.com would be at https://www.bigco.com/.well-known/privacy-declaration and would return a JSON document with the Content-Type "application/privacy-declaration+json". Alternatively, the objects defined here could be incorporated in an Origin Policy manifest to minimise the number of round-trips required when accessing a resource.
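A minimal sketch of how a client might locate and sanity-check a declaration, assuming the well-known path and media type given above. The helper names `declaration_url` and `parse_declaration` are hypothetical, and the network fetch itself is left out so the example stays self-contained.

```python
import json
from urllib.parse import urlsplit

WELL_KNOWN_PATH = "/.well-known/privacy-declaration"
MEDIA_TYPE = "application/privacy-declaration+json"


def declaration_url(origin: str) -> str:
    """Return the declaration URL for an origin such as 'https://www.bigco.com'."""
    parts = urlsplit(origin)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("declarations are only fetched from secure (https) origins")
    return f"https://{parts.netloc}{WELL_KNOWN_PATH}"


def parse_declaration(content_type: str, body: str) -> dict:
    """Parse a response body after checking its Content-Type header."""
    # Ignore parameters such as '; charset=utf-8' when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != MEDIA_TYPE:
        raise ValueError(f"unexpected Content-Type: {content_type!r}")
    declaration = json.loads(body)
    if not isinstance(declaration, dict):
        raise ValueError("a privacy declaration must be a JSON object")
    return declaration
```

A caller would pass the response's Content-Type header and body text from whatever HTTP client it uses.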

User agents or scripts could automatically parse the information and expose it as a JavaScript object at a standard location, e.g. navigator.privacyDeclaration, which could then be used to display human-readable information to users. First-party sites could ensure this was always available by using an open source JavaScript library. To support this, the privacy-declaration resource should support CORS (Cross-origin resource sharing), so it can be accessed via the appropriate cross-origin fetch or XHR.

JavaScript can examine the JSON for the first party, then use the otherParties and sameParties arrays to fetch the corresponding privacy-declaration JSON resources from those domains (made possible because the third-party resources are CORS enabled). The sameParties set of domains could identify sub-resources which can be trusted as "first-party" because they are managed by the entity that manages the top-level site. User agents can check that each origin in a set is referenced by the other origins in their own privacy-declaration resources, i.e. that they all contain exactly the same "sameParties" set. It may be possible for top-level or parent documents to host external privacy-declarations as bundles of "Signed HTTP Exchanges", which would avoid user agents having to make extra round-trips to get them. See @mikewest's proposal for this in "First-Party Sets".
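The mutual-reference check could look roughly like the sketch below. It assumes the declarations have already been fetched and are keyed by domain, and it makes one interpretive assumption that the text does not settle: a set's membership is taken to be the declaring domain plus its "sameParties" entries, so the check passes whether or not a member lists itself. The function names are hypothetical.

```python
def party_set(domain: str, declaration: dict) -> frozenset:
    """Membership claimed by one declaration: the domain plus its sameParties."""
    return frozenset(declaration.get("sameParties", [])) | {domain}


def same_parties_consistent(declarations: dict) -> bool:
    """Return True if every member of the set claims exactly the same membership.

    `declarations` maps each domain to its already-fetched declaration.
    """
    if not declarations:
        return False
    claimed = {party_set(domain, decl) for domain, decl in declarations.items()}
    # All members must agree on one set, and a declaration must have been
    # fetched for every domain named in that set.
    return len(claimed) == 1 and next(iter(claimed)) == frozenset(declarations)
```

A user agent that finds an inconsistent set could simply fall back to treating the unverified domains as third parties.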

Other methods of ensuring that domains are related are possible. For example, there could be a link to information in a TLS certificate or in a domain name registrar's whois entry. There is also ongoing discussion about using DNS records to express relatedness between domain names.

The privacy-declaration resource could be dynamically generated so that some properties could reflect different user agent states derived from the incoming HTTP Request. For example, the server would examine incoming cookies or other headers in order to calculate the correct value of the "consented" property, or the length of time before consent expires.

Conveying User Agent Registered User Consent.

There should be some standardisation of a low-entropy client-originated signal, which could be an existing request header in widespread use like DNT, a new request header designed to be a better fit with European ePrivacy and data protection law, or a specific cookie name such as the IAB EU's euconsent cookie. Another avenue that could be explored is extending the cookie "prefix" options described in "Cookies: HTTP State Management Mechanism draft-ietf-httpbis-rfc6265bis-02". For example, here is a way to encode a consent indication cookie:

Set-Cookie: __Consent-eu=1,5,6; SameSite=Strict; Expires=Wed, 06 Nov 2019 08:49:37 GMT

The cookie has the SameSite attribute set to Strict so it is restricted to the top-level site, i.e. it can only signal site-specific consent. Using a prefix could allow user agents to recognise low-entropy "consent indication" cookies and give them "special treatment". For example, user agents could restrict the scope of such cookies to the context of a top-level origin, so all or specified embedded origins on a particular site could receive "site-specific" consent indications.
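On the server side, the dynamic "consented" property could be derived from such a cookie along the following lines. This is only a sketch: the document does not define what the values 1,5,6 mean, so the code assumes they are ordinal indexes into the "purposes" array, and the function names are hypothetical.

```python
CONSENT_COOKIE = "__Consent-eu"


def consented_purposes(cookie_header: str) -> set:
    """Extract the purpose numbers from the value of a Cookie request header."""
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name == CONSENT_COOKIE:
            # Assumption: the value is a comma-separated list of purpose indexes.
            return {int(item) for item in value.split(",") if item.strip().isdecimal()}
    return set()


def with_consent(purposes: list, cookie_header: str) -> list:
    """Return copies of the purposes with 'consented' set from the cookie."""
    granted = consented_purposes(cookie_header)
    return [dict(purpose, consented=(index in granted))
            for index, purpose in enumerate(purposes)]
```

Because the result is computed per request, a request without the cookie yields every purpose with "consented" set to false.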

The site-specific delivery of consent cookies is impossible without explicit browser or browser extension support, so another method should be standardised so that servers can deliver the functionality themselves. The IAB EU's TCF proposes a templating system to deliver consent information within the request URL, i.e. as an appended query parameter. This has been forced on them by the increasing restrictions placed by some browsers, e.g. Safari and Firefox, on the use of third-party cookies, but a low-entropy version of this approach would allow for site-specific consent within the web sites that support the functionality.
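As an illustration only, a low-entropy query-parameter signal could be appended to embedded resource URLs as sketched below. The parameter name `consent` and the comma-separated value format are assumptions made for this example; they are not taken from the TCF or from any standard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_consent_param(url: str, purposes: set, param: str = "consent") -> str:
    """Return `url` with a consent parameter listing the agreed purpose numbers."""
    parts = urlsplit(url)
    # Drop any existing consent parameter so the signal is never duplicated.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, ",".join(str(p) for p in sorted(purposes))))
    # Keep commas unescaped so the value stays short and readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))
```

The first-party site would apply this when it writes out the URLs of embedded third-party resources, so the signal only ever reflects consent given on that site.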

Root properties

| Property | Type | Description |
| --- | --- | --- |
| name | String | Recognisable & unique entity name e.g. "Google Inc." |
| policy | String(Uri) | Human readable HTML page explaining the entity’s privacy policy |
| storagePolicy | String(Uri) | Human readable HTML page explaining the terminal storage policy |
| about | String(Uri) | Human readable HTML page describing the entity |
| deleteData | String(Uri) | An HTTP POST will cause all user agent data for this origin to be deleted, e.g. a Clear-Site-Data header could be returned |
| mayCollect | Boolean | "false" declares that no data is collected, "true" if it may be collected |
| mayShare | Boolean | "false" declares no data will be shared with other entities |
| mayCombine | Boolean | "false" declares that data is not combined or linked with data from other sources |
| purposes | Array of PurposeType Objects | Lists all the purposes for which data is collected |
| storage | Array of StorageType Objects | Lists the terminal storage items that may be utilised |
| otherParties | Array of Strings | Lists the third-party domains of embedded resources that may appear on this page |
| sameParties | Array of Strings | Lists the first-party domains of embedded resources, i.e. those managed by the same entity, that may appear on this page |

The user can give their agreement for zero or more purposes. The PurposeType object for a particular purpose includes a Boolean "consented" property, which can be dynamically derived from the incoming HTTP request headers (e.g. cookies).

The storage objects are linked to the specific purposes which they are designed to implement. This gives user agents fine grained ability to restrict storage use to the purposes a user has agreed to.
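The link between storage and purposes could be enforced roughly as follows. This sketch assumes a declaration that has already been parsed into a Python dict. The text does not say how an item linked to several purposes should be treated, so the `require_all` switch is an assumption: by default an item is permitted if any linked purpose is consented, and a stricter agent can require all of them.

```python
def permitted_storage(declaration: dict, require_all: bool = False) -> list:
    """Return the storage items the user's consent allows."""
    purposes = declaration.get("purposes", [])
    check = all if require_all else any
    allowed = []
    for item in declaration.get("storage", []):
        # int() tolerates indexes encoded as strings such as "0".
        indexes = [int(i) for i in item.get("purposeList", [])]
        # Accept both JSON true and the string "true" for the consented flag.
        flags = [0 <= i < len(purposes)
                 and purposes[i].get("consented") in (True, "true")
                 for i in indexes]
        if flags and check(flags):
            allowed.append(item)
    return allowed
```

An item with an empty or missing purposeList is never permitted, since there is no purpose the user could have agreed to.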

A browser, browser extension or script executing in the top-level browsing context can use the otherParties and sameParties arrays to fetch the Descriptors for those domain origins (by fetching the resource at https://{domain name}/.well-known/privacy-declaration).

StorageType Object properties

| Property | Type | Description |
| --- | --- | --- |
| type | String | Storage Type, one of "cookie", "local" (localStorage), "indexed" (indexedDB), "cache" (ETag) |
| name | String | Cookie name prefix, localStorage item name, or indexedDB table |
| purposeList | Array of Integer | List of ordinal values of entries in the "purposes" array. e.g. [0,1] indicates the first and second purpose type is supported by this Storage Type |

PurposeType Object properties

| Property | Type | Description |
| --- | --- | --- |
| name | String | Short identifying label for this purpose |
| description | String | A human readable text clearly describing this purpose in the appropriate language |
| maxRetainedFor | Integer | Number of seconds data is retained after collection |
| expiresIn | Integer | Number of seconds remaining before collected data is deleted |
| consented | Boolean | Dynamic indication of registered user agreement for this purpose |

An example of the encoding.

{
  "name": "BigCo Inc",
  "policy": "https://www.bigco.com/privacy.html",
  "storagePolicy": "https://www.bigco.com/cookie.html",
  "about": "https://www.bigco.com/about",
  "mayCollect": true,
  "mayShare": true,
  "mayCombine": false,
  "purposes": [
    {
      "name": "behavioural advertising",
      "description": "compiling history of web sites visited",
      "maxRetainedFor": 1000000,
      "expiresIn": 45667,
      "consented": false
    },
    {
      "name": "website analytics",
      "description": "web audience measurement",
      "maxRetainedFor": 10000,
      "expiresIn": 3456,
      "consented": false
    },
    {
      "name": "authentication",
      "description": "logging in",
      "maxRetainedFor": 1000000,
      "expiresIn": 67854,
      "consented": false
    }
  ],
  "storage": [
    {
      "type": "cookie",
      "name": "_ga",
      "purposeList": [0, 1]
    },
    {
      "type": "cookie",
      "name": "user",
      "purposeList": [2]
    },
    {
      "type": "local",
      "name": "dataname",
      "purposeList": [0]
    }
  ],
  "otherParties": [
    "www.google.com",
    "www.google-analytics.com",
    "adnxs.com"
  ],
  "sameParties": [
    "ourcdn.com"
  ]
}
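The property tables can be turned into a simple structural check, sketched below. It assumes every property is optional (the tables do not say which are required) and only verifies the types of the properties that are present, the storage type names, and that each purposeList entry is a valid index into "purposes". The names `ROOT_TYPES` and `validate` are hypothetical.

```python
import json

ROOT_TYPES = {
    "name": str, "policy": str, "storagePolicy": str, "about": str,
    "deleteData": str, "mayCollect": bool, "mayShare": bool, "mayCombine": bool,
    "purposes": list, "storage": list, "otherParties": list, "sameParties": list,
}
STORAGE_KINDS = {"cookie", "local", "indexed", "cache"}


def validate(declaration: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, expected in ROOT_TYPES.items():
        if key in declaration and not isinstance(declaration[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    purposes = declaration.get("purposes")
    purposes = purposes if isinstance(purposes, list) else []
    storage = declaration.get("storage")
    storage = storage if isinstance(storage, list) else []
    for n, item in enumerate(storage):
        if not isinstance(item, dict):
            problems.append(f"storage[{n}]: expected an object")
            continue
        if item.get("type") not in STORAGE_KINDS:
            problems.append(f"storage[{n}].type: unknown storage type")
        for index in item.get("purposeList", []):
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_index = isinstance(index, int) and not isinstance(index, bool)
            if not is_index or not 0 <= index < len(purposes):
                problems.append(f"storage[{n}].purposeList: bad purpose index {index!r}")
    return problems
```

A declaration that encodes Booleans or indexes as strings, for instance "mayCollect": "true", would be reported rather than silently accepted.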

Prior Art