From c7919d5311af54fcafb5a68a9bb1bfab55a8c089 Mon Sep 17 00:00:00 2001
From: Yao Xiao
Date: Wed, 11 Sep 2024 17:37:05 -0400
Subject: [PATCH 1/3] [spec] Make the epoch introduction delay per-site per-epoch

This increases user privacy by making it more challenging to identify a specific user based on the time between two epoch switches. The typical duration between switches will be 7 days for active users. However, this duration can be longer if the browser is not running at the originally scheduled time, becoming effectively a user-specific value.
---
 spec.bs | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/spec.bs b/spec.bs
index babb452..00c4ffb 100644
--- a/spec.bs
+++ b/spec.bs
@@ -238,7 +238,7 @@ spec: html; urlPrefix: https://www.rfc-editor.org/rfc/
 Given a [=list=] of [=topics history entries=] historyEntriesForUserTopics, the browser should provide an algorithm to derive top 5 topics, that are believed to be valuable for the Topics callers. The algorithm should return a [=list=] of 5 [=topic ids=].
- In Chrome versions M122 and later, topics are scored for ranking first by a binary priority level (see topics-utility-buckets-v1.md), and then by the frequency of page loads with that topic.
+ In Chrome versions M122 and later, topics are scored for ranking first by a binary priority level (see topics-utility-buckets-v1.md), and then by the frequency of page loads with that topic.
@@ -359,7 +359,7 @@ spec: html; urlPrefix: https://www.rfc-editor.org/rfc/
 1. If |epochs| is empty, then return an empty [=list=].
 1. Let |numEpochs| be |epochs|'s [=list/size=].
 1. Let |lastEpochTime| be |epochs|[|numEpochs| − 1]'s [=epoch/time=].
- 1. Let |epochSwitchTimeDecisionMessageArray| be the concatenation of "epoch-switch-time-decision|" and |callerContext|'s [=topics caller context/top level context domain=].
+ 1. Let |epochSwitchTimeDecisionMessageArray| be the concatenation of "epoch-switch-time-decision|", |lastEpochTime|, and |callerContext|'s [=topics caller context/top level context domain=].
 1. Let |epochSwitchTimeDecisionHmacOutput| be the output of the [=HMAC algorithm=], given input parameters: whichSha=SHA256, key=user agent's [=user agent/user topics state=]'s [=user topics state/hmac key=], and message_array=|epochSwitchTimeDecisionMessageArray|.
 1. Let |epochSwitchTimeDecisionHash| be 64-bit truncation of |epochSwitchTimeDecisionHmacOutput|.
 1. Let |epochSwitchTimeDelayIntroduction| be a [=duration=] of (|epochSwitchTimeDecisionHash| % 172800) seconds (i.e. 172800 is 2 days in seconds).
@@ -382,7 +382,7 @@ spec: html; urlPrefix: https://www.rfc-editor.org/rfc/
- This roughly returns 3 recently calculated epochs, either counting back from the last epoch, or from the second to the last epoch. The decision depends on whether some fixed duration (between 0 and 2 days, sticky to a user agent & site) has passed since the last epoch was calculated. This essentially adds a per-site fixed delay to the epoch switch time, to make it harder to correlate the same user across sites via the time that topics are changed. The HMAC helps to compute the per-site delay on the fly, without needing to store extra data for each site.
+ This roughly returns 3 recently calculated epochs, either counting back from the last epoch, or from the second to the last epoch. The decision depends on whether some fixed duration (between 0 and 2 days, sticky to a user agent, site, and the last epoch) has passed since the last epoch was calculated. This essentially adds a per-site per-epoch fixed delay to the epoch switch time, to make it harder to correlate the same user across sites via the time that topics are changed, or via the time interval between two changes. The HMAC helps to compute the per-site per-epoch delay on the fly, without needing to store extra data for each site or epoch.

Get the number of distinct versions in epochs
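
Reviewer note on patch 1/3: the changed step can be illustrated with a minimal Python sketch. It is not the spec's or Chrome's implementation. The byte serialization of |lastEpochTime| (8-byte big-endian seconds), the choice of the first 8 digest bytes for the "64-bit truncation", the strict ">" comparison, and the function names are all assumptions made for illustration; the spec text in this patch only fixes the message parts, HMAC-SHA256, and the modulus 172800.

```python
import hashlib
import hmac

TWO_DAYS_SECONDS = 172800  # modulus from the spec step: 2 days in seconds


def epoch_switch_delay_seconds(hmac_key: bytes, last_epoch_time: int, top_level_domain: str) -> int:
    """Derive the per-site, per-epoch delay described in the patched step.

    Hypothetical helper: the serialization below is an assumption, not the spec's.
    """
    # Message = fixed prefix + last epoch time + top-level context domain.
    # Including last_epoch_time is what makes the delay change every epoch.
    message = (
        b"epoch-switch-time-decision|"
        + last_epoch_time.to_bytes(8, "big")  # assumed encoding of |lastEpochTime|
        + top_level_domain.encode("utf-8")
    )
    digest = hmac.new(hmac_key, message, hashlib.sha256).digest()
    # Assumed truncation: interpret the first 8 bytes as an unsigned 64-bit integer.
    decision_hash = int.from_bytes(digest[:8], "big")
    return decision_hash % TWO_DAYS_SECONDS


def last_epoch_is_visible(now: int, last_epoch_time: int, delay_seconds: int) -> bool:
    """Return True once the delay has elapsed since the last epoch was calculated.

    The strict comparison at the boundary is an assumption.
    """
    return now - last_epoch_time > delay_seconds


if __name__ == "__main__":
    key = b"\x01" * 32  # stand-in for the user agent's stored HMAC key
    for epoch_time in (1726000000, 1726604800):  # two consecutive weekly epochs
        delay = epoch_switch_delay_seconds(key, epoch_time, "example.com")
        print(epoch_time, delay)
```

Because the delay is recomputed from the key, the epoch time, and the site on every call, nothing extra needs to be stored per site or per epoch, which matches the rationale in the updated note.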

From 26e23d97ca3c3ddba46a2487aa4ad6e4997fbdf6 Mon Sep 17 00:00:00 2001
From: Yao Xiao
Date: Wed, 11 Sep 2024 17:38:51 -0400
Subject: [PATCH 2/3] fix link

---
 spec.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.bs b/spec.bs
index 00c4ffb..d16f0c9 100644
--- a/spec.bs
+++ b/spec.bs
@@ -238,7 +238,7 @@ spec: html; urlPrefix: https://www.rfc-editor.org/rfc/
 Given a [=list=] of [=topics history entries=] historyEntriesForUserTopics, the browser should provide an algorithm to derive top 5 topics, that are believed to be valuable for the Topics callers. The algorithm should return a [=list=] of 5 [=topic ids=].
- In Chrome versions M122 and later, topics are scored for ranking first by a binary priority level (see topics-utility-buckets-v1.md), and then by the frequency of page loads with that topic.
+ In Chrome versions M122 and later, topics are scored for ranking first by a binary priority level (see topics-utility-buckets-v1.md), and then by the frequency of page loads with that topic.
From ecd6c29351ca581ed2d2ca5499b37bb33e9fd0ef Mon Sep 17 00:00:00 2001
From: Yao Xiao
Date: Wed, 11 Sep 2024 18:03:06 -0400
Subject: [PATCH 3/3] Update explainer

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b823d3c..712da35 100644
--- a/README.md
+++ b/README.md
@@ -88,7 +88,7 @@ The topics will be inferred by the browser. The browser will leverage a classifi
 * The 5% noise is introduced to ensure that each topic has a minimum fraction of members as well as to provide some amount of plausible deniability.
 * The reason that each site gets associated with only one of the user's topics for that epoch is to ensure that callers on different sites for the same user see different topics. This makes it harder to reidentify the user across sites.
 * e.g., site A might see topic ‘cats’ for the user, but site B might see topic ‘automobiles’. It’s difficult for the two to determine that they’re looking at the same user.
- * The beginning of a week is per-user and per-site. That is, for the same user, site A may see the new week's topics introduced at a different time than site B. This is to make it harder to correlate the same user across sites via the time that they change topics.
+ * The beginning of a week is per-user, per-site, and per-epoch. That is, for the same user, site A may see the new week's topics introduced at a different time than site B, and for the same user and site, the duration of a topic may not be exactly one week. This is to make it harder to correlate the same user across sites via the time that they change topics, or via the time interval between two changes.
 * Not every API caller will receive a topic. Only callers that observed the user visit a site about the topic in question within the past three weeks can receive the topic. If the caller (specifically the site of the calling context) did not call the API in the past for that user on a site about that topic, then the topic will not be included in the array returned by the API. The exception to this filtering is the 5% random topic, that topic will not be filtered.
 * Note that observing a topic also includes observing the topic's entire ancestry tree. For instance, observing `/Arts & Entertainment/Humor/Live Comedy` also counts as having observed `/Arts & Entertainment/Humor/` and `/Arts & Entertainment`.
 * This is to prevent the direct dissemination of user information to more parties than the technology that the API is replacing (third-party cookies).