Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
# Server Side Filtering | ||
|
||
* Status: draft | ||
* Deciders: mleplatre, acottner, <> | ||
* Date: Jan 3, 2024 | ||
|
||
## Context and Problem Statement | ||
|
||
In some situations, clients only need a subset of the collection records. | ||
|
||
Because the signature is computed on the server side over the whole dataset, clients must also download the whole dataset in order to verify data integrity. | ||
|
||
In this document, we explore possible solutions to overcome this limitation. | ||
|
||
## Decision Drivers | ||
|
||
In order to choose our solution we considered the following criteria: | ||
|
||
- **Complexity**: Low → High: how complex the solution is | ||
- **User experience**: Bad → Good: how well the solution suits users | ||
- **Cost of implementation**: Low → High: how much effort it represents | ||
- **Cost of operation**: Low → High: how much the solution costs to run | ||
|
||
|
||
## Considered Options | ||
|
||
1. [Option 0 - Do nothing](#option-0---do-nothing) | ||
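1. [Option 1 - Split into several collections](#option-1---split-into-several-collections) | ||
1. [Option 2 - Read-only mirrors](#option-2---read-only-mirrors) | ||
1. [Option 3 - Implement notion of datasets](#option-3---implement-notion-of-datasets) | ||
1. [Option 4 - Implement dynamic signatures](#option-4---implement-dynamic-signatures) | ||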
|
||
## Decision Outcome | ||
|
||
Chosen option: Option 2, because all the pieces are already in place and it is very low tech and low cost, while still providing a good user experience for editors and for the publication of data. | ||
|
||
## Pros and Cons of the Options | ||
|
||
### Option 0 - Do nothing | ||
|
||
No change is made. All clients download everything and filter locally using JEXL targeting. | ||
|
||
- **Complexity**: N/A | ||
- **User experience**: Good, users edit a single collection. | ||
- **Cost of implementation**: N/A | ||
- **Cost of operation**: High. All clients download everything, resulting in higher bandwidth costs. | ||
|
||
|
||
### Option 1 - Split into several collections | ||
|
||
The single collection is split into several, and the client determines which collection it should pull from (e.g. one per region). | ||
|
||
If a record is required in different subsets, it is duplicated in each collection. | ||
|
||
- **Complexity**: N/A | ||
- **User experience**: Bad, even if a script could automate the publication of records into several collections, editors don't have a single source of truth for the collection, leading to duplicated actions and data. | ||
- **Cost of implementation**: Low. Creating collections is cheap. | ||
- **Cost of operation**: Low. Clients download only the subset of data, saving bandwidth. | ||
|
||
|
||
### Option 2 - Read-only mirrors | ||
|
||
With this solution, the main collection is maintained, and several "*side collections*" are created. As with *Option 1*, the clients are able to pick which collection to pull from. | ||
|
||
Unlike *Option 1*, the additional collections are read-only. Editors publish data in the main collection, and a scheduled job will copy the records at regular intervals to the side collections using filters. | ||
|
||
The [`backport_records` cronjob]() copies records from one collection to another and signs the destination collection. It can take querystring filters in order to copy only a subset. | ||
|
||
For example, if records have a `regions` field: | ||
|
||
``` | ||
[ | ||
  { | ||
    "id": "shop-A", | ||
    "regions": ["europe", "asia", "north-america"] | ||
  }, | ||
  { | ||
    "id": "shop-B", | ||
    "regions": ["europe", "africa"] | ||
  } | ||
] | ||
``` | ||
|
||
Then different collections can be populated using this field in [a querystring filter](https://docs.kinto-storage.org/en/latest/api/1.x/filtering.html#comparison): | ||
|
||
* `shops-africa`: `?contains_regions=["africa"]` | ||
* `shops-europe`: `?contains_regions=["europe"]` | ||
* `shops-asia`: `?contains_regions=["asia"]` | ||
|
||
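The effect of these filters on the example records can be sketched in a few lines of Python. This is only an illustration, not the code of the `backport_records` job: the `contains` and `populate_mirrors` helpers and the `MIRRORS` mapping are hypothetical names, and the filter is assumed to keep records whose array field includes every requested value.

```python
# Records of the main collection, taken from the example above.
MAIN_RECORDS = [
    {"id": "shop-A", "regions": ["europe", "asia", "north-america"]},
    {"id": "shop-B", "regions": ["europe", "africa"]},
]

# Hypothetical mapping: side collection name -> values that `regions` must contain.
MIRRORS = {
    "shops-africa": ["africa"],
    "shops-europe": ["europe"],
    "shops-asia": ["asia"],
}


def contains(record, field, values):
    """Mimic a `contains_<field>` filter: the array field must include every value."""
    return all(value in record.get(field, []) for value in values)


def populate_mirrors(records, mirrors):
    """Return, for each side collection, the records it would receive."""
    return {
        name: [r for r in records if contains(r, "regions", values)]
        for name, values in mirrors.items()
    }


side_collections = populate_mirrors(MAIN_RECORDS, MIRRORS)
for name, records in side_collections.items():
    print(name, [r["id"] for r in records])
# shops-africa ['shop-B']
# shops-europe ['shop-A', 'shop-B']
# shops-asia ['shop-A']
```

Note that `shop-A` and `shop-B` both end up in `shops-europe`: records are duplicated across side collections, but editors still only touch the main one.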
- **Complexity**: Low. It is based on existing pieces of the current architecture. | ||
- **User experience**: Good, because editors only manipulate the main collection to assign datasets and to publish data. | ||
- **Cost of implementation**: Low. Creating collections is cheap, and so is onboarding new instances of the `backport_records` job. | ||
- **Cost of operation**: Low. Clients download only the subset of data, saving bandwidth, and running cronjobs is cheap. | ||
|
||
|
||
### Option 3 - Implement notion of datasets | ||
|
||
With this solution, we compute several signatures for a single collection. | ||
|
||
On the server side, we could introduce the notion of *datasets* for a collection. Datasets could be declared in the server configuration: | ||
|
||
``` | ||
# config.ini | ||
kinto.signer.datasets.main.shops.shops-africa = ?contains_regions=["africa"] | ||
kinto.signer.datasets.main.shops.shops-europe = ?contains_regions=["europe"] | ||
kinto.signer.datasets.main.shops.shops-asia = ?contains_regions=["asia"] | ||
``` | ||
|
||
When a change is published in the collection, instead of computing a single signature for the whole collection, we compute a signature for each dataset and put them in the collection metadata: | ||
|
||
``` | ||
{ | ||
  "metadata": { | ||
    "signatures": { | ||
      "shops-africa": { | ||
        "filter": "?contains_regions=[\"africa\"]", | ||
        "signature": "P2OAvlvj1uhIAafIVgtpuAo3lF4pZWERc...C4rC7c9-EC4cl77R35Qo3hRYg2lKU" | ||
      }, | ||
      "shops-europe": { | ||
        "filter": "?contains_regions=[\"europe\"]", | ||
        "signature": "MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEab4HfqYGRLW..." | ||
      } | ||
    } | ||
  } | ||
} | ||
``` | ||
|
||
The clients would pull the subset of data, and use the specific signature to verify the integrity. | ||
|
||
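The publish and verify steps could look roughly like the following sketch. It makes several assumptions that are not part of this proposal: an HMAC with a demo key stands in for the real signing service (a real deployment would use asymmetric signatures so that clients only hold a public key), the serialization is a simple sorted JSON dump, and `select`, `serialize`, `sign`, `publish` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import json

# Assumption: a demo HMAC key stands in for the real signing service.
DEMO_KEY = b"not-a-real-key"

RECORDS = [
    {"id": "shop-A", "regions": ["europe", "asia", "north-america"]},
    {"id": "shop-B", "regions": ["europe", "africa"]},
]

# Hypothetical in-memory equivalent of the dataset declarations in config.ini.
DATASETS = {"shops-africa": ["africa"], "shops-europe": ["europe"]}


def select(records, values):
    """Keep records whose `regions` array includes every requested value."""
    return [r for r in records if all(v in r.get("regions", []) for v in values)]


def serialize(records):
    """Deterministic serialization: records sorted by id, keys sorted, no whitespace."""
    ordered = sorted(records, key=lambda r: r["id"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(records):
    """Stand-in signature over the serialized records."""
    return hmac.new(DEMO_KEY, serialize(records), hashlib.sha256).hexdigest()


def publish(records, datasets):
    """Server side: compute one signature per dataset for the collection metadata."""
    return {
        "signatures": {
            name: {
                "filter": f"?contains_regions={json.dumps(values)}",
                "signature": sign(select(records, values)),
            }
            for name, values in datasets.items()
        }
    }


def verify(subset, metadata, dataset):
    """Client side: check the downloaded subset against its dataset signature."""
    expected = metadata["signatures"][dataset]["signature"]
    return hmac.compare_digest(sign(subset), expected)


metadata = publish(RECORDS, DATASETS)
africa_subset = select(RECORDS, ["africa"])
print(verify(africa_subset, metadata, "shops-africa"))  # True
print(verify(RECORDS, metadata, "shops-africa"))        # False: not the signed subset
```

The sketch also shows where the coupling comes from: the client has to know which dataset name matches the filter it used, otherwise verification fails.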
- **Complexity**: Medium-High. We introduce some coupling between server configuration and client behaviour. | ||
- **User experience**: Good, users edit a single collection. | ||
- **Cost of implementation**: Medium-High. Server side code is relatively straightforward, but the different client implementations have to be modified to support this new type of metadata. | ||
- **Cost of operation**: Low. Clients download only the subset of data, saving bandwidth, and signing datasets on each publication is relatively cheap. | ||
|
||
|
||
### Option 4 - Implement dynamic signatures | ||
|
||
With this solution we sign every server response dynamically. | ||
|
||
- **Complexity**: Medium. In terms of architecture, this would not introduce any real complexity. | ||
- **User experience**: Good, users edit a single collection. | ||
- **Cost of implementation**: Medium-High. This would represent a big change of approach, and thus of code. | ||
- **Cost of operation**: High. Autograph would have to scale to the traffic received on the origin servers of Remote Settings. |