talk about shards and replication
jitsedesmet committed Jun 2, 2024
1 parent aee9202 commit 0094b5a
Showing 4 changed files with 31 additions and 5 deletions.
9 changes: 5 additions & 4 deletions _papers/thesis-report/additional/extended-abstract.typ
@@ -97,10 +97,11 @@ Users are in control of their data store, how they interact with the datastore a
The effectiveness of reading data in a decentralized environment has been increased by abstracting data reads through a query abstraction layer, the query engine, by using query languages like GraphQL~@bib1:graphql and SPARQL~@bib1:sparql.
In this work, we will similarly research how we can abstract data updates by using a query abstraction layer.
The current (draft) Solid specification~@bib1:solid-spec describes each data store, or pod, as a document oriented interface where a user decides for each document who can access that document.
Our goal is thus to create a query engine that effectively decides what document a resource should be stored in, easily eliminating the access-path data dependency.
We hypothesize that such a query engine has a 2x overhead in the number of HTTP requests and a 4x overhead in the execution time compared to a query engine that requires the user to configure the document explicitly. Such an overhead is acceptable since write speeds are, in contrast with read speeds, often not critical.

#IRT[Can you in a few words explain why this overhead is acceptable?]
Our goal is thus to create a query engine that effectively decides what document a resource should be stored in, easily eliminating the access-path data dependency.
We hypothesize that such a query engine has a 2x overhead in the number of HTTP requests and a 4x overhead in the execution time compared to a query engine that requires the user to configure the document explicitly.
This overhead is often acceptable because applications are typically created in such a way that they synchronize local changes in the background, without disturbing the user.
This acceptable delay of updates contrasts with reads because in the case of reading data, the user flow is often interrupted when information is transferred.


= Related Work
1 change: 1 addition & 0 deletions _papers/thesis-report/chapters/preface.typ
@@ -96,6 +96,7 @@ We hypothesize that the overhead such an intelligent client has, in comparison t
Concretely, we expect an execution time overhead of at most four times, and at most double the number of @http requests.
For applications that do not write too often, this is an acceptable overhead for the amount of complexity it takes away from developers.
Even more so, write speeds are, in contrast to read speeds, typically not critical, since users often don't need them for interactivity.
The reason is that applications are typically created in such a way that they synchronize local changes in the background, without disturbing the user.
// If they would, a local first approach would have been chosen.

== Outline
18 changes: 18 additions & 0 deletions _papers/thesis-report/chapters/solid.typ
@@ -12,6 +12,24 @@ However, the existing specifications are not enough as solid faces numerous chal
These challenges span across multiple domains like interface design, query engine design, access control, usage control, etc.
To tackle these challenges, Solid creates some specifications of its own, but tries to keep them generic for different use cases.

== Positioning

Throughout this work, we approach the collection of all Solid data stores (pods) as a permissioned decentralized graph database.
It is important to note that Solid differentiates itself from typical distributed database systems in a variety of ways.
A distributed database will both replicate and shard its data~@bib:distributed-database-fundamentals.
Replication means that the same data is stored on multiple machines, and sharding means that no single machine holds all the data.
We can thus view the data in a distributed database as a collection of shards, where each shard is replicated a configurable number of times and stored across different machines.
The replication of data can happen in different configurations, each with its own considerations
(the interested reader is referred to #cite(<bib:distributed-database-fundamentals>, form: "prose"), specifically section 24.1.2).
One example is the leader-follower configuration, where each update is performed on the leader while reads are performed on both the leader and the followers.
The leader is responsible for synchronizing the data updates to the followers.
Such a configuration favors availability over consistency on the @cap scale because reads can be outdated and thus inconsistent.
The @cap theorem states that when designing a system, you can only choose two of the three properties {Consistency, Availability, Partition tolerance}.
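
The leader-follower configuration described above can be illustrated with a minimal Python sketch.
It is not taken from any real database: the names `Replica`, `Shard`, `make_shard` and the explicit `synchronize` step are assumptions made for illustration, and each replica is reduced to an in-memory key-value map.
The sketch shows how a read served by a follower can be outdated until the leader has propagated its updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """One machine holding a full copy of a shard (a plain key-value map here)."""
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Shard:
    """A shard with one leader and a configurable number of followers (hypothetical model)."""
    leader: Replica
    followers: List[Replica]
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        # Updates are only ever applied on the leader...
        self.leader.data[key] = value
        # ...and queued for later propagation to the followers.
        self.pending.append((key, value))

    def read(self, key: str, follower: Optional[int] = None) -> Optional[str]:
        # Reads are served by the leader (follower=None) or by the chosen follower.
        replica = self.leader if follower is None else self.followers[follower]
        return replica.data.get(key)

    def synchronize(self) -> None:
        # The leader pushes every queued update to every follower.
        for key, value in self.pending:
            for replica in self.followers:
                replica.data[key] = value
        self.pending.clear()


def make_shard(replication_factor: int) -> Shard:
    # The replication factor counts the leader, so factor - 1 followers are created.
    return Shard(Replica(), [Replica() for _ in range(replication_factor - 1)])


shard = make_shard(replication_factor=3)
shard.write("name", "Alice")
stale = shard.read("name", follower=0)  # the follower has not been synchronized yet
shard.synchronize()
fresh = shard.read("name", follower=0)  # the follower now matches the leader
```

In this sketch the follower stays readable at all times, but between `write` and `synchronize` it returns outdated data, which is the inconsistency that such a configuration accepts.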

Interestingly, Solid does not introduce any replication across Solid pods.
From a theoretical point of view, this means that we can view a single Solid pod as a single shard of our database, with each shard being self-governed.
As a result of not having data replication, the Solid specification does not position itself in the @cap space, choosing neither consistency nor availability.
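
The view of pods as unreplicated shards can be made concrete with a small Python sketch.
Everything in it is a hypothetical simplification: the pod URLs, the `PODS` map, and the helpers `owning_pod` and `read` are invented for illustration, and the sketch assumes one pod per host, which real Solid deployments do not require.
It shows that every resource maps to exactly one pod, so there is no other copy to fall back on when that pod is unreachable.

```python
from typing import Dict, Set
from urllib.parse import urlparse

# Hypothetical pods: each base URL is one shard holding its own documents.
PODS: Dict[str, Dict[str, str]] = {
    "https://alice.example/": {"profile/card": "Alice's profile"},
    "https://bob.example/": {"profile/card": "Bob's profile"},
}


def owning_pod(resource_url: str) -> str:
    """Return the base URL of the single pod (shard) that stores the resource.

    Assumes one pod per host, which is a simplification for this sketch.
    """
    parts = urlparse(resource_url)
    return f"{parts.scheme}://{parts.netloc}/"


def read(resource_url: str, online: Set[str]) -> str:
    """Read a document from the one pod that stores it."""
    pod = owning_pod(resource_url)
    if pod not in online:
        # No replica exists elsewhere, so there is nothing to fall back on.
        raise ConnectionError(f"{pod} is unreachable and holds the only copy")
    path = urlparse(resource_url).path.lstrip("/")
    return PODS[pod][path]


# Only Alice's pod is reachable in this example.
reachable = {"https://alice.example/"}
profile = read("https://alice.example/profile/card", reachable)
```

Reading `https://bob.example/profile/card` with the same `reachable` set raises a `ConnectionError` in this sketch, since Bob's pod is the only shard that stores that document.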


== Access Control

8 changes: 7 additions & 1 deletion _papers/thesis-report/items.bib
@@ -536,4 +536,10 @@ @article{bib:links-can-break
pages={52--58},
year={1992},
publisher={MCB UP Ltd}
}
}

@book{bib:distributed-database-fundamentals,
title={Fundamentals of Database Systems},
author={Elmasri, Ramez and Navathe, Shamkant B.},
edition={7},
year={2021}
}
