diff --git a/_papers/thesis-report/additional/extended-abstract.typ b/_papers/thesis-report/additional/extended-abstract.typ index ef7fdac..e26d023 100644 --- a/_papers/thesis-report/additional/extended-abstract.typ +++ b/_papers/thesis-report/additional/extended-abstract.typ @@ -88,18 +88,17 @@ This centralization of privacy causes social turbulence, since it centralizes th Luckily, legislative measures have been taken to protect society from this centralization~@bib1:gdpr @bib1:ccpa. As a response, centralization technologies are being developed, such as Solid~@bib1:solid, Bluesky~@bib1:bluesky, Mastodon~@bib1:mastodon and various blockchain-based initiatives~@bib1:nakamoto2008bitcoin. -The Solid initiative achieves centralization by creating a standard building on top of existing Web standards. +The Solid initiative achieves decentralization by creating a standard building on top of existing Web standards. This approach allows for interoperability and easier workflow adaptation by leveraging existing expertise. Nevertheless, the re-decentralization of the Web comes with various challenges ranging from efficient and effective read and write operations, to expressing and enforcing access and usage control policies. Reading data in this context has already gained some scientific attention @bib1:hartig2016walking @bib1:taelman-structure-assumptions, but effectively writing data remains rather unexplored. -Data decentralization initiatives such as Solid and Bluesky decentralize data by providing each user with a data store governed by the user. +Data decentralization initiatives such as Solid and Bluesky decentralize data by providing each user with a self-governed data store. Users are in control of their data store, how they interact with the datastore and who they share their data with. 
The effectiveness of reading data in a decentralized environment has been increased by abstracting data reads through a query abstraction layer, the query engine, by using query languages like GraphQL~@bib1:graphql and SPARQL~@bib1:sparql. In this work, we will similarly research how we can abstract data updates by using a query abstraction layer. The current (draft) Solid specification~@bib1:solid-spec describes each data store, or pod, as a document oriented interface where a user decides for each document who can access that document. -Our goal is thus to create a query engine that effectively decides what document a resource should be stored in. -Easily eliminating the access-path data dependency. +Our goal is thus to create a query engine that effectively decides what document a resource should be stored in, eliminating the access-path data dependency. We hypothesize that such a query engine has a 2x overhead in the number of HTTP requests and a 4x overhead in the execution time compared to a query engine that requires the user to configure the document explicitly. This overhead is often acceptable because applications are typically created in such a way that they synchronize local changes in the background, without disturbing the user. This acceptable delay of updates contrasts with reads because in the case of reading data, the user flow is often interrupted when information is transferred. @@ -107,8 +106,6 @@ This acceptable delay of updates contrasts with reads because in the case of rea = Related Work -#IRT[I'm missing related work around updates in distributed systems here, also CRDTs and so on. And I don't remember now, but do you mention these in the rest of your thesis?] - // Solid uses RDF & LDP The Solid specification~@bib1:solid-spec builds on top of existing Semantic Web technologies such as RDF (Resource Description Framework)~@bib1:rdf and LDP (Linked Data Platform)~@bib1:ldp. 
LDP is a set of rules that is used to create a document oriented interface acceptable through HTTP. @@ -116,6 +113,25 @@ Such an interface essentially exposes a file system over HTTP, it creates direct that group together data documents and directories. Each of the exposed HTTP resources has their own access control policy declared through either WAC~@bib1:wac or ACP~@bib1:acp. +== Theoretical positioning of Solid + +The collection of all Solid pods can be interpreted as one big permissioned decentralized graph database with some interesting properties. +A typical distributed database will both replicate and shard its data @bib1:distributed-database-fundamentals. +Sharding data means that the collection of all data is divided into smaller shards, and each machine manages one or more shards. +Sharding allows us to scale our data horizontally. +Replication, on the other hand, makes sure that by replicating each shard on multiple machines, the system is partition tolerant~@bib1:cap. +Different approaches exist to configure the shards and replicas. +Oftentimes, each shard has one leader replica, and the other replicas are followers. +Reads then go to both the leader and the followers, while writes only go to the leader. +The leader is responsible for synchronizing all changes to the followers. +Such a configuration gives reads eventual consistency~@bib1:distributed-database-fundamentals~@bib1:base, +positioning itself on the CAP scale~@bib1:cap @bib1:continous-cap by choosing Availability and Partition Tolerance. + +The Solid specification builds on top of HTTP, and therefore links can break~@bib1:links-can-break. +This essentially means that there is no partition tolerance: when a pod is disconnected, the data on that pod becomes unavailable. +Solid thus only has sharding and no replication from a theoretical perspective. +This is an interesting design choice because it means that Solid is yet to position itself on the CAP scale. 
+ == Concise Bounded Description In this work, we will try to store RDF resources, defined as the CBD (Concise Bounded Description)~@bib1:concise-bounded-description of a Named Node. The CBD of a resource is defined as the collection of triples that can be accessed by recursively following objects, without following named nodes. @@ -421,31 +437,31 @@ Configuring an access policy in a certain document can thus be translated to wha Extracting policies based on the data can be useful when derived resources come into play. For example, it could be inferred when you have access to some resource in a canonical collection, that you should also have access to that resource in a derived collection given no data enrichment happened when the derived resource was created. -== Update Behaviour - -In this work we created a client that autonomously created an RDF resource without requiring to specify a URI. -To achieve this, we used a query engine that abstracts complex operations. -A query engine, just like any update API, has the power to choose where to position themselves within the CAP (Consistency, Availability, Partition tolerance) space~@bib1:cap. CAP essentially says you can only have two of three properties of {C, A, P} with the choice of a distributed system either being the ACID~@bib1:acid or BASE~@bib1:base properties. - -When choosing the BASE properties, a user chooses to drop consistency. -One way of doing so is by creating CRDTs (Conflict-free Replicated Data Types)~@bib1:crdt. -Essentially, when multiple people are using the same resource, they will all have their own local copy of the resource. -When the resource is edited, a CRDT will edit the local copy and synchronize the state later. -This means that the user does not always have the latest state of a resource, thus sacrificing consistency. -When synchronizing the resource, however, the synchronization should not just undo changes made by others. 
-Instead, both changes should be considered when updating the canonical resource. -A query engine that implements a CRDTs helps developers to create faster software. - -Another approach is to choose the ACID properties. -These properties are widely used in the form of relational database transactions. -They are not only expected by developers, but many applications are unable to operate without the consistency guarantees ACID brings. -We therefore believe that we should examine the possibility of ACID transaction within the decentralized data storage research. -This does not mean that we completely drop the availability property, as the inventors of the CAP theorem later describe~@bib1:continous-cap. -Furthermore, we believe that within the CAP space, Web technologies take on an interesting position. -Namely, the Web intentionally does not break when links do~@bib1:links-can-break. -in the context of distributed data spaces, this means that when a data store is unavailable, so is the data managed by that store. -This is in sharp contrast to a distributed database that duplicates data across many nodes so that all data remains accessible when a node goes offline. -Related to the CAP theorem, that means that data spaces do not have strong Partition Tolerance requirements, allowing us to devote more attention to consistency and availability. +// == Update Behaviour +// +// In this work we created a client that autonomously created an RDF resource without requiring to specify a URI. +// To achieve this, we used a query engine that abstracts complex operations. +// A query engine, just like any update API, has the power to choose where to position themselves within the CAP (Consistency, Availability, Partition tolerance) space~@bib1:cap. CAP essentially says you can only have two of three properties of {C, A, P} with the choice of a distributed system either being the ACID~@bib1:acid or BASE~@bib1:base properties. 
+// +// When choosing the BASE properties, a user chooses to drop consistency. +// One way of doing so is by creating CRDTs (Conflict-free Replicated Data Types)~@bib1:crdt. +// Essentially, when multiple people are using the same resource, they will all have their own local copy of the resource. +// When the resource is edited, a CRDT will edit the local copy and synchronize the state later. +// This means that the user does not always have the latest state of a resource, thus sacrificing consistency. +// When synchronizing the resource, however, the synchronization should not just undo changes made by others. +// Instead, both changes should be considered when updating the canonical resource. +// A query engine that implements a CRDTs helps developers to create faster software. +// +// Another approach is to choose the ACID properties. +// These properties are widely used in the form of relational database transactions. +// They are not only expected by developers, but many applications are unable to operate without the consistency guarantees ACID brings. +// We therefore believe that we should examine the possibility of ACID transaction within the decentralized data storage research. +// This does not mean that we completely drop the availability property, as the inventors of the CAP theorem later describe~@bib1:continous-cap. +// Furthermore, we believe that within the CAP space, Web technologies take on an interesting position. +// Namely, the Web intentionally does not break when links do~@bib1:links-can-break. +// in the context of distributed data spaces, this means that when a data store is unavailable, so is the data managed by that store. +// This is in sharp contrast to a distributed database that duplicates data across many nodes so that all data remains accessible when a node goes offline. 
+// Related to the CAP theorem, that means that data spaces do not have strong Partition Tolerance requirements, allowing us to devote more attention to consistency and availability. = Conclusion // Small resume @@ -474,9 +490,7 @@ Additionally, since one server interface is used by many decentralized clients, = References -#[ - #include "../utils/EA-bib.typ" -] +#[#include "../utils/EA-bib.typ"] // #bibliography(title: none, "../items.bib") diff --git a/_papers/thesis-report/utils/EA-bib.typ b/_papers/thesis-report/utils/EA-bib.typ index eca86d5..f58db67 100644 --- a/_papers/thesis-report/utils/EA-bib.typ +++ b/_papers/thesis-report/utils/EA-bib.typ @@ -36,37 +36,35 @@ [T. Berners-Lee, H. Story, and S. Capadisli, “Web Access Control,” May 2024.], bib-entry("15", "bib1:acp"), [M. Bosquet, “Access Control Policy (ACP),” May 2022.], - bib-entry("16", "bib1:concise-bounded-description"), + bib-entry("16", "bib1:distributed-database-fundamentals"), + [R. Elmasri, “Fundamentals of database systems seventh edition,” 2021.], + bib-entry("17", "bib1:cap"), + [A. Fox and E. A. Brewer, “Harvest, yield, and scalable tolerant systems,” in Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999, pp. 174–178.], + bib-entry("18", "bib1:base"), + [D. Pritchett, “BASE: An ACID Alternative: In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.,” Queue, vol. 6, no. 3, pp. 48–55, 2008.], + bib-entry("19", "bib1:continous-cap"), + [E. Brewer, “CAP twelve years later: How the "rules" have changed,” Computer, vol. 45, no. 2, pp. 23–29, Feb. 2012, doi: 10.1109/MC.2012.37.], + bib-entry("20", "bib1:links-can-break"), + [T. Berners-Lee, R. Cailliau, J.-F. Groff, and B. Pollermann, “World-Wide Web: the information universe,” Internet Research, vol. 2, no. 1, pp. 52–58, 1992.], + bib-entry("21", "bib1:concise-bounded-description"), [P. Stickler, “CBD - Concise Bounded Description,” Jun. 
2005.], - bib-entry("17", "bib1:turtle"), + bib-entry("22", "bib1:turtle"), [G. Carothers and E. Prud'hommeaux, “RDF 1.1 Turtle,” Feb. 2014.], - bib-entry("18", "bib1:shex"), + bib-entry("23", "bib1:shex"), [T. Baker and E. Prud'hommeaux, “Shape Expressions (ShEx) 2.1 Primer,” Oct. 2019.], - bib-entry("19", "bib1:shacl"), + bib-entry("24", "bib1:shacl"), [H. Knublauch and D. Kontokostas, “Shapes Constraint Language (SHACL),” Jul. 201], - bib-entry("20", "bib1:type-index"), + bib-entry("25", "bib1:type-index"), [T. Turdean, J. Zucker, V. Balseiro, S. Capadisli, and T. Berners-Lee, “Type Indexes,” Jun. 2022.], - bib-entry("21", "bib1:shape-tree"), + bib-entry("26", "bib1:shape-tree"), [E. Prud'hommeaux and J. Bingham, “Shape Trees Specification,” Dec. 2021], - bib-entry("22", "bib1:ldbc"), + bib-entry("27", "bib1:ldbc"), [O. Erling et al., “The LDBC social network benchmark: Interactive workload,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015, pp. 619–630.], - bib-entry("23", "bib1:subweb"), + bib-entry("28", "bib1:subweb"), [B. Bogaerts, B. Ketsman, Y. Zeboudj, H. Aamer, R. Taelman, and R. Verborgh, “Link Traversal with Distributed Subweb Specifications,” in Proceedings of the 5th International Joint Conference on Rules and Reasoning, S. Moschoyiannis, R. Peñaloza, J. Vanthienen, A. Soylu, and D. Roman, Eds., in Lecture Notes in Computer Science, vol. 12851. Springer, Sep. 2021, pp. 62–79. doi: 10.1007/978-3-030-91167-6_5.], - bib-entry("24", "bib1:whats-in-pod"), + bib-entry("29", "bib1:whats-in-pod"), [R. Dedecker, W. Slabbinck, J. Wright, P. Hochstenbach, P. Colpaert, and R. Verborgh, “What’s in a Pod? A knowledge graph interpretation for the Solid ecosystem,” in 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs (QuWeDa) at ISWC 2022, 2022, pp. 81–96.], - bib-entry("25", "bib1:vanherwergenderived"), + bib-entry("30", "bib1:vanherwergenderived"), [J. Van Herwegen and R. 
Verborgh, “Granular Access to Policy-Governed Linked Data via Partial Server-Side Query.”], - bib-entry("26", "bib1:cap"), - [A. Fox and E. A. Brewer, “Harvest, yield, and scalable tolerant systems,” in Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999, pp. 174–178.], - bib-entry("27", "bib1:acid"), - [J. Gray and others, “The transaction concept: Virtues and limitations,” in VLDB, 1981, pp. 144–154.], - bib-entry("28", "bib1:base"), - [D. Pritchett, “BASE: An ACID Alternative: In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.,” Queue, vol. 6, no. 3, pp. 48–55, 2008.], - bib-entry("29", "bib1:crdt"), - [M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” in Stabilization, Safety, and Security of Distributed Systems: 13th International Symposium, SSS 2011, Grenoble, France, October 10-12, 2011. Proceedings 13, 2011, pp. 386–400.], - bib-entry("30", "bib1:continous-cap"), - [E. Brewer, “CAP twelve years later: How the "rules" have changed,” Computer, vol. 45, no. 2, pp. 23–29, Feb. 2012, doi: 10.1109/MC.2012.37.], - bib-entry("31", "bib1:links-can-break"), - [T. Berners-Lee, R. Cailliau, J.-F. Groff, and B. Pollermann, “World-Wide Web: the information universe,” Internet Research, vol. 2, no. 1, pp. 52–58, 1992.], ) ) \ No newline at end of file