first eval part
jitsedesmet committed Apr 29, 2024
1 parent 686bf68 commit ad3da6e
Showing 3 changed files with 130 additions and 8 deletions.
133 changes: 127 additions & 6 deletions _papers/thesis-report/chapters/evaluation.typ
@@ -1,21 +1,142 @@
#import "../utils/review.typ": *
#import "../utils/general.typ": *

= Evaluation

This chapter provides an extensive evaluation of the @sgv introduced in @sec:storage-guidance-voc.
To evaluate the vocabulary, we implemented a query engine that supports a minimal set of @sgv features.
After discussing the implementation, we briefly discuss the theoretical cost of our operations.
We finish with an empirical evaluation of the query engine on an adapted benchmark.


== Implementation

// What did we implement?

// What technologies?

// Considerations?


To analyse the capabilities of @sgv, we implemented a query engine capable of parsing a pod's @sgv description and acting accordingly.
The source code of the implementation can be found
#link("https://github.com/jitsedesmet/sgv-update-engine")[online].
The query engine acts as a wrapper around the modular
#link("https://comunica.dev/")[Comunica query engine].
We chose to wrap Comunica for convenience: it allows us to obtain results quickly without needing to understand or change its internal code.
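To make this construction concrete, the sketch below shows the general shape of such a wrapper in TypeScript.
The @sgv-specific logic is hidden behind a hypothetical `rewriteUsingSgv` helper; it is a simplified illustration rather than the code of our implementation.
#text-example[
```ts
import { QueryEngine } from '@comunica/query-sparql';

// Hypothetical helper: consults the pod's SGV description and rewrites the
// update (e.g. filling in the named node of the new resource) before execution.
declare function rewriteUsingSgv(update: string, pod: string): Promise<string>;

const engine = new QueryEngine();

// Wrapper: rewrite the update according to SGV, then let Comunica execute it.
export async function executeSgvUpdate(update: string, pod: string): Promise<void> {
  const rewritten = await rewriteUsingSgv(update, pod);
  await engine.queryVoid(rewritten, { sources: [pod] });
}
```
]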

For this proof-of-concept implementation, we only support the essential parts of @sgv.
We therefore provide an implementation of only the following concepts:
+ Canonical Collection
+ Group Strategy: only @uri templates
+ Resource Description: only @shex
+ Save Condition: always stored, prefer other, only stored when not redundant, and never stored
+ Update Condition: prefer static, move to best matched, and disallow

To parse and validate our @shex descriptions, we use the
#link("https://www.npmjs.com/package/rdf-validate-shacl")[rdf-validate-shacl library].
This library is known to be quite inefficient and could be replaced by the faster
#link("https://www.npmjs.com/package/shacl-engine")[SHACL engine library].

Much of the cost our operations add can be reasoned about analytically.
We go through that reasoning in the next section, as for some parts it provides more insight than benchmarking.


== Theoretical Evaluation

In our theoretical evaluation, we analyse a few metrics, such as the number of @http requests an operation requires.


=== Insert Operation

In this section, we analyse the cost of a simple insert operation such as the following:
#text-example[
```turtle
prefix ns1: <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix card: <http://example.com/pod/profile/card#>
prefix tag: <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/>
PREFIX resource: <http://dbpedia.org/resource/>
INSERT DATA {
  <> a ns1:Post ;
    ns1:browserUsed "Chrome" ;
    ns1:content
      "I want to eat an apple while scavenging for mushrooms in the forest." ;
    ns1:creationDate "2024-05-08T23:23:56.830000+00:00"^^xsd:dateTime ;
    ns1:id "416608218494388"^^xsd:long ;
    ns1:hasCreator card:me ;
    ns1:hasTag tag:Alanis_Morissette, tag:Austria ;
    ns1:isLocatedIn resource:China ;
    ns1:locationIP "1.83.28.23" .
}
```
]

In @sec:flow-create-rdf-resource we analysed the steps required for this operation.

==== Fetch the Description

The query engine first requests the @sgv description.
This amounts to one @http request, assuming the @api publishes it as a single @http resource.
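A minimal sketch of this step, assuming the description is published in Turtle at a known URL and parsed with the N3.js library, could look as follows.
#text-example[
```ts
import { Parser, Store } from 'n3';

// Fetch the pod's SGV description in a single HTTP request and parse it
// into an in-memory store. The description URL is an assumption.
export async function fetchSgvDescription(url: string): Promise<Store> {
  const response = await fetch(url, { headers: { accept: 'text/turtle' } });
  const quads = new Parser({ baseIRI: url }).parse(await response.text());
  return new Store(quads);
}
```
]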

==== Loop the Resource Descriptions

Next, the query engine must check which canonical collections want to store the resource.
In the worst-case scenario, all collections want to store the resource, but the engine only discovers this at the last resource description of each collection.
In such a case, all resource descriptions pointed to by canonical collections need to be checked.

The cost of a single validation can be linear in the number of properties the description has.
Since the resource is centred around a single named node, only that named node needs to be considered as a focus node during validation.
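As an illustration, checking one resource against one resource description with the rdf-validate-shacl library mentioned earlier could be sketched as follows, assuming both the shapes and the resource are available as RDF/JS datasets.
#text-example[
```ts
import factory from 'rdf-ext';
import SHACLValidator from 'rdf-validate-shacl';
import type { DatasetCore } from '@rdfjs/types';

// Check whether the resource conforms to a single resource description.
// In our setting the shape targets the resource's named node, so effectively
// one focus node is validated.
export async function matchesDescription(
  shapes: DatasetCore,
  resource: DatasetCore,
): Promise<boolean> {
  const validator = new SHACLValidator(shapes, { factory });
  const report = await validator.validate(resource);
  return report.conforms;
}
```
]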

The computational load could be reduced when multiple resource descriptions overlap.
In that case, a shape could be defined as a conjunction using `sh:and`.
Take the example of images and personal images.
A personal image could be described as:
#text-example[
```turtle
ex:PictureShape
  a sh:NodeShape .
ex:WhatMakesPicturePersonalShape
  a sh:NodeShape .
ex:PersonalPictureShape
  a sh:NodeShape ;
  sh:and (
    ex:PictureShape
    ex:WhatMakesPicturePersonalShape
  ) .
```
]
In this case, a query engine could cache the validation result of `ex:PictureShape` and reuse it when evaluating `ex:PersonalPictureShape`.
Optimizations like this could likely be automated.

==== Filter Collections on Save Condition

The complexity of filtering the list of eligible collections could be significant.
We do, however, expect this list to be small.
In case the `state required` condition is used, a whole @sparql query needs to be executed to check the state; we therefore disregard that case here.
The worst-case performance is listed below; a sketch of the filtering step follows the list.
- Always stored: constant.
- Prefer other: linear search through the list of eligible collections.
- Prefer most specific: linear scan through the eligible collections, with a distance-function-dependent cost per collection. This cost is cacheable.
- Only stored when not redundant: linear scan through the collections in case no collection is a clear-cut match.
- Never stored: constant.
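The sketch below illustrates how this filtering step could be structured.
The condition names and their exact semantics are simplified assumptions for illustration, not @sgv terms.
#text-example[
```ts
// Hypothetical, simplified representation of the supported save conditions.
type SaveCondition =
  | 'alwaysStored'
  | 'preferOther'
  | 'onlyStoredWhenNotRedundant'
  | 'neverStored';

interface EligibleCollection {
  uri: string;
  saveCondition: SaveCondition;
}

// Keep only the collections that will actually store the resource.
// Each branch mirrors the worst-case costs listed above.
export function filterOnSaveCondition(eligible: EligibleCollection[]): EligibleCollection[] {
  return eligible.filter(collection => {
    switch (collection.saveCondition) {
      case 'alwaysStored':
        return true; // constant
      case 'neverStored':
        return false; // constant
      case 'preferOther':
        // linear search: defer to any other eligible collection, if one exists
        return !eligible.some(other => other !== collection);
      case 'onlyStoredWhenNotRedundant':
        // linear scan: skip if another collection stores the resource anyway
        return !eligible.some(other =>
          other !== collection && other.saveCondition === 'alwaysStored');
      default:
        return false;
    }
  });
}
```
]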

==== Compute Named Node

For each collection that will store the resource, we now have to compute the named node of the new resource.
In the case of @uri templates with regexes, this cost is negligible.
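For illustration, the sketch below expands a template by simple substitution; it assumes a plain `{variable}` syntax and ignores the regex component of @uri templates.
#text-example[
```ts
// Hypothetical sketch: fill a URI template such as
// "http://example.com/pod/posts/{id}" with values taken from the resource.
export function expandUriTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`no value for template variable "${name}"`);
    }
    return encodeURIComponent(value);
  });
}

// expandUriTemplate('http://example.com/pod/posts/{id}', { id: '416608218494388' })
// yields 'http://example.com/pod/posts/416608218494388'.
```
]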

==== Create Resources

The Solid Specification requires updates to happen through N3 Patch,
which means that each created resource requires its own @http request.

Interestingly, some Solid server implementations, like the
#link("https://communitysolidserver.github.io/CommunitySolidServer/7.x/usage/example-requests/#patch-modifying-resources")[Community Solid Server],
also accept @sparql queries.
Using a @sparql query, all resources could be created using a single @http request.
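As an illustration, such a PATCH request, modelled after the linked Community Solid Server examples, could be issued as follows; the resource @uri and query body are placeholders.
#text-example[
```ts
// Send a SPARQL Update to a server that accepts it via PATCH, such as the
// Community Solid Server. The target URL and update query are placeholders.
export async function patchWithSparql(resourceUrl: string, update: string): Promise<void> {
  const response = await fetch(resourceUrl, {
    method: 'PATCH',
    headers: { 'content-type': 'application/sparql-update' },
    body: update,
  });
  if (!response.ok) {
    throw new Error(`PATCH of ${resourceUrl} failed with status ${response.status}`);
  }
}
```
]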


== Empirical Evaluation
2 changes: 1 addition & 1 deletion _papers/thesis-report/chapters/storage-guidance-vocab.typ
@@ -2,7 +2,7 @@
#import "../utils/general.typ": *
#import "@preview/treet:0.1.1": tree-list

= Storage Guidance Vocabulary <sec:storage-guidance-voc>

#add[You could suggest addition], #delete[or deletion.]
#MRT[Ruben makes margin note]
3 changes: 2 additions & 1 deletion _papers/thesis-report/glossary.typ
@@ -22,5 +22,6 @@
(key: "ldp", short: "LDP", long: "Linked Data Platform"),
(key: "uri", short: "URI"),
(key: "sparql", short: "SPARQL"),
(key: "ldes", short: "LDES", long: "Linked Data Event Streams"),
(key: "api", short: "API", long: "Application Programming Interface"),
)
