first eval part
jitsedesmet committed Apr 29, 2024
1 parent 686bf68 commit ad3da6e
Showing 3 changed files with 130 additions and 8 deletions.
133 changes: 127 additions & 6 deletions _papers/thesis-report/chapters/evaluation.typ
@@ -1,21 +1,142 @@
#import "../utils/review.typ": *
#import "../utils/general.typ": *

= Evaluation

This chapter provides an extensive evaluation of the @sgv introduced in @sec:storage-guidance-voc.
To evaluate the vocabulary, we implemented a query engine that supports a minimal set of @sgv features.
After discussing the implementation, we briefly discuss the theoretical cost of our operations.
We finish with an empirical evaluation of the query engine on an adapted benchmark.


== Implementation

// What did we implement?

// What technologies?

// Considerations?


To analyse the capabilities of @sgv, we implemented a query engine capable of parsing a pod's @sgv description and acting accordingly.
The source code of the implementation can be found
#link("https://github.com/jitsedesmet/sgv-update-engine")[online].
The query engine acts as a wrapper around the modular
#link("https://comunica.dev/")[Comunica query engine].
We chose to wrap Comunica for convenience: it allows us to obtain results quickly without needing to understand or change its internal code.
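To make this construction concrete, the sketch below shows the general shape of such a wrapper in TypeScript.
The @sgv-specific logic is hidden behind a hypothetical `rewriteUsingSgv` helper; it is a simplified illustration rather than the code of our implementation.
#text-example[
```ts
import { QueryEngine } from '@comunica/query-sparql';

// Hypothetical helper: consults the pod's SGV description and rewrites the
// update (e.g. filling in the named node of the new resource) before execution.
declare function rewriteUsingSgv(update: string, pod: string): Promise<string>;

const engine = new QueryEngine();

// Wrapper: rewrite the update according to SGV, then let Comunica execute it.
export async function executeSgvUpdate(update: string, pod: string): Promise<void> {
  const rewritten = await rewriteUsingSgv(update, pod);
  await engine.queryVoid(rewritten, { sources: [pod] });
}
```
]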

For this proof-of-concept implementation, we only support the essential parts of @sgv.
We therefore provide an implementation of only the following concepts:
+ Canonical Collection
+ Group Strategy: only @uri templates
+ Resource Description: only @shex
+ Save Condition: always stored, prefer other, only stored when not redundant, and never stored
+ Update Condition: prefer static, move to best matched, and disallow

To parse and validate our @shex descriptions, we use the
#link("https://www.npmjs.com/package/rdf-validate-shacl")[rdf-validate-shacl library].
This library is known to be quite inefficient and could be replaced by the faster
#link("https://www.npmjs.com/package/shacl-engine")[SHACL engine library].

Much of the cost our operations add can be reasoned about analytically.
We go through that reasoning in the next section, as for some parts it provides more insight than benchmarking.


== Theoretical Evaluation

In our theoretical evaluation, we analyse a few metrics, such as the number of @http requests an operation requires.


=== Insert Operation

In this section, we analyse the cost of a simple insert operation such as the following:
#text-example[
```turtle
prefix ns1: <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix card: <http://example.com/pod/profile/card#>
prefix tag: <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/>
PREFIX resource: <http://dbpedia.org/resource/>
INSERT DATA {
  <> a ns1:Post ;
    ns1:browserUsed "Chrome" ;
    ns1:content
      "I want to eat an apple while scavenging for mushrooms in the forest." ;
    ns1:creationDate "2024-05-08T23:23:56.830000+00:00"^^xsd:dateTime ;
    ns1:id "416608218494388"^^xsd:long ;
    ns1:hasCreator card:me ;
    ns1:hasTag tag:Alanis_Morissette, tag:Austria ;
    ns1:isLocatedIn resource:China ;
    ns1:locationIP "1.83.28.23" .
}
```
]

In @sec:flow-create-rdf-resource we analysed the steps required for this operation.

==== Fetch the Description

The query engine first requests the @sgv description.
This amounts to one @http request, assuming the @api publishes it as a single @http resource.
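A minimal sketch of this step, assuming the description is published in Turtle at a known URL and parsed with the N3.js library, could look as follows.
#text-example[
```ts
import { Parser, Store } from 'n3';

// Fetch the pod's SGV description in a single HTTP request and parse it
// into an in-memory store. The description URL is an assumption.
export async function fetchSgvDescription(url: string): Promise<Store> {
  const response = await fetch(url, { headers: { accept: 'text/turtle' } });
  const quads = new Parser({ baseIRI: url }).parse(await response.text());
  return new Store(quads);
}
```
]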

==== Loop the Resource Descriptions

Next, the query engine must check which canonical collections want to store the resource.
In the worst-case scenario, all collections want to store the resource, but the engine only discovers this at the last resource description of each collection.
In such a case, all resource descriptions pointed to by canonical collections need to be checked.

The cost of a single validation can be linear in the number of properties the description has.
Since the resource is centred around a single named node, only that named node needs to be considered as a focus node during validation.
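As an illustration, checking one resource against one resource description with the rdf-validate-shacl library mentioned earlier could be sketched as follows, assuming both the shapes and the resource are available as RDF/JS datasets.
#text-example[
```ts
import factory from 'rdf-ext';
import SHACLValidator from 'rdf-validate-shacl';
import type { DatasetCore } from '@rdfjs/types';

// Check whether the resource conforms to a single resource description.
// In our setting the shape targets the resource's named node, so effectively
// one focus node is validated.
export async function matchesDescription(
  shapes: DatasetCore,
  resource: DatasetCore,
): Promise<boolean> {
  const validator = new SHACLValidator(shapes, { factory });
  const report = await validator.validate(resource);
  return report.conforms;
}
```
]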

The computational load could be reduced when multiple resource descriptions overlap.
In that case, a shape could be defined as a conjunction using `sh:and`.
Take the example of images and personal images.
A personal image could be described as:
#text-example[
```turtle
ex:PictureShape
  a sh:NodeShape .
ex:WhatMakesPicturePersonalShape
  a sh:NodeShape .
ex:PersonalPictureShape
  a sh:NodeShape ;
  sh:and (
    ex:PictureShape
    ex:WhatMakesPicturePersonalShape
  ) .
```
]
In this case, a query engine could cache the validation result of `ex:PictureShape` and reuse it when evaluating `ex:PersonalPictureShape`.
Optimizations like this could likely be automated.

==== Filter Collections on Save Condition

The complexity of filtering the list of eligible collections could be significant.
We do, however, expect this list to be small.
In case the `state required` condition is used, a whole @sparql query needs to be executed to check the state; we therefore disregard that case here.
The worst-case performance is listed below; a sketch of the filtering step follows the list.
- Always stored: constant.
- Prefer other: linear search through the list of eligible collections.
- Prefer most specific: linear scan through the eligible collections, with a distance-function-dependent cost per collection. This cost is cacheable.
- Only stored when not redundant: linear scan through the collections in case no collection is a clear-cut match.
- Never stored: constant.
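The sketch below illustrates how this filtering step could be structured.
The condition names and their exact semantics are simplified assumptions for illustration, not @sgv terms.
#text-example[
```ts
// Hypothetical, simplified representation of the supported save conditions.
type SaveCondition =
  | 'alwaysStored'
  | 'preferOther'
  | 'onlyStoredWhenNotRedundant'
  | 'neverStored';

interface EligibleCollection {
  uri: string;
  saveCondition: SaveCondition;
}

// Keep only the collections that will actually store the resource.
// Each branch mirrors the worst-case costs listed above.
export function filterOnSaveCondition(eligible: EligibleCollection[]): EligibleCollection[] {
  return eligible.filter(collection => {
    switch (collection.saveCondition) {
      case 'alwaysStored':
        return true; // constant
      case 'neverStored':
        return false; // constant
      case 'preferOther':
        // linear search: defer to any other eligible collection, if one exists
        return !eligible.some(other => other !== collection);
      case 'onlyStoredWhenNotRedundant':
        // linear scan: skip if another collection stores the resource anyway
        return !eligible.some(other =>
          other !== collection && other.saveCondition === 'alwaysStored');
      default:
        return false;
    }
  });
}
```
]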

==== Compute Named Node

For each collection that will store the resource, we now have to compute the named node of the new resource.
In the case of @uri templates with regexes, this cost is negligible.
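For illustration, the sketch below expands a template by simple substitution; it assumes a plain `{variable}` syntax and ignores the regex component of @uri templates.
#text-example[
```ts
// Hypothetical sketch: fill a URI template such as
// "http://example.com/pod/posts/{id}" with values taken from the resource.
export function expandUriTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`no value for template variable "${name}"`);
    }
    return encodeURIComponent(value);
  });
}

// expandUriTemplate('http://example.com/pod/posts/{id}', { id: '416608218494388' })
// yields 'http://example.com/pod/posts/416608218494388'.
```
]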

==== Create Resources

The Solid Specification requires updates to happen through N3 Patch,
which means that each created resource requires its own @http request.

Interestingly, some Solid server implementations, like the
#link("https://communitysolidserver.github.io/CommunitySolidServer/7.x/usage/example-requests/#patch-modifying-resources")[Community Solid Server],
also accept @sparql queries.
Using a @sparql query, all resources could be created using a single @http request.
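As an illustration, such a PATCH request, modelled after the linked Community Solid Server examples, could be issued as follows; the resource @uri and query body are placeholders.
#text-example[
```ts
// Send a SPARQL Update to a server that accepts it via PATCH, such as the
// Community Solid Server. The target URL and update query are placeholders.
export async function patchWithSparql(resourceUrl: string, update: string): Promise<void> {
  const response = await fetch(resourceUrl, {
    method: 'PATCH',
    headers: { 'content-type': 'application/sparql-update' },
    body: update,
  });
  if (!response.ok) {
    throw new Error(`PATCH of ${resourceUrl} failed with status ${response.status}`);
  }
}
```
]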


== Empirical Evaluation
2 changes: 1 addition & 1 deletion _papers/thesis-report/chapters/storage-guidance-vocab.typ
@@ -2,7 +2,7 @@
#import "../utils/general.typ": *
#import "@preview/treet:0.1.1": tree-list

= Storage Guidance Vocabulary <sec:storage-guidance-voc>

#add[You could suggest addition], #delete[or deletion.]
#MRT[Ruben makes margin note]
3 changes: 2 additions & 1 deletion _papers/thesis-report/glossary.typ
@@ -22,5 +22,6 @@
(key: "ldp", short: "LDP", long: "Linked Data Platform"),
(key: "uri", short: "URI"),
(key: "sparql", short: "SPARQL"),
(key: "ldes", short: "LDES", long: "Linked Data Event Streams"),
(key: "api", short: "API", long: "Application Programming Interface"),
)
