add section about 'unstar' mapping #115

Open
wants to merge 12 commits into base: main

pchampin
Contributor

@pchampin pchampin commented Nov 13, 2024

as per w3c/rdf-star-wg#129

Preview with the examples working
(as opposed to the automatic preview below)


Preview | Diff

@afs
Contributor

afs commented Nov 13, 2024

Is this proposed "as well as" the graph-to-graph algorithm or "instead of"?

w3c/rdf-star-wg#114 (comment)

@niklasl
Contributor

niklasl commented Nov 13, 2024

This approach requires graphs containing triple terms to be represented as datasets. That excludes cases where you need to put "unstarred" RDF 1.2 graphs into an RDF 1.1-based quad store and manage them within specific named graphs. Implementations supporting the default graph union mechanism would also treat the "triple term graphs" as asserted in that union graph.

@niklasl
Contributor

niklasl commented Nov 13, 2024

It has been mentioned (or opined) that the "star" name would eventually go away (it would be just RDF 1.2 with triple terms). If so, perhaps "unstar" is an unfortunate name for future reference?

@rat10

rat10 commented Nov 13, 2024

There seems to be an issue with the examples section: all examples say "Cannot GET /uploads/dcqFS6/spec/ex-unstar-output.trig".

@gkellogg
Member

There seems to be an issue with the examples section: all examples say "Cannot GET /uploads/dcqFS6/spec/ex-unstar-output.trig".

It's the general PR-preview issue of not being able to retrieve neighboring resources. They are fleshed out if you look at the GitHack version.

@pchampin
Contributor Author

Is this proposed "as well as" the graph-to-graph algorithm or "instead of"?

I personally don't think that we should have multiple such mappings, and I am more and more convinced that the graph-to-graph approach makes more sense.

The reasons I stuck to my initial graph-to-dataset approach in this PR are that

  • I am still not clear about the details of the graph-to-graph approach, and
  • I wanted to write down the design goals of the mapping, so that we can discuss them (and inform the responses to the previous point).

@rat10

rat10 commented Nov 14, 2024

What is the "graph-to-graph" algorithm? A mapping based on the RDF standard reification vocabulary?

@niklasl
Contributor

niklasl commented Nov 14, 2024

What is the "graph-to-graph" algorithm? A mapping based on the RDF standard reification vocabulary?

Or something isomorphic to it but using dedicated terms. There's an example of that in this recent wiki page (with links to w3c/rdf-semantics#49 and w3c/rdf-star-wg#114).

@afs
Contributor

afs commented Nov 22, 2024

Semantic Task Force 2024-11-22

We are looking at the "graph" flavor of unstar.

@afs
Contributor

afs commented Nov 22, 2024

Design goals: (content from the PR)

  • Information preserving
    It must be possible to reconstruct the input dataset from the output dataset.
    Note that, on the other hand, the algorithm is not designed to be semantics preserving:
    the graphs in the produced dataset are not semantically equivalent to their corresponding graph in the input dataset.
  • Idempotent
    Transforming a dataset that is already complying with RDF Classic (i.e. containing no triple term) must result in the same dataset.
  • Universal
    It should be possible to transform any RDF Full dataset using this method.

- unstarring a graph now produces a graph (not a dataset)
- it uses the reification vocabulary
  (with a distinctive type rdf:UnstarredTripleTerm)
@pchampin
Contributor Author

I just updated the PR; the new algorithm is "graph-to-graph", repurposing the reification vocabulary.
Note that I coined rdf:UnstarredTripleTerm to type the generated blank nodes.

Note that I deliberately chose a very specific name for this, to distinguish it from the type we will probably introduce as the class of all triple terms, for example, to describe the range of rdf:reifies (say, rdf:Triple). Indeed, I believe that there will be valid use-cases to use that type (rdf:Triple) in Full RDF graphs, while rdf:UnstarredTripleTerm should really be considered as "reserved" and not to be used outside of the "unstar" algorithm.

@afs
Contributor

afs commented Nov 29, 2024

This is a suggestion related to presentation only.

RDF-concepts defines RDF and is also a readable document.

The algorithms of unstar/restar are good for defining the translation but algorithms do not communicate the broad intent so well.

Maybe: have all the written description, discussion and examples, then have the algorithms.

  • Put the overview "The general principle" at the beginning of section 8
  • Pull the examples into section 8.
  • Either push all algorithms to later sections within section 8 or put the algorithms as normative appendixes.

@rat10

rat10 commented Dec 5, 2024

There seems to be an issue with the examples section: all examples say "Cannot GET /uploads/dcqFS6/spec/ex-unstar-output.trig".

It's the general PR-preview issue of not being able to retrieve neighboring resources. They are fleshed out if you look at the GitHack version.

@gkellogg How did you create that link? I'm trying to read the new PR but again am unable to read the included examples.

@gkellogg
Member

gkellogg commented Dec 5, 2024

There seems to be an issue with the examples section: all examples say "Cannot GET /uploads/dcqFS6/spec/ex-unstar-output.trig".

It's the general PR-preview issue of not being able to retrieve neighboring resources. They are fleshed out if you look at the GitHack version.

@gkellogg How did you create that link? I'm trying to read the new PR but again am unable to read the included examples.

Select spec/index.html in the branch you want to see and enter it in http://raw.githack.com/. It gives you a link to the rendered version.

@niklasl
Contributor

niklasl commented Dec 5, 2024

I think this approach is good.

The use of a dedicated type for unstarred triple terms seems prudent. If used, an rdf:UnstarredTripleTerm rdfs:subClassOf rdf:TripleTerm axiom should be defined (possibly where w3c/rdf-semantics#49 is defined, if it will be). This since the unstarred form is useful as input to e.g. OWL reasoners without full RDF 1.2 support, but users of that should only rely on the rdf:TripleTerm type, not this special subclass. This is because when such tooling is updated to RDF 1.2, the type of triple terms will just be rdf:TripleTerm, and any OWL-based axioms should still work.

Whether the constituent triple term predicates should be reused from the reification vocabulary or not has been debated some (see e.g. w3c/rdf-semantics#49 (comment)). It depends on whether or not it makes sense to make rdf:TripleTerm a subclass of rdf:Statement.

I think the name rdf:TripleTerm has been used most recently, but it has perhaps not been finalized (I am somewhat in favor of rdf:Triple if there is room for debate).

But these details, about naming and which constituent triple term predicates to use, can probably be dealt with separately, to avoid blocking this PR.

The question about whether to use name "unstar" at all remains (as "RDF-star" is not mentioned as such in RDF 1.2; only in reference to the RDF-star WG).

@rat10

rat10 commented Dec 5, 2024

I just updated the PR; the new algorithm is "graph-to-graph", repurposing the reification vocabulary. Note that I coined rdf:UnstarredTripleTerm to type the generated blank nodes.

Note that I deliberately chose a very specific name for this, to distinguish it from the type we will probably introduce as the class of all triple terms, for example, to describe the range of rdf:reifies (say, rdf:Triple). Indeed, I believe that there will be valid use-cases to use that type (rdf:Triple) in Full RDF graphs, while rdf:UnstarredTripleTerm should really be considered as "reserved" and not to be used outside of the "unstar" algorithm.

To me it seems like the discussions in the Semantics TF and in GitHub issues, e.g. rdf-semantics issue #49 and rdf-star-wg issue #130, moved away from re-using the RDF reification vocabulary. The reason is a bit intricate: the RDF standard reification describes an occurrence/instance/reification of a triple. The triple term describes a triple, and only the reification, indicated by rdf:reifies, creates a reference to the instance/occurrence/reification. That means that in the following example _:r and _:s are semantically equivalent, but the triple term and the reification quad are not.

_:r rdf:reifies <<( :s :p :o )>> .
_:s a rdf:Statement ;
    rdf:subject :s ;
    rdf:predicate :p ;
    rdf:object :o .

Given this difference in meaning I think it's more prudent to not reuse the properties from the reification vocabulary but to mint new ones, like e.g. rdf:tripleTermSubject, etc.

W.r.t. other wordings:

  • how about rdf:unTripleTerm and rdf:reTripleTerm?
  • I favor rdf:TripleTerm over rdf:Triple to refer to a triple term, since the latter could be easily misunderstood to refer to regular, asserted RDF triples.

@pchampin
Contributor Author

pchampin commented Dec 6, 2024

Regarding the name of the algorithm,

I guess we could go for "classicize" (as it converts to RDF "classic").
"flatten" would also seem appropriate, but could create confusion with the algorithm of the same name in JSON-LD? The context (no pun intended) is quite different so I'm not convinced this would be a real issue.

@pchampin
Contributor Author

pchampin commented Dec 6, 2024

Regarding the vocabulary,

I think that duplicating the properties rdf:subject, rdf:predicate, rdf:object could also create a lot of confusion, so I would refrain from doing that unless repurposing them really breaks something badly. But since rdf:Statement is so loosely defined, I don't think that would be the case. I am happy to consider that, in retrospect, rdf:Statement can include both the "platonic triples" denoted by triple terms and the "occurrences" denoted by reifiers.

@rat10

rat10 commented Dec 6, 2024

Regarding the vocabulary,

I think that duplicating the properties rdf:subject, rdf:predicate, rdf:object could also create a lot of confusion, so I would refrain from doing that unless repurposing them really breaks something badly. But since rdf:Statement is so loosely defined, I don't think that would be the case. I am happy to consider that, in retrospect, rdf:Statement can include both the "platonic triples" denoted by triple terms and the "occurrences" denoted by reifiers.

The WG seems to tend towards allowing the usage of triple terms in a more general way than just as a source of reifiers. In that context it is probably prudent to differentiate the two concepts, and not mix and mingle them. Also, I don't see how rdf:Statement is loosely defined. The naming may be a bit vague, but the definition IMO is not.

@rat10

rat10 commented Dec 10, 2024

Regarding the vocabulary,
I think that duplicating the properties rdf:subject, rdf:predicate, rdf:object could also create a lot of confusion, so I would refrain from doing that unless repurposing them really breaks something badly. But since rdf:Statement is so loosely defined, I don't think that would be the case. I am happy to consider that, in retrospect, rdf:Statement can include both the "platonic triples" denoted by triple terms and the "occurrences" denoted by reifiers.

The WG seems to tend towards allowing the usage of triple terms in a more general way than just as a source of reifiers. In that context it is probably prudent to differentiate the two concepts, and not mix and mingle them. Also, I don't see how rdf:Statement is loosely defined. The naming may be a bit vague, but the definition IMO is not.

On second (or rather n-th) thought I'd like to add something.

Following recent discussions in the Semantics TF (as captured here and discussed there) we might settle for calling rdf:Statement the act of stating a triple (without of course saying if that stating actually happened, and where), and calling rdf:Proposition the abstract triple as described by an RDF-star triple term. Both are composed of subject, predicate and object, so I again tend to agree with you.

However, doesn't this jeopardize backwards compatibility? So far it's possible to infer that an entity is of type rdf:Statement if it's the subject of an rdf:subject|predicate|object statement. From anecdotal evidence I gather that it's common practice to omit the type declaration and reduce triple count by one when using the RDF standard reification vocabulary. That practice becomes unsound as soon as unstar-ed triple terms enter the mix.
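
The inference mentioned here is the RDFS rdfs:domain rule applied to the reification properties, whose domain is rdf:Statement in RDF Schema. A minimal Python sketch follows, assuming triples are plain tuples of strings and that only these three domain axioms are taken into account; the name `infer_domain_types` is hypothetical and used for illustration only.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# RDFS axioms: the reification properties have rdf:Statement as their domain.
DOMAINS = {
    RDF + "subject": RDF + "Statement",
    RDF + "predicate": RDF + "Statement",
    RDF + "object": RDF + "Statement",
}

def infer_domain_types(graph):
    """Return the rdf:type triples entailed by the rdfs:domain axioms above."""
    return {
        (s, RDF + "type", DOMAINS[p])
        for (s, p, o) in graph
        if p in DOMAINS
    }

# A reification written without its explicit rdf:type triple.
graph = {
    ("_:r", RDF + "subject", "http://example.org/s"),
    ("_:r", RDF + "predicate", "http://example.org/p"),
    ("_:r", RDF + "object", "http://example.org/o"),
}
inferred = infer_domain_types(graph)
```

Any blank node that a mapping describes with rdf:subject, rdf:predicate and rdf:object would be typed rdf:Statement by this rule, whatever other type it carries.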

To work around this problem, we could stress that any application of the unstar mapping should refrain from applying the same optimization. Also, an unassuming RDF 1.1 environment would not be led into a completely wrong direction if it assumed that the immediate subject of an unstar operation - _:gen1 in your proposed Example 3 - represents an RDF standard reification. It doesn't since it would miss multi-part reifications, but maybe it's close enough.

On the other hand, backwards compatibility is what the unstar mapping is all about, so why jeopardize it?

@pchampin
Contributor Author

To work around this problem, we could stress that any application of the unstar mapping should refrain from applying the same optimization.

Applying the unstar mapping means following the algorithm. If you don't follow it exactly, in particular if you don't add the triple (b, rdf:type, rdf:UnstarredTripleTerm), then it is not the unstar mapping anymore, and the properties listed at the top of Section 8 ("information preserving", "idempotent", "universal") cannot be guaranteed.

However, doesn't this jeopardize backwards compatibility?

I am not suggesting that we change the domain of rdf:subject and co. That would indeed jeopardize backwards compatibility.

So yes, it would mean that any bnode generated by the unstar mapping could be inferred to be of type rdf:Statement, but I don't think this is a problem. The important thing is that it is also of type rdf:UnstarredTripleTerm, which is what should matter in that situation.

Regarding the notion of rdf:Proposition discussed by the Semantics TF: I don't think that the rdf:UnstarredTripleTerm class (used to encode triple terms) needs to be related to rdf:Proposition (used to define the semantics of triple terms). But even if I did: according to WordReference, several definitions of the term (and all of those related to logic or mathematics) define it as "a statement ...". This makes "proposition" a subclass of "statement", which is fine by me, and allows us to use rdf:subject and friends to describe instances of rdf:Proposition.

@rat10

rat10 commented Dec 10, 2024

To work around this problem, we could stress that any application of the unstar mapping should refrain from applying the same optimization.

Applying the unstar mapping means following the algorithm. If you don't follow it exactly, in particular if you don't add the triple (b, rdf:type, rdf:UnstarredTripleTerm), then it is not the unstar mapping anymore, and the properties listed at the top of Section 8 ("information preserving", "idempotent", "universal") cannot be guaranteed.

Fair enough.

However, doesn't this jeopardize backwards compatibility?

I am not suggesting that we change the domain of rdf:subject and co. That would indeed jeopardize backwards compatibility.

So yes, it would mean that any bnode generated by the unstar mapping could be inferred to be of type rdf:Statement, but I don't think this is a problem. The important thing is that it is also of type rdf:UnstarredTripleTerm, which is what should matter in that situation.

I discussed the "unassuming RDF 1.1 environment", and that is what matters to backwards compatibility. In such an environment, rdf:subject etc. would be assumed to refer to an RDF standard reification, which is exactly what we do not want. Such an environment would probably not even be aware of the existence of the type rdf:UnstarredTripleTerm, let alone check for it.

Regarding the notion of rdf:Proposition discussed by the Semantics TF: I don't think that the rdf:UnstarredTripleTerm class (used to encode triple terms) needs to be related to rdf:Proposition (used to define the semantics of triple terms). But even if I did: according to WordReference, several definitions of the term (and all of those related to logic or mathematics) define it as "a statement ...". This makes "proposition" a subclass of "statement", which is fine by me, and allows us to use rdf:subject and friends to describe instances of rdf:Proposition.

I disagree. No matter what the term "statement" means, the term rdf:Statement has a very specific meaning, and an RDF-star triple term is not a specialization of it, but arguably rather the other way round.

@pchampin
Contributor Author

No matter what the term "statement" means, the term rdf:Statement has a very specific meaning, and an RDF-star triple term is not a specialization of it,

I stand corrected; re-reading the related sections in RDF Semantics, it says "The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object" (emphasis is mine). I will change my PR accordingly.

also, the algo 'quote-triple-term' was renamed,
because it was not actually "quoting" the triple.
following @afs, the two transformations ('classicize' and revert)
are now primarily described in prose, the algorithm being secondary.

We now also describe the reverse transformation.
@rat10

rat10 commented Dec 12, 2024

W.r.t. vocabulary I’m still not convinced: defining a new classicize namespace creates its own issues. I find this more confusing than defining specialized properties like rdf:tripleTermSubject, etc. Also, doesn't this introduce not only 3 new predicates, but also a classicize:TripleTerm in addition to an rdf:TripleTerm?

Contributor

@pfps pfps left a comment

This looks good now. A bit of wordsmithing might be useful - "it consists .... of ..."

@afs
Contributor

afs commented Dec 20, 2024

This looks good now.

@pfps - please could you re-review to remove your "requested changes" that blocks merging?

@rat10

rat10 commented Dec 20, 2024

For the record: I don't see the issues I raised adequately addressed. I see no need to push this PR w.r.t. avoiding conflicts with other PRs as it is really very well isolated. Questions w.r.t. naming (the properties, basic vs classic) IMHO simply arose because the PR strays from agreed upon terminology and should be rolled back until properly discussed (e.g. by raising an issue about basic vs classic). My comment w.r.t. the most important issue, how triple terms relate to RDF standard reification, hasn't been properly discussed. So: no, I'm still not okay with merging this PR.

spec/index.html (outdated, resolved)
Member

@gkellogg gkellogg left a comment

I think the general sense of this is good, but we should be consistent on graph immutability.

Encoding an [=RDF graph=] to ensure that it is consumable by an RDF [=Classic=] implementation is called <dfn data-lt="classicize|classicized">classicizing</dfn> it.
It consists, while the graph has a [=triple term=] <var>tt</var> in its [=constituent terms=], to mint a fresh [=blank node=] <var>b</var>
(i.e. a blank node not in use in the graph), and replace <var>tt</var> with <var>b</var> in all the triples of the graph having <var>tt</var> in their [=constituents=].
Then the following triples are added to the graph (where <var>s</var>, <var>p</var> and <var>o</var> are respectively the [=subject=], [=predicate=] and [=object=] of <var>tt</var>):
Member

This section is written considering that graphs are mutable (add or remove something to a graph), while we define a graph as being static:

The RDF data model is atemporal: RDF graphs are static snapshots of information.

Would it not be cleaner to describe the creation of a new graph after applying such transformations to make these operations more functional?

Contributor Author

This section is written considering that graphs are mutable (add or remove something to a graph), while we define a graph as being static

Yes, I agree that could be confusing, but there are precedents of speaking of "replacing" nodes in a graph (sec 3.7 and 3.8). And I think that this is a concise and intuitive way to convey what the transformation is.

Note that the algorithm, on the other hand, assumes immutable graphs and produces a new graph from scratch.
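
To illustrate the "new graph" reading of a replacement step, here is a minimal Python sketch. It assumes a graph is modelled as a frozenset of (subject, predicate, object) tuples; `replace_term` is a hypothetical helper for illustration and does not appear in the spec.

```python
def replace_term(graph, old, new):
    """Return a new graph in which every occurrence of `old` as a subject,
    predicate, or object is replaced by `new`; the input graph is untouched."""
    return frozenset(
        tuple(new if term == old else term for term in triple)
        for triple in graph
    )

g1 = frozenset({(":a", ":p", ":b"), (":b", ":q", ":c")})
g2 = replace_term(g1, ":b", "_:x")
```

Prose that speaks of "replacing" a node can then be read as shorthand for building `g2` from `g1`, which stays consistent with graphs being static snapshots.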

spec/index.html (outdated, resolved)
@pchampin
Contributor Author

Questions w.r.t. naming (the properties, basic vs classic) IMHO simply arose because the PR strays from agreed upon terminology and should be rolled back until properly discussed (e.g. by raising an issue about basic vs classic).

As I explained above, the current version of the spec, on which this PR is based, defines "Full conformance" and "Classic conformance", so the PR sticks to this and refers to "Classic conformance". Using "Basic" in section 8 while section 2 says "Classic" would have been inconsistent and confusing.

Note that I personally don't have a strong preference between "basic" and "classic". It is not that I refuse to make the change, but I don't think that this PR should "casually" change a normatively defined term, when its purpose is somewhere else.

@pchampin
Contributor Author

My comment w.r.t. the most important issue, how triple terms relate to RDF standard reification, hasn't been properly discussed. So: no, I'm still not okay with merging this PR.

I just added a note about why we introduce another vocabulary rather than reuse the old reification vocabulary.

For more general discussion about the relation between triple terms and old-style reification, as I responded earlier: I agree that it is needed, but this should be in rdf-semantics and/or rdf-primer. And anyway, this is orthogonal to this PR: the relationship between triple terms and old-style reification is about triple terms in general, not just triple terms being "classicized".

@@ -888,6 +889,12 @@ <h3>Triple Terms</h3>
Every <a>triple</a> whose <a>object</a> is not a <a>triple term</a> SHOULD NOT
use <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies</code> (<code>rdf:reifies</code>)
as its <a>predicate</a>.</p>

<p>The <dfn data-lt="constituent">constituent terms</dfn> (or simply constituents)
of a [=triple term=] (resp. an [=RDF triple=]) are its [=subject=], its [=predicate=], its [=object=],
Member

resp.? This expands in my head to respectively (and minimally requires a following comma, in every instance), but I don't know what the respective reference is.

Abbreviations like this should generally be avoided, especially when, as here, their meaning might be at all confusing.

<p>It defines the <a href="#section-unstar-algo">`unstar`</a> algorithm, which transforms an RDF [=Full=] dataset into an RDF [=Classic=] dataset by encoding all triple-terms into dedicated named graphs.
This algorithm is designed to be:
</p>
<p class=issue>AT RISK: the working group may decide to replace the terms `rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate` and `rdf:ttObject` used in this section by other terms, possibly in a different namespace.</p>
Member

Suggested change
<p class=issue>AT RISK: the working group may decide to replace the terms `rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate` and `rdf:ttObject` used in this section by other terms, possibly in a different namespace.</p>
<p class=issue>AT RISK: The Working Group may decide to replace the terms `rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate`, and `rdf:ttObject` used in this section with other terms, possibly in a different namespace.</p>

Note that, on the other hand, the algorithm is not designed to be semantics preserving:
the graphs in the produced dataset are not semantically <a>equivalent</a> to their corresponding graph in the input dataset.
<dd>It must be possible to reconstruct the input graph (resp. dataset) from the output graph (resp. dataset).
Note that, on the other hand, these transformations are not designed to be semantics preserving:
Member

Suggested change
Note that, on the other hand, these transformations are not designed to be semantics preserving:
Note, however, that these transformations are not designed to preserve semantics:

</dd>
<dt>Idempotent</dt>
<dd>Transforming a dataset that is already complying with RDF [=Classic=] (i.e. containing no <a>triple term</a>) must result in the same dataset.
<dd>Applying a transformation several times to a graph (resp. dataset) should have the same effect as applying it once.
Moreover, [=classicizing=] a graph (resp. dataset) that is already complying with RDF [=Classic=] (i.e. containing no [=triple term=]) must result in the same graph (resp. dataset).
Member

I think we will be well served by using RDF Classic as the defined term, rather than just Classic. I'm not making that change here; only suggesting it to you.

Suggested change
Moreover, [=classicizing=] a graph (resp. dataset) that is already complying with RDF [=Classic=] (i.e. containing no [=triple term=]) must result in the same graph (resp. dataset).
Moreover, [=classicizing=] a graph (resp. dataset) that is already complying with RDF [=Classic=] (i.e., containing no [=triple term=]) must result in the same graph (resp. dataset).

Comment on lines +1499 to +1500
<dd>It should be possible to transform any [=Full=] graph (resp. dataset) to a [=Classic=] graph (resp. dataset) using this method.
There is actually <a href="#section-classicize-caveat">a minor caveat</a> to this property.
Member

Suggested change
<dd>It should be possible to transform any [=Full=] graph (resp. dataset) to a [=Classic=] graph (resp. dataset) using this method.
There is actually <a href="#section-classicize-caveat">a minor caveat</a> to this property.
<dd>It should be possible (with <a href="#section-classicize-caveat">a minor caveat</a>) to transform any [=Full=] graph (resp. dataset) to a [=Classic=] graph (resp. dataset) using this method.

Comment on lines +1509 to +1511
It consists, while the graph has a [=triple term=] <var>tt</var> in its [=constituent terms=], in minting a fresh [=blank node=] <var>b</var>
(i.e. a blank node not in use in the graph), and replace <var>tt</var> with <var>b</var> in all the triples of the graph having <var>tt</var> in their [=constituents=].
Then the following triples are added to the graph (where <var>s</var>, <var>p</var> and <var>o</var> are respectively the [=subject=], [=predicate=] and [=object=] of <var>tt</var>):
Member

@TallTed TallTed Dec 20, 2024

Suggested change
It consists, while the graph has a [=triple term=] <var>tt</var> in its [=constituent terms=], in minting a fresh [=blank node=] <var>b</var>
(i.e. a blank node not in use in the graph), and replace <var>tt</var> with <var>b</var> in all the triples of the graph having <var>tt</var> in their [=constituents=].
Then the following triples are added to the graph (where <var>s</var>, <var>p</var> and <var>o</var> are respectively the [=subject=], [=predicate=] and [=object=] of <var>tt</var>):
[=Classicizing=] consists of repeating the following steps until no [=constituent=] of the graph is a [=triple term=], and the graph is therefore compliant with RDF [=Classic=]: while the graph has a [=triple term=] <var>tt</var> in its [=constituent terms=], of minting a fresh [=blank node=] <var>b</var>
(i.e., a blank node not yet in use in the graph); replacing each <var>tt</var> with <var>b</var> in all the triples of the graph having <var>tt</var> in their [=constituents=];
and then adding the following triples to the graph (where <var>s</var>, <var>p</var>, and <var>o</var> are respectively the [=subject=], [=predicate=] and [=object=] of <var>tt</var>):

<li>(<var>b</var>, `rdf:ttPredicate`, <var>p</var>)
<li>(<var>b</var>, `rdf:ttObject`, <var>o</var>)
</ul>
<p>This process is repeated until the graph has no [=triple term=] [=constituent=], and is therefore compliant with RDF [=Classic=].</p>
Member

I've moved this sentence into the introductory sentence, above.

Suggested change
<p>This process is repeated until the graph has no [=triple term=] [=constituent=], and is therefore compliant with RDF [=Classic=].</p>
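
For readers following the thread, the repeated steps described in this hunk can be sketched in Python. This is an illustration under stated assumptions, not the PR's normative algorithm: terms are plain strings, blank nodes are strings starting with `_:`, the `TripleTerm` class and the `_:ttN` labels are hypothetical, and the first two added triples (`rdf:type rdf:TripleTerm` and `rdf:ttSubject`) are inferred from the AT RISK note and the caveat paragraph rather than from the visible list items. The error check for the caveat is left out.

```python
from dataclasses import dataclass
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

@dataclass(frozen=True)
class TripleTerm:
    """A triple term; `o` may itself be a TripleTerm (nesting)."""
    s: object
    p: object
    o: object

def all_terms(graph):
    """Every term in the graph, including those nested inside triple terms."""
    seen, stack = set(), [t for triple in graph for t in triple]
    while stack:
        t = stack.pop()
        seen.add(t)
        if isinstance(t, TripleTerm):
            stack.extend((t.s, t.p, t.o))
    return seen

def classicize(graph):
    """Return a new graph in which each triple term is replaced by a blank node."""
    graph = set(graph)
    fresh = count(1)
    while True:
        tt = next((t for triple in graph for t in triple
                   if isinstance(t, TripleTerm)), None)
        if tt is None:
            return frozenset(graph)  # no triple term left
        # Mint a blank node label not in use anywhere in the graph.
        b = f"_:tt{next(fresh)}"
        while b in all_terms(graph):
            b = f"_:tt{next(fresh)}"
        # Replace tt with b in every triple, then describe tt with four triples.
        graph = {tuple(b if t == tt else t for t in triple) for triple in graph}
        graph |= {
            (b, RDF + "type", RDF + "TripleTerm"),
            (b, RDF + "ttSubject", tt.s),
            (b, RDF + "ttPredicate", tt.p),
            (b, RDF + "ttObject", tt.o),
        }

inner = TripleTerm(":a", ":b", ":c")
outer = TripleTerm(":s", ":p", inner)
result = classicize({(":r", RDF + "reifies", outer)})
```

A nested triple term is handled by the loop itself: it resurfaces as the object of an `rdf:ttObject` triple and is replaced on a later pass. A graph without triple terms is returned unchanged.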

<p>Note that this transformation is <em>information preserving</em> only when the input graph does not contain at the same time a [=triple term=]
and an [=asserted=] triple (<var>b</var>, `rdf:type`, `rdf:TripleTerm`) where <var>b</var> is a [=blank node=].
Implementations encountering this situation MUST report an error.
See Section <a href="#section-classicize-caveat"></a> for a discussion on this limitation.
Member

Suggested change
See Section <a href="#section-classicize-caveat"></a> for a discussion on this limitation.
This limitation is discussed in Section <a href="#section-classicize-caveat"></a>.
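
The error condition stated in this hunk can be expressed as a pre-check. The Python sketch below is an assumption-laden illustration: blank nodes are modelled as strings starting with `_:`, the `TripleTerm` class and the function name `check_classicizable` are hypothetical, and only triple terms at the top level of a triple are inspected, which suffices because a nested triple term always sits inside a top-level one.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

@dataclass(frozen=True)
class TripleTerm:
    s: object
    p: object
    o: object

def check_classicizable(graph):
    """Raise ValueError if the graph contains both a triple term and an
    asserted (blank node, rdf:type, rdf:TripleTerm) triple."""
    has_triple_term = any(isinstance(t, TripleTerm) for tr in graph for t in tr)
    has_marker = any(
        isinstance(s, str) and s.startswith("_:")
        and p == RDF + "type" and o == RDF + "TripleTerm"
        for (s, p, o) in graph
    )
    if has_triple_term and has_marker:
        raise ValueError(
            "graph contains a triple term and a blank node typed rdf:TripleTerm"
        )

marker_only = {("_:b", RDF + "type", RDF + "TripleTerm")}
check_classicizable(marker_only)  # passes: no triple term present
```

A graph that holds only the marker triple passes the check, which matches the wording "at the same time" in the paragraph under review.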


<p class=note>The blank nodes generated to replace [=triple terms=] should not be confused with the [=reifiers=] that are typically associated to these [=triple terms=].</p>
Member

It would be good to link this to the discussion of those "typical associations".

Suggested change
<p class=note>The blank nodes generated to replace [=triple terms=] should not be confused with the [=reifiers=] that are typically associated to these [=triple terms=].</p>
<p class=note>The blank nodes generated to replace [=triple terms=] should not be confused with the [=reifiers=] that are typically associated with these [=triple terms=].</p>

<section id="section-unstar-algo" class="algorithm">
<h2>The `unstar` algorithm</h2>
<p>
[=Classicizing=] an [=RDF dataset=] consists in [=classicizing=] its [=default graph=] and each of its [=named graph=].

Suggested change
[=Classicizing=] an [=RDF dataset=] consists in [=classicizing=] its [=default graph=] and each of its [=named graph=].
[=Classicizing=] an [=RDF dataset=] consists of [=classicizing=] its [=default graph=] and each of its [=named graphs=].

<p>The algorithm expects one input variable <var>Dᵢ</var> which is an <a>RDF dataset</a>. It returns a [=Classic=] <a>RDF dataset</a>.
In the algorithm, we adopt the view presented in <a href="#section-dataset-quad"></a>.
<p>
See Section <a href="#section-classicize-algo"></a> for a detailed algorithm of the transformation.

Suggested change
See Section <a href="#section-classicize-algo"></a> for a detailed algorithm of the transformation.
A detailed algorithm of the transformation is found in Section <a href="#section-classicize-algo"></a>.

<section id="section-classicize-example">
<h2>Example</h2>

<p>The examples in this section are using the Turtle concrete syntax [[RDF12-TURTLE]].</p>

Suggested change
<p>The examples in this section are using the Turtle concrete syntax [[RDF12-TURTLE]].</p>
<p>The examples in this section are expressed in the Turtle concrete syntax [[RDF12-TURTLE]].</p>

Comment on lines +1571 to +1577
<p>Reverting a [=classicize=] graph to its original form consists,
for each [=blank node=] <var>b</var> that is the subject of an [=asserted=] triple (<var>b</var>, `rdf:type`, `rdf:TripleTerm`),
in locating the three other [=asserted=] triples (<var>b</var>, `rdf:ttSubject`, <var>s</var>),
(<var>b</var>, `rdf:ttPredicate`, <var>p</var>),
and (<var>b</var>, `rdf:ttObject`, <var>o</var>).
These four triples are removed from the graph.
All remaining occurrences of <var>b</var> as a [=constituent term=] of the graph are then replaced with the triple term (<var>s</var>, <var>p</var>, <var>o</var>).

Suggested change
<p>Reverting a [=classicize=] graph to its original form consists,
for each [=blank node=] <var>b</var> that is the subject of an [=asserted=] triple (<var>b</var>, `rdf:type`, `rdf:TripleTerm`),
in locating the three other [=asserted=] triples (<var>b</var>, `rdf:ttSubject`, <var>s</var>),
(<var>b</var>, `rdf:ttPredicate`, <var>p</var>),
and (<var>b</var>, `rdf:ttObject`, <var>o</var>).
These four triples are removed from the graph.
All remaining occurrences of <var>b</var> as a [=constituent term=] of the graph are then replaced with the triple term (<var>s</var>, <var>p</var>, <var>o</var>).
<p>Reverting a [=classicized=] graph to its original form consists of locating
each [=asserted=] triple (<var>b</var>, `rdf:type`, `rdf:TripleTerm`)
that has a [=blank node=] <var>b</var> as its subject,
along with the three associated [=asserted=] triples
that have the same [=blank node=] <var>b</var> as their subjects, i.e.,
(<var>b</var>, `rdf:ttSubject`, <var>s</var>),
(<var>b</var>, `rdf:ttPredicate`, <var>p</var>),
and (<var>b</var>, `rdf:ttObject`, <var>o</var>);
removing these four triples from the graph;
and replacing all remaining occurrences of <var>b</var>
as a [=constituent term=] of the graph
with the triple term (<var>s</var>, <var>p</var>, <var>o</var>).
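
As a sanity check of the wording above, here is a minimal Python sketch of the reverse transformation. Everything about the encoding is an assumption made for illustration and is not part of the PR: blank nodes are strings starting with `_:`, triple terms are nested 3-tuples, and `declassicize` is a hypothetical name. It also assumes well-formed (acyclic) input.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, TRIPLE_TERM = RDF + "type", RDF + "TripleTerm"
TT_S, TT_P, TT_O = RDF + "ttSubject", RDF + "ttPredicate", RDF + "ttObject"


def is_blank(term):
    # Assumed encoding: a blank node is a string label starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def declassicize(graph):
    """Hypothetical sketch: revert a classicized graph (a set of 3-tuples)."""
    graph = set(graph)
    # Blank nodes b that are the subject of (b, rdf:type, rdf:TripleTerm).
    markers = {s for (s, p, o) in graph
               if p == TYPE and o == TRIPLE_TERM and is_blank(s)}

    # For each b, locate exactly one value for each of the three properties.
    mapping = {}
    for b in markers:
        parts = []
        for prop in (TT_S, TT_P, TT_O):
            values = [o for (s, p, o) in graph if s == b and p == prop]
            if len(values) != 1:
                raise ValueError(f"{b}: {prop} is missing or duplicated")
            parts.append(values[0])
        mapping[b] = tuple(parts)

    # Remove the four describing triples of every b.
    props = {TT_S, TT_P, TT_O}
    remaining = set()
    for (s, p, o) in graph:
        is_marker = s in mapping and p == TYPE and o == TRIPLE_TERM
        is_part = s in mapping and p in props
        if not (is_marker or is_part):
            remaining.add((s, p, o))

    def resolve(term):
        # Replace b with its triple term, recursing for nested triple terms.
        if isinstance(term, tuple):
            return tuple(resolve(x) for x in term)
        if term in mapping:
            return resolve(mapping[term])
        return term

    return {tuple(resolve(x) for x in triple) for triple in remaining}
```

With this encoding, a graph holding `(":bob", ":claims", "_:b0")` plus the four triples describing `_:b0` as `(":alice", ":age", "42")` reverts to the single triple `(":bob", ":claims", (":alice", ":age", "42"))`.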

Comment on lines +1580 to +1584
<p>Implementations MUST report an error if, for a given <var>b</var>,
it can not unambiguously determine <var>s</var>, <var>p</var> or <var>o</var>
(i.e. if one of the `classicize:` properties of <var>b</var> is missing or duplicated).
Implementations MUST also report an error if the input graph contains at the same time a [=triple term=] and an [=asserted triple=] (<var>b</var>, `rdf:type`, `rdf:TripleTerm`) where <var>b</var> is a [=blank node=].
None of these situations can occur if the input graph was produced by the [=classicize=] transformation.

Suggested change
<p>Implementations MUST report an error if, for a given <var>b</var>,
it can not unambiguously determine <var>s</var>, <var>p</var> or <var>o</var>
(i.e. if one of the `classicize:` properties of <var>b</var> is missing or duplicated).
Implementations MUST also report an error if the input graph contains at the same time a [=triple term=] and an [=asserted triple=] (<var>b</var>, `rdf:type`, `rdf:TripleTerm`) where <var>b</var> is a [=blank node=].
None of these situations can occur if the input graph was produced by the [=classicize=] transformation.
<p>An implementation MUST report an error if, for a given <var>b</var>,
it cannot unambiguously determine <var>s</var>, <var>p</var>, or <var>o</var>
(i.e., if one of the `classicize:` properties
— `rdf:ttSubject`, `rdf:ttPredicate`, or `rdf:ttObject` —
of <var>b</var> is missing or duplicated).
An implementation MUST also report an error if the input graph contains
at the same time a [=triple term=] and an [=asserted triple=]
(<var>b</var>, `rdf:type`, `rdf:TripleTerm`)
where <var>b</var> is the same [=blank node=].
Note that none of these situations can occur if the input graph was produced by the [=classicize=] transformation.

Comment on lines +1599 to +1601
<p>The two transformations above are explicitly not supporting graphs or datasets containing at the same time a [=triple term=]
and an [=asserted triple=] (<var>b</var>, `rdf:type`, `rdf:TripleTerm`) where <var>b</var> is a [=blank node=].
This means, in particular, that the [=classicize=] transformation is not strictly <em>universal</em>.

Suggested change
<p>The two transformations above are explicitly not supporting graphs or datasets containing at the same time a [=triple term=]
and an [=asserted triple=] (<var>b</var>, `rdf:type`, `rdf:TripleTerm`) where <var>b</var> is a [=blank node=].
This means, in particular, that the [=classicize=] transformation is not strictly <em>universal</em>.
<p>The two transformations above explicitly do not support graphs or datasets containing at the same time a [=triple term=] and an [=asserted triple=]
(<var>b</var>, `rdf:type`, `rdf:TripleTerm`)
where <var>b</var> is the same [=blank node=].
This means that the [=classicize=] transformation is not <em>strictly</em> universal.
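
To make the unsupported case concrete, the following is a minimal Python sketch of the check an implementation might run before either transformation. The encoding is assumed for illustration only (blank nodes as strings starting with `_:`, triple terms as nested 3-tuples), and `check_not_hybrid` is a hypothetical name, not something defined in the PR.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, TRIPLE_TERM = RDF + "type", RDF + "TripleTerm"


def contains_triple_term(graph):
    # Assumed encoding: a triple term is a nested 3-tuple used as a term.
    return any(isinstance(term, tuple) for triple in graph for term in triple)


def has_blank_triple_term_instance(graph):
    # An asserted triple (b, rdf:type, rdf:TripleTerm) where b is a blank node.
    return any(
        isinstance(s, str) and s.startswith("_:")
        and p == TYPE and o == TRIPLE_TERM
        for (s, p, o) in graph
    )


def check_not_hybrid(graph):
    """Raise if the graph mixes triple terms with classicized markers."""
    if contains_triple_term(graph) and has_blank_triple_term_instance(graph):
        raise ValueError(
            "graph contains both a triple term and a blank-node "
            "instance of rdf:TripleTerm"
        )
```

A graph with only triple terms, or only blank-node instances of `rdf:TripleTerm`, passes the check; only the combination is rejected.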

Comment on lines 1606 to 1607
as it was not defined prior to this specification.
For this reason, using it would actually have been bad practice.

Suggested change
as it was not defined prior to this specification.
For this reason, using it would actually have been bad practice.
as it was not defined prior to this specification,
and, therefore, using it would actually have been bad practice.

For this reason, using it would actually have been bad practice.
As for future datasets, their authors should consider the graph name `rdf:unstarMetadata` to be reserved, in order to prevent interference with the `unstar` algorithm.
As for future graphs and datasets, their authors should consider this type to be reserved, in order to prevent interference with the [=classicize=] transformation.

Suggested change
As for future graphs and datasets, their authors should consider this type to be reserved, in order to prevent interference with the [=classicize=] transformation.
For future graphs and datasets, this type should be considered to be reserved for use within the [=classicize=] transformation, and not used otherwise.

Comment on lines +1612 to +1617
This is one of the reasons why this transformation introduces a new vocabulary
(`rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate`, `rdf::ttObject`),
rather than repurposing the existing <a data-cite="RDF12-SCHEMA#ch_reificationvocab">reification vocabulary</a>
(`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`).
Contrarily to `rdf:TripleTerm`, `rdf:Statement` is known to used in widely used datasets (e.g. <a href="https://www.uniprot.org/">Uniprot</a>),
so deprecating its usage as "reserved" was not an option.
@TallTed TallTed Dec 20, 2024


It would be good to find another example or two, to go with Uniprot.

Suggested change
This is one of the reasons why this transformation introduces a new vocabulary
(`rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate`, `rdf::ttObject`),
rather than repurposing the existing <a data-cite="RDF12-SCHEMA#ch_reificationvocab">reification vocabulary</a>
(`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`).
Contrarily to `rdf:TripleTerm`, `rdf:Statement` is known to used in widely used datasets (e.g. <a href="https://www.uniprot.org/">Uniprot</a>),
so deprecating its usage as "reserved" was not an option.
This is one reason why this transformation introduces new vocabulary terms
(`rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate`, `rdf:ttObject`),
rather than repurposing the existing <a data-cite="RDF12-SCHEMA#ch_reificationvocab">reification vocabulary</a>
(`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`).
Unlike `rdf:TripleTerm`, `rdf:Statement` is known to be found in
widely used datasets (e.g., <a href="https://www.uniprot.org/">Uniprot</a>),
so reserving its use for the [=classicize=] transformation was not an option.

Comment on lines +1620 to +1627
<p>Another consequence of this restriction is that users should be careful when merging graphs in an application that [=classicize=] graphs or datasets.
More precisely, merging a [=Full=] [=RDF graph=] (containing at least one [=triple term=])
with a [=classicized=] [=RDF graph=] (and therefore potentially containing [=blank node=] instances of `rdf:TripleTerm`)
could result in a "hybrid" graph that can not be transformed.
Such applications should make sure to [=classicize=] every graph prior to merging them.
Conversely, applications supporting RDF [=Full=] should make sure to apply the reverse tranformation to any graph that is known or likely to be [=classicized=],
to avoid creating such "hybrid" graphs.
Since these transformations are <em>idempotent</em>, there is no harm in applying them more than necessary.

It's troubling that this caution is addressed first to users and then, later, to applications, rather than to implementers, who ought to bear the burden. Further editing would be good. Also, this segment --

      Therefore, such applications should [=classicize=] every graph prior to merging them.
      Conversely, applications supporting RDF [=Full=] should make sure to apply the reverse transformation
      to any graph that is known or likely to have been [=classicized=],
      to avoid creating such "hybrid" graphs.

— would benefit from rewriting to make it a single caution that covers both [=Classic=] applications classicizing, and [=Full=] applications "de-classicing" (there must be a better word! "the reverse transformation" doesn't cut it...), every graph

Suggested change
<p>Another consequence of this restriction is that users should be careful when merging graphs in an application that [=classicize=] graphs or datasets.
More precisely, merging a [=Full=] [=RDF graph=] (containing at least one [=triple term=])
with a [=classicized=] [=RDF graph=] (and therefore potentially containing [=blank node=] instances of `rdf:TripleTerm`)
could result in a "hybrid" graph that can not be transformed.
Such applications should make sure to [=classicize=] every graph prior to merging them.
Conversely, applications supporting RDF [=Full=] should make sure to apply the reverse tranformation to any graph that is known or likely to be [=classicized=],
to avoid creating such "hybrid" graphs.
Since these transformations are <em>idempotent</em>, there is no harm in applying them more than necessary.
<p>Another consequence of this restriction is that users will need to be aware and careful when merging graphs in an application that [=classicizes=] graphs or datasets.
The concern is that merging a [=Full=] [=RDF graph=] containing at least one [=triple term=]
with a [=classicized=] [=RDF graph=] (which might contain [=blank node=] instances of `rdf:TripleTerm`)
could result in a "hybrid" graph that cannot be transformed into either a consistent [=Full=] or a consistent [=Classic=] [=RDF graph=].
Therefore, such applications should [=classicize=] every graph prior to merging them.
Conversely, applications supporting RDF [=Full=] should make sure to apply the reverse transformation
to any graph that is known or likely to have been [=classicized=],
to avoid creating such "hybrid" graphs.
Since these transformations are designed to be <em>idempotent</em>, there is no harm in applying them more than necessary.
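
The idempotence claim can be illustrated with a small Python sketch of the forward direction. This is an assumption-laden illustration, not the algorithm from the PR: it infers the forward step from the reverse description (each triple term becomes a fresh blank node described by `rdf:type rdf:TripleTerm`, `rdf:ttSubject`, `rdf:ttPredicate`, and `rdf:ttObject`), assumes one blank node per distinct triple term, encodes triple terms as nested 3-tuples, and uses the hypothetical names `classicize` and `_:ttN`. It does not check for "hybrid" input.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, TRIPLE_TERM = RDF + "type", RDF + "TripleTerm"
TT_S, TT_P, TT_O = RDF + "ttSubject", RDF + "ttPredicate", RDF + "ttObject"


def atoms(term):
    # Yield every non-tuple term, descending into nested triple terms.
    if isinstance(term, tuple):
        for part in term:
            yield from atoms(part)
    else:
        yield term


def classicize(graph):
    """Hypothetical sketch: replace every triple term with a blank node."""
    # Blank-node labels already in use, so that fresh labels never collide.
    used = {t for triple in graph for t in atoms(triple)
            if isinstance(t, str) and t.startswith("_:")}
    counter = itertools.count()
    replaced = {}  # triple term -> blank node standing in for it
    out = set()

    def fresh():
        while True:
            label = f"_:tt{next(counter)}"
            if label not in used:
                used.add(label)
                return label

    def lower(term):
        if not isinstance(term, tuple):
            return term
        if term not in replaced:
            # Lower nested triple terms first, then describe this one.
            s, p, o = (lower(x) for x in term)
            b = fresh()
            replaced[term] = b
            out.update({(b, TYPE, TRIPLE_TERM),
                        (b, TT_S, s), (b, TT_P, p), (b, TT_O, o)})
        return replaced[term]

    for triple in graph:
        out.add(tuple(lower(x) for x in triple))
    return out
```

Under these assumptions, applying `classicize` to its own output finds no triple terms left to replace and returns the same set of triples, which is the behavior the paragraph above relies on.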

@TallTed TallTed left a comment


I've reviewed spec/index.html up to the Algorithms. I'll come back for that. There are a number of requested changes above. I didn't think I'd find so many, nor that I'd have this much time to do so, or I'd have bundled them into a review... Sorry for the extra clicks these will take to apply!

Labels
spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial