Allow dataset formats to be valid in LOAD with no INTO #41

Closed
afs opened this issue Aug 28, 2024 · 16 comments · Fixed by #46
Labels
Errata Errata management: confirmed erratum spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial

Comments

@afs
Contributor

afs commented Aug 28, 2024

Formats: TriG, N-Quads, JSON-LD.

@afs
Contributor Author

afs commented Aug 28, 2024

This follows through on the RDF 1.1 Concepts definition of RDF Dataset, which allows blank nodes as graph names.

See also w3c/sparql-query#152

@afs afs added needs discussion Proposed for discussion in an upcoming meeting spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial labels Aug 28, 2024
@Tpt
Contributor

Tpt commented Aug 28, 2024

JSON-LD is often used as a graph-only format. Maybe we can allow INTO for all formats stating that INTO maps the source default graph into the graph given by INTO.

@afs
Contributor Author

afs commented Aug 28, 2024

Yes, that seems natural.

All the formats allow a graph-only subset.

The one I come across sometimes is N-Quads used as a dataset dump where the dataset has only a default graph.

@pfps
Contributor

pfps commented Nov 13, 2024

What is supposed to happen in this case?

Are dataset formats acceptable if there is an INTO?

@afs
Contributor Author

afs commented Nov 13, 2024

Are dataset formats acceptable if there is an INTO?

This issue is "with no INTO".

I now think it needs to be discussed and considered separately.

@afs
Contributor Author

afs commented Nov 13, 2024

JSON-LD is often used as a graph-only format. Maybe we can allow INTO for all formats stating that INTO maps the source default graph into the graph given by INTO.

Yes, that seems natural.

On reconsideration, that INTO case is unfortunate. What happens if that JSON-LD does contain both default graph and named graphs? This can't be determined ahead of time.

We can treat it as a separate issue (if anyone wants to raise it).

For this issue, we can make progress on the "without INTO" case.

@kasei

kasei commented Nov 13, 2024

On reconsideration, that INTO case is unfortunate. What happens if that JSON-LD does contain both default graph and named graphs? This can't be determined ahead of time.

I think INTO should only work with triple formats, and we should consider using WITH together with LOAD to adjust which graph triples load by default when loading a dataset.

(As Andy says, the latter part potentially as a separate issue.)

@afs
Contributor Author

afs commented Nov 21, 2024

This could be viewed as "errata" because LOAD talks about "RDF Document", which is defined in RDF Concepts as including dataset formats.

@afs afs added the Errata Errata management: confirmed erratum label Nov 26, 2024
@afs
Contributor Author

afs commented Nov 26, 2024

See also PR #46.

A behavior when encountering quads when loading "INTO" could be to take only the default graph from the RDF document/dataset.

I don't think we should require that or give it much weight, but there is a practical issue: a large document may be loaded only for quads to be found after a lot of work has been done. That is easy to handle if the load is fully ACID-transactional and serialized, but the ideal behaviour can be hard or costly to provide otherwise.

We could give this behavior some status by making it a MAY (with a requirement to warn that quads are being dropped) or describe it in a note.
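As a rough illustration of the "take only the default graph, and warn" behavior described above, here is a minimal Python sketch. It is not taken from PR #46 or from any implementation: the function name `load_into_lenient`, the `(s, p, o, g)` tuple shape, and the use of `None` to mark the document's default graph are all assumptions made for the example.

```python
import warnings


def load_into_lenient(quads, target_graph):
    """Keep default-graph statements, drop the rest, and warn if any were dropped.

    `quads` is an iterable of (s, p, o, g) tuples, where g is None for
    statements in the document's default graph (a convention of this sketch).
    Returns the set of triples that would go into `target_graph`.
    """
    kept = set()
    dropped = 0
    for s, p, o, g in quads:  # single pass, so the input can be streamed
        if g is None:
            kept.add((s, p, o))
        else:
            dropped += 1
    if dropped:
        # The warning corresponds to the "warn that quads are being dropped" idea.
        warnings.warn(
            f"LOAD INTO <{target_graph}>: dropped {dropped} statement(s) in named graphs"
        )
    return kept


# Hypothetical document: one default-graph triple and one quad.
doc = [
    ("s", "p", "o1", None),
    ("s", "p", "o2", "http://example/g"),
]
triples_for_target = load_into_lenient(doc, "http://example/target")
```

Because the sketch never has to undo work, it sidesteps the "quads found late in a large document" problem, at the cost of silently losing data unless the warning is noticed.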

@Tpt
Contributor

Tpt commented Nov 26, 2024

+1 to this behavior.

Another possible behavior is to load the default graph from the RDF document/dataset into the named graph specified by INTO and load the named graphs into the named graphs with the same name. This way there is no error case, but it might lead to weird behaviors such as named graphs being merged. Hence, #46's suggestion of raising an error is likely better and does not prevent a future WG from changing this behavior to something different.
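To make the remapping alternative and its merge risk concrete, here is a small Python sketch. It only illustrates the idea in this comment, under the assumptions that a dataset is a dict from graph name to a set of triples and that `None` marks the document's default graph; `load_into_remap` is a hypothetical name.

```python
from collections import defaultdict


def load_into_remap(dataset, quads, into):
    """Send default-graph statements to `into`; keep named graphs under their own names.

    `dataset` maps graph name -> set of (s, p, o) triples.
    `quads` is an iterable of (s, p, o, g) tuples, g being None for the
    document's default graph (a convention of this sketch).
    """
    for s, p, o, g in quads:
        target = into if g is None else g
        dataset[target].add((s, p, o))
    return dataset


# Hypothetical document whose named graph has the same name as the INTO target.
doc = [
    ("s", "p", "from-default-graph", None),
    ("s", "p", "from-named-graph", "http://example/g"),
]
store = load_into_remap(defaultdict(set), doc, into="http://example/g")
# Both statements now sit in <http://example/g>: the document's default graph
# and its named graph have been merged, which is one way the "weird behavior"
# mentioned above could arise.
```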

@w3cbot

w3cbot commented Dec 5, 2024

This was discussed during the #rdf-star meeting on 05 December 2024.

View the transcript

Allow dataset formats to be valid in LOAD with no INTO 4

w3c/sparql-update#46

<gb> Pull Request 46 LOAD RDF document clarification (description section and definition section) (by afs)

AndyS: At the time SPARQL Update was written, it worked only on graphs. How to load documents that describe datasets is undefined.
… The PR says that LOAD <doc> loads to the dataset, so that quads in TriG would create a SPARQL dataset.
… There are more complicated things, such as re-mapping graph names, but that is not covered in this proposal.
… For some formats, it is difficult to know whether a document encodes a graph or a dataset (e.g., JSON-LD).
… With INTO, it says you need to generate an error if there are any quads; otherwise, the data goes into the target graph.

ora: If I were loading N-Quads and specified the graph to load into, would it be an error?

AndyS: Yes, that would be an error. If there is no graph name, it would go into the target graph.

pchampin: Could we say that the "INTO" is where default triples go? And other triples go into their specified graph.
… (Verifiable Credentials use cases).
… Blank node graph names.

AndyS: There are systems that would just use the default data, and they would not be compliant.
… The PR is just addressing the case on when there is no "INTO" clause.
… Use cases with blank nodes are interesting, as are renaming use cases, but that is not the target of this PR.
… Please raise issues for other use cases.
… So, LOAD <...> loads the dataset.
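The behavior AndyS describes can be sketched in a few lines of Python. This is an illustration of the discussion only, not the normative text of PR #46: the dataset representation (a dict from graph name to a set of triples), the `DEFAULT` marker, and the names `load` and `LoadError` are assumptions of the sketch.

```python
from collections import defaultdict

DEFAULT = None  # marker for the default graph (an assumption of this sketch)


class LoadError(Exception):
    """Raised when LOAD ... INTO meets a statement that names a graph."""


def load(dataset, quads, into=None):
    """Apply a LOAD to `dataset`, a defaultdict mapping graph name -> set of triples.

    `quads` is an iterable of (s, p, o, g) tuples; g is DEFAULT for triples.
    """
    quads = list(quads)  # buffer first, so an error leaves `dataset` untouched
    if into is not None:
        # LOAD <doc> INTO GRAPH <g>: any statement in a named graph is an error.
        if any(g is not DEFAULT for (_, _, _, g) in quads):
            raise LoadError("document contains quads but INTO was given")
        for s, p, o, _ in quads:
            dataset[into].add((s, p, o))
    else:
        # LOAD <doc>: the document is loaded as a dataset, graph by graph.
        for s, p, o, g in quads:
            dataset[g].add((s, p, o))
    return dataset


# Hypothetical N-Quads-like content: one triple and one quad.
doc = [
    ("s", "p", "o1", DEFAULT),
    ("s", "p", "o2", "http://example/g"),
]
store = load(defaultdict(set), doc)  # no INTO: both graphs are populated
```

Buffering the whole document before writing keeps the error case simple here; a real store would more likely rely on transactions, which is the cost concern raised earlier in the thread.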

james: How does this align with the graph-store protocol, which is also underspecified. Can they be aligned?
… Having a matrix for the variants would be helpful information, even if incomplete.
… It's hard to understand the effect as is.

AndyS: The graph-store protocol is not quite the same thing, as you need to explicitly name the target.
… There is another protocol for loading quads into a dataset: HTTP.
… You can argue that that is loading the dataset.
… We could add an informational note about this, but right now, I suggest we focus on the LOAD use case.

james: It's odd that the graph-store protocol is not sufficient. I think of its inability to handle other graphs as an erratum.
… I consider the issues to be the same, they both have to do with quads going into a dataset.

ora: Nothing prevents us from fixing the graph-store protocol to be in line with this. Do you object to fixing this right now?

james: I'd like to know specifically where we're going to understand the expected behavior, as these are interrelated.
… I'd like to see a more transparent description of what should happen.

AndyS: Perhaps james can read the definitional part of Op Load. Please make a separate proposal, but people actually need to work on it.

ora: I suggest we fix LOAD now and consider a matrix approach in the future.

PROPOSAL: Continue with #41

<gb> Issue 41 Allow dataset formats to be valid in LOAD with no INTO (by afs) [Errata] [needs discussion] [spec:enhancement]

<ktk> +1

<niklasl> +1

<gtw> +1

<gkellogg> +1

<pchampin> +1

<ora> +1

<AndyS> +1

<james> +1

<Souri> +1

<Tpt> +1

<william_vw> +1

<tl> +1

<olaf> +1

<AZ> +1

<TallTed> +1

<doerthe> +1

<eBremer> +1

RESOLUTION: Continue with #41

<gb> Issue 41 Allow dataset formats to be valid in LOAD with no INTO (by afs) [Errata] [needs discussion] [spec:enhancement]

james: I will add a matrix to the issue to see if I have understood correctly.

<gb> Issue 130 vocabulary to refer to the individual nodes in a triple term (by rat10) [discuss-f2f]

<Zakim> tl, you wanted to ask about status label of issue #130 <w3c/rdf-star-wg#130>


@afs
Contributor Author

afs commented Dec 5, 2024

WG Resolution from the meeting of 2024-12-05:

RESOLUTION: Continue with #41.

@afs afs removed the needs discussion Proposed for discussion in an upcoming meeting label Dec 5, 2024
@lisp

lisp commented Dec 12, 2024

Here are two tables which indicate the disposition of content for each combination of:

operation graph
This concerns the graph specified by the operation. That would be a protocol graph present in a graph store protocol URL or the graph designator in a SPARQL LOAD INTO clause.
content graph
This is the graph present in the content. It applies to quad documents, such as application/n-quads or application/trix.

The first table applies the logic that the latest specification of the value supersedes any earlier one. Following the control-flow model, in which content creation precedes the operation, this leads to the following effects.

| operation graph designator | content type | effective graph |
| --- | --- | --- |
| - | n-triples, rdf+xml | PATCH: default; POST: generated; PUT: default |
| - | n-quads, trix | statement |
| default | n-triples, rdf+xml | default |
| default | n-quads, trix | default |
| graph=protocol | n-triples, rdf+xml | protocol |
| graph=protocol | n-quads, trix | protocol |
| protocol (GSP direct graph) | n-triples, rdf+xml | protocol |
| protocol (GSP direct graph) | n-quads, trix | protocol |

The pull request suggests a logic which rejects over-constrained operations.

| operation graph designator | content type | effective graph |
| --- | --- | --- |
| - | n-triples, rdf+xml | PATCH: default; POST: generated; PUT: default; LOAD: default |
| - | n-quads, trix | statement |
| default | n-triples, rdf+xml | default |
| default | n-quads, trix | error |
| graph=protocol | n-triples, rdf+xml | protocol |
| graph=protocol | n-quads, trix | error |
| protocol (GSP direct graph) | n-triples, rdf+xml | protocol |
| protocol (GSP direct graph) | n-quads, trix | error |
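One way to read the two tables is as a single lookup function with a flag that switches between them. The Python sketch below is only an interpretation of the rows above: the function name, the string values, and the `reject_overconstrained` flag are assumptions of the sketch, and the two protocol designator rows are folded into one `"protocol"` value because their outcomes are the same in both tables.

```python
TRIPLES = "triples"  # n-triples, rdf+xml
QUADS = "quads"      # n-quads, trix

# Outcome for triple content when the operation names no graph, per method.
NO_DESIGNATOR_TRIPLES = {
    "PATCH": "default",
    "POST": "generated",
    "PUT": "default",
    "LOAD": "default",  # this row appears only in the second table
}


def effective_graph(designator, content, method, reject_overconstrained):
    """Return the effective graph for one row of the tables.

    designator: None (the "-" rows), "default", or "protocol"
    content:    TRIPLES or QUADS
    method:     "PATCH", "POST", "PUT" or "LOAD"
    reject_overconstrained: False for the first table, True for the second
    """
    if designator is None:
        if content == QUADS:
            return "statement"  # each statement carries its own graph
        if method == "LOAD" and not reject_overconstrained:
            raise ValueError("the first table has no LOAD row")
        return NO_DESIGNATOR_TRIPLES[method]
    if content == QUADS and reject_overconstrained:
        return "error"  # a graph was designated and the content names graphs too
    return designator  # "default" or "protocol"


# The two tables differ only for quad content with a designated graph.
first = effective_graph("default", QUADS, "PUT", reject_overconstrained=False)
second = effective_graph("default", QUADS, "PUT", reject_overconstrained=True)
```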

@afs
Contributor Author

afs commented Dec 12, 2024

The comment above relates to w3c/sparql-graph-store-protocol#24.

@afs afs closed this as completed in #46 Dec 12, 2024
@lisp

lisp commented Dec 12, 2024

yes, it does relate to that, but it is not intended to be restricted to that.
note the definition of "operation graph".
the goal is to suggest that the two recommendations should share a common logic.

@afs
Contributor Author

afs commented Dec 12, 2024

There is more material in GSP (Graph Store Protocol) related to the topic:

e.g.
https://www.w3.org/TR/sparql12-graph-store-protocol/#graph-management
https://www.w3.org/TR/sparql12-graph-store-protocol/#indirect-graph-identification

GSP uses SPARQL Update (and Query) to define its operation in some places.
