Which parties carry what costs of text/turtle changes, and do those outweigh which benefits for whom? #141
Comments
Thank you for the analysis. We do have https://www.w3.org/TR/rdf12-turtle/#changes-12 and the matter will be in "RDF 1.2 New". The WG is discussing levels of conformance.
Related: https://www.w3.org/TR/rdf12-turtle/ -- currently, the 1.2 working draft. This will be the REC when published. Title "RDF 1.2 Turtle".
Interesting; may I suggest a standards-based mechanism for agents to indicate this level? (A media type or profile comes to mind.) Or would “classic conformance” de facto be the same as effectively only parsing the RDF 1.1 subset (in which case it would be equivalent to one of the points above)? However, this does not seem to be the case, with for instance base directions being added to literals (in which case “classic” might be a confusing/misleading term).
[This is not a WG response] Any approach to versioning can have costs on both the reader side and the writer side. For example, anything in the HTTP header that makes the data consumer's task easier puts a requirement on the data producer. In the same way that RDF 1.2 syntax can be a long way into the delivered stream for a reader, having an HTTP header with the information makes the writer's life harder, because it may need to see all the data first - no stream writing without recording which version is in the data, which would also be a producer-side burden.

One way to publish data is using a web server's support for mapping file extensions to content types. Today, a toolkit may need to "know" to look in the URL to get the content type if no realistic content-type is available. Given the file extension situation, I think any solution will not help RDF that much. Software will want to handle the static/non-profile/file-extension/... cases anyway. Only a domain-specific (i.e. consumer and producer) deployment can be sure the global rules are in play.

There is a trade-off: whether the long-term continued provision of a migration solution is a greater burden than the evolution itself. Such migration should never be withdrawn -- "The web is not versioned".
Thanks, @afs. I want to leave space for others so will be brief, but quickly:
Maybe an optional version or feature declaration, to support fail-fast detection? With the implicit default being "latest REC". It should perhaps be clearly stated that implementations are required to follow the evolution of the format; with the reciprocal requirement of evolving the format responsibly, aspiring to standardize only once "sufficient" implementation coverage has been established.

AFAIK, there is a requirement of multiple independent implementations; perhaps that number should be a function of the "cardinality of known deployments" and "how viable it is to upgrade them"? (I know it is a practical impossibility to quantify that at web scale, but it goes to show awareness of the complexity underlying these judgement calls. And that we (W3C members) have a responsibility to care and cater for cooperative evolution to ensure web interop.)

I think this follows the conventions @afs referenced, which is a trade-off I'm cautiously in agreement with. Defining a new format (mime-type + suffix) is the only other viable option AFAICS; and while that caters for more overlap in deployments, it also induces a certain inertia and growing technical debt. (When is the previous format "sunset"? How is the data quality impacted during the overlap period? How do applications take the difference in expressivity into account?) I see no practical way around some form of social contract, as even content negotiation is not merely technical […].
The key difference being that—for example—HTTP, HTML, and CSS have explicit behaviors for dealing with unsupported constructs. HTTP proxies have rules on how to deal with unknown headers, HTTP has version negotiation, HTML has rules for unknown tags and attributes, CSS has rules for unsupported properties and even syntax. So the Web's ability to be non-versioned is baked into the design of those technologies. Conversely, RDF adopting the non-versioned philosophy does not equate to doing nothing on the feature-support/versioning front, but rather means being very explicit about how non-versioning is to be made possible. In summary, not doing anything puts us on neither a versioned nor a non-versioned trajectory. They are not binary opposites; the third option, "incompatible with versioning and non-versioning", is the unfortunate default choice.
to take a concrete example as precedent, i do not recall that the continued use of the same media type designators was problematic in the transition from sparql 1.0 to sparql 1.1. in what sense, other than the concern about "late failure" for large documents, should that matter for document media types?
Apples and oranges.
So the upgrade path of SPARQL is much more similar to that of SQL, with similar challenges and non-issues. And quite a pain in practice: one typically needs to know out-of-band which SPARQL endpoint software an interface is running, which determines how well certain SPARQL 1.0 or 1.1 features are supported. In contrast, at least today, […]
RDF is about enabling interoperability. Yes, on the semantic level, but not having interoperability on the syntactical level precludes that. In the pre-1.1 days, “Turtle” had been around as a format for over a decade, and parsers were incompatible with each other. It was quite the nightmare, trying to exchange data or write parsers. There was no established (let alone standard) way of knowing what subset was supported by everyone. The Turtle standard solved this by bringing certainty about what is and isn't valid `text/turtle`. The proposed re-definition of `text/turtle` takes that certainty away.
One might not even know. One could've parsed a 1.2 document wrongly without ever knowing. One could've rejected or accepted a document based on the wrong assumption (because assumptions are all you have, in band). One doesn't know if downstream systems are compatible with 1.1 or 1.2, because they can't tell. It's an absolute interoperability nightmare that systems don't even have the words to express what they do and do not support. In a context where we're advocating for semantic interoperability, failing at syntactic interoperability is a serious flaw from a technical and strategic perspective. It adds a serious degree of brittleness, the details of which only a small group of people understand, which carries a major risk of reflecting badly on RDF as a whole for not being a sustainable—let alone interoperable—technology. People will say that RDF doesn't work reliably across systems, and they will be right.
that may be, unless one is concerned with sparql processors.
we agree - vehemently.
2 cents from someone who did implement a non-standard RDF format that has an analogue of Ruben's proposed version/feature declaration.

I ended up making the serializer always claim that all features are used by default. Then, it's up to the user to tell the serializer that "this and that" feature won't be needed. This creates an obvious compatibility problem, because parsers will simply refuse to read these files, even though in practice the feature may not be used. I have not found a better solution to this problem. I think this is a sensible compromise for my ugly format, but I would be against this in W3C formats. More details here.

Overall, I think a sensible solution would be to embrace the mess and just live with the fact that RDF formats can evolve. I would also like to ask the WG to kindly consider producing some "best practices" for how to mark that an RDF file is 1.2, in a use-case-specific manner. I like the suggestion from @lisp for adding some info in graph store protocol descriptions. I'm also curious if something like a non-mandatory HTTP header would be an option. Or maybe a comment at the start of the file (like a shebang in .sh files) – of course, entirely optional. (disclaimer: I did not think these ideas through, they may be VERY bad)
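(A generic TypeScript sketch of the "claim everything by default, opt out explicitly" pattern described above; the option names are hypothetical and do not correspond to any real serializer's API.)

```typescript
// Hypothetical feature-flag header for a versioned RDF serialization.
interface FeatureFlags {
  quotedTriples: boolean;  // true = the header claims the feature may appear
  baseDirection: boolean;
}

// The serializer claims every feature by default...
const DEFAULT_FLAGS: FeatureFlags = {
  quotedTriples: true,
  baseDirection: true,
};

// ...and the user must explicitly declare which features will NOT be needed.
function buildHeader(userFlags: Partial<FeatureFlags> = {}): FeatureFlags {
  return { ...DEFAULT_FLAGS, ...userFlags };
}

// A strict parser refuses a file whose header claims an unsupported feature,
// even if that feature never actually occurs in the data.
function canParse(header: FeatureFlags, supported: FeatureFlags): boolean {
  return (!header.quotedTriples || supported.quotedTriples)
      && (!header.baseDirection || supported.baseDirection);
}
```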
Intuitively to me it sounds like TTL documents that use any of the new features need a new media type and file ending.
legacy software will not see them.
Isn't the situation with Turtle 1.1 and Turtle 1.2 a bit like Turtle and TriG? In both cases the former syntax is a subset of the latter.
Consuming data which is suddenly […]
Hi. My thoughts on this from a practicality perspective: I echo Ruben's argument that we should be aiming to support interoperability and backwards compatibility - especially when we know exactly how and why an existing system will break due to new changes.

For Turtle, the mime type can be versioned - there is precedent for this if we look at existing mime types.

If we don't version the mime type, existing systems will break. They will need to be updated to support Turtle 1.2. There is no way to distinguish between Turtle 1.1 and Turtle 1.2, so there is no way for them to silently fail on or ignore Turtle 1.2. There also isn't a way to fail with context, i.e. to report that parsing failed because the parser doesn't handle Turtle 1.2 - it will fail equally for valid Turtle 1.2 and invalid Turtle 1.1. So this is not a trivially fixable change. Not desirable IMO.

If we do version the mime type, existing systems will not break. If they have to support Turtle 1.2, then they MUST be changed or updated anyway, since Turtle 1.2 requires updates anyway, and hence there is an opportunity for these systems to add the mime type handling change alongside the Turtle 1.2 handling changes. It might result in some extra work, potentially some complex cases around mime type handling. However, we know for sure that existing systems won't break (assuming the mime type is used as intended here), and if they do get an incorrectly assigned mime type then the fix is to use the correct mime type. So this should be the desirable state.

This also brings up the question of what should happen when Turtle 1.3 eventually is required. Again, versioning the mime type is an option, but pragmatically, having the version in the document itself is the best forwards-compatible solution and a known best practice. It would be ideal to have it here.
Another important consideration to take into account here is the length of the `Accept` header.

Long accept headers in browsers are problematic

The Fetch spec (CORS section) specifies that each header value (including the `Accept` header) may be at most 128 bytes for the request to remain CORS-safelisted; anything longer triggers a preflight request. As an example, the Comunica query engine uses the following `Accept` header:
Hence, when we do these requests in a browser, we must splice this `Accept` header.

New media types exacerbate this problem

As such, I believe introducing new media types for each RDF serialization in 1.2 is not the right way forward. For example, the following (which contains some arbitrary new media types for 1.2) already reaches the limit according to CORS:
And this problem would only get worse for every new RDF version:
Towards a solution

My initial thought when reading this issue was that profile-based negotiation could be a good solution […]. From this perspective, my feeling is that new media types or profile-based negotiation are not the way to go, and that in-band solutions such as […] are preferable.

Not only does this problem apply to RDF serializations, it also applies to SPARQL result serializations: SPARQL/JSON, SPARQL/XML, SPARQL/CSV, SPARQL/TSV.
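(A minimal TypeScript sketch of the CORS constraint discussed above: per the Fetch spec, an `Accept` value longer than 128 bytes is not CORS-safelisted, so a cross-origin request would need a preflight. The 1.2 media types below are hypothetical placeholders, not proposals.)

```typescript
// CORS-safelisted request headers may carry values of at most 128 bytes.
const CORS_SAFELISTED_MAX = 128;

const accept = [
  'text/turtle',
  'application/trig',
  'application/n-triples',
  'application/n-quads',
  'application/ld+json',
  'text/turtle;version=1.2',       // hypothetical 1.2 media type
  'application/trig;version=1.2',  // hypothetical 1.2 media type
].join(',');

// Drop the lowest-priority types until the value fits the safelisted limit again,
// similar in spirit to the "splicing" described above.
function spliceAccept(value: string, max = CORS_SAFELISTED_MAX): string {
  const types = value.split(',');
  while (types.length > 1 && types.join(',').length > max) {
    types.pop();
  }
  return types.join(',');
}

console.log(accept.length > CORS_SAFELISTED_MAX); // true: would trigger a preflight
console.log(spliceAccept(accept));                // shortened header that stays safelisted
```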
Except that again, no established frameworks (e.g. JAX-RS implementations) support it.
which is why it is better to implement the logic which verifies availability of the required media type on a higher level.
While that's true for the spec version, I don't think the same can be said for the widespread use of the Team Submission that predates the spec. The same media type was in use for years before Turtle 1.1 was introduced and brought with it changes to the syntax. I'm not sure that's reason to do the same thing again, but this isn't the first time we've been faced with this issue.
Just felt like pointing out that there’s also an IETF Internet-Draft on profile-based negotiation, of which @RubenVerborgh is co-author. It’s been in the works for quite a long time. There’s been renewed interest from the cultural heritage community and even from the W3C, where some consider this a topic that falls in the IETF realm. See https://datatracker.ietf.org/doc/draft-svensson-profiled-representations/.
Technologically? Been there, done that. Reputationally? Not so much.
@RubenVerborgh — You pointed to "Extended discussion at https://ruben.verborgh.org/articles/fine-grained-content-negotiation/" — which included —
First thing, your writing betrays a limited understanding of your topic, as you refer consistently to "MIME types", which are actually "media types", though they are used in a universe of MIME. Next, I bear relatively recent scars of a years-long effort to convince the IETF to follow their own documentation and work with a number of folks (including me) who wanted to extend media types by defining how to interpret multiple […]. In other words -- your "extended discussion" (which is really an extended monologue) has been overtaken by events, and is no longer (if it ever was) applicable.
@TallTed questioning Ruben V's credibility with silly 'tomayto, tomahto' comments is not relevant, nor helpful. That said, valid point about […].

I haven't seen any syntax change on the Web without a change in the media type. While I find @rubensworks' concerns valid, I don't think there's a way around it. Clients will have to be more picky about what content-type they request and eventually, either […]
I am not sure this is true. For example, XML 1.0 and 1.1 share the same media type. Similarly, HTML, CSS and JavaScript had significant syntax changes (especially the latter 2) without a media type change. On the RDF-related elements, JSON-LD 1.1 changed the default processing mode without a media type change, SPARQL got the very large 1.1 update...
I don't think I made any 'tomayto, tomahto' comments.

@RubenVerborgh pointed to an 8-year-old article he had written, as if it had some greater authority behind it than himself, and called it a "discussion", which would usually mean that it involved multiple participants. Indeed, the article page says "This Linked Research article has been peer-reviewed and accepted for the Workshop on Smart Descriptions & Smarter Vocabularies (SDSVoc) following the Call for Papers."

If you follow the link to that Call for Papers, you can see that (emphasis mine) "Short position papers are required in order to participate in this workshop. These are not academic papers but descriptions of the problem you’d like the workshop to discuss and the presentation you would like to offer. ‘Papers’ can be as simple as a short description of a tool or service to be demonstrated and the technologies used. Each organization or individual wishing to participate must submit a position paper explaining their interest in the workshop by the deadline. The intention is to make sure that participants have an active interest in the area, and that the workshop will benefit from their presence."

So, far from being "peer-reviewed", the only authority behind that article is @RubenVerborgh himself, and the "discussion" consists of only 2 comments (made 7 and 6 years ago) from other people (respectively, @VladimirAlexiev, who wanted to clarify the meaning of 1 sentence, and @nicholascar, who pointed to a document then-in-progress which has since moved to Content Negotiation by Profile; W3C Working Draft, 02 October 2023), of which only @VladimirAlexiev's is really about the content of the article, and that only about one phrase in one sentence, to which @RubenVerborgh made a one-sentence reply, which did not result in any clarifying change within the article.

I stand by my assessment of the article, and of profile-based content negotiation.
See above; HTML, XML etc. have explicitly created in-document mechanisms for versioning, which is why they can do this. Turtle does not, which is why we have the problem. (N3, in contrast, was explicitly equipped for this reason with an […])
@TallTed Let's keep this thread about tech to save people's inboxes; feel free to send other comments to mine.
RDF 1.2 does not change the vast majority of RDF data. Triple terms and base direction will be uncommon. My concern is that we end up "splitting the world" into "RDF 1.1" and "RDF 1.2", yet all RDF 1.1 is valid RDF 1.2, and very often data from an RDF 1.2 publisher is valid RDF 1.1. It is a cost-benefit decision. JSON-LD didn't change the media type - it was 1.0 compatible (nearly) and did introduce an optional `@version` context entry. Profiles do have a role in asking an RDF 1.2 data publisher that does use triple terms to some extent to return RDF 1.1-compatible unstar data.
+1 to @afs
Indeed, XML has a versioning mechanism. However, some examples I took, like JavaScript, SPARQL and I believe CSS, do not have versioning, and syntax introduced in recent versions causes hard syntax errors in the previous ones. There are definitely syntax changes on the web without media type changes.
…and never without breakage, which was the initial point: I expect format-based breakage to cause fatal technical and reputational damage to an ecosystem supposedly defined by interoperability.
The JSON-LD WG discussed this in the 2024-12-11 meeting and recorded it as discussion on w3c/json-ld-syntax#436. The feeling is that there are many instances where the content served under existing media types may change, and that changing the media type to something else is highly disruptive, to the point of constraining any real adoption of the new versions.
JSON-LD does have a version announcement feature (`@version` in the context). w3c/rdf-xml#49 (comment) suggests an […]
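(For reference, JSON-LD's in-band announcement is the `@version` entry in the context; a minimal example follows. A JSON-LD 1.0 processor that encounters it is expected to raise an error rather than silently misinterpret the document.)

```jsonld
{
  "@context": {
    "@version": 1.1,
    "name": "http://schema.org/name"
  },
  "name": "Example"
}
```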
I don't mean to hog the thread, but we're all increasingly talking past each other here. Lots of true statements above:
None of that was ever under question originally, nor does it answer the matter at hand. So let me make my own question that started this thread more explicit and specific:
Sorry, should have been more nuanced (I meant: changes that cause old parsers to break and not ignore) and you are right. It's what Ruben says: both strategies have been used, and probably the character limit on HTTP headers was a major driver for those not changing. BTW, now that you mention the JSON-LD approach: in that line of reasoning, turtle/trig or ntriples/nquads might as well have been one syntax with a versioning construct. After processing what has been said in this thread, my two cents:
I think the benefits would be limited. If we don't change, some (or most) parsers will break... temporarily. If they are still being used and maintained, someone will provide a fix. Else, it's maybe not that big of a problem. If we do consider this a huge problem (I don't think it is), you can only do this with a media-type change (this was my previous point).
Parsers will be able to anticipate or mitigate Turtle 1.2 documents without additional changes in protocol or practice, and will be able to explain to the user why it's not working, but only for online documents. I wonder whether […]
The developers of the parsers, consumers of RDF and publishers of RDF all receive costs and benefits. And mostly those operating in a Web context. There's a lot of offline processing too, and that's where a […]
I don't know, but in my practice, I almost never rely on the RDF media types because I don't process data published on the Web that I don't somehow control.
Probably, but it will be less of a problem because of the 1.2 versioning system that results from this discussion :)
No. I don't think this will cause a reputation problem; it didn't for JavaScript, CSS or HTML, and much can be prevented by thoroughly communicating the spec beforehand. Web standards think (long), specify, formalise, document, and communicate, and therefore allow for much more migration time than any other format.
They might, though. If DBpedia migrates to an RDF 1.2 system, and there is a new media type for Turtle 1.2, then it is likely that they will now serve their content with this new media type (see @Ostrzyciel's comment above). Then old clients will suddenly become unable to consume DBpedia data – even if the data has not changed a bit, and is still effectively Turtle 1.1.
Not necessarily. Following my DBpedia example above: since most of the data will still remain RDF 1.1 compatible, old clients won't even notice the migration. Of course, I'm not claiming that keeping the same media type is free of problems. Actually, I'm not sure which option I prefer...
The assumption that most data will remain 1.1-compatible may not be valid if we make heavy use of 1.2 in the future. My argument is that mime types are better for existing 'old clients' that won't be updated any time soon, because they won't be served 1.2 content that they cannot handle. If DBpedia or other systems migrate to 1.2 and stop serving 1.1 content, then hopefully it will be an informed decision. This shouldn't be the argument for not providing a way to distinguish between versions. Also, the counter-argument should be stated here - if there is no distinction between mime types and DBpedia starts serving 1.2 content, the old clients are going to run into errors anyway. At least with the difference in mime type we have 'control' over the failing conditions, and a way to continue to run old clients with the old mime type.
Well, Turtle 1.1 consumers that rely on some default server behavior might choke on such new Turtle 1.2 serializations that were previously delivered as Turtle 1.1, if this new Turtle 1.2 is made the default and/or the media type remains `text/turtle`. But I would expect that Turtle 1.1 consumers that correctly use ConNeg and the existing `text/turtle` media type will continue to receive content they can parse.

(I don't think the WG can retroactively shoehorn version info into Turtle documents, nor some other serializations of RDF 1.2, because those serializations were specified as if there would be no changes against which to future-proof. I consider this a major error, but it is where we are.)

I'm confident that this will be how things work, because DBpedia is hosted by Virtuoso, and Virtuoso supports nearly if not all RDF serializations. Turtle 1.2 is not yet available, because the WG hasn't finished specifying it, but I expect Virtuoso to natively handle RDF 1.2 and SPARQL 1.2 (possibly with a new […]).

I don't think the WG has yet considered the full matrix of permutations, of what will happen when mixing tools conforming to each version of RDF and of SPARQL, with data conforming to each version of RDF and queries conforming to each version of SPARQL. I don't think the WG can conclusively state now whether our new specifications will include some breaking or only non-breaking changes, and thus be 2.0 or 1.2. I think the WG can conclusively state that the WG is trying to produce non-breaking changes for 1.2 (as the WG is Chartered to do).
Naive question: can't we define something - even if it's only a standardized comment at the very start of the document, like the sketch below - that indicates which version of the syntax is in use?
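(A purely hypothetical sketch of what such a first-line comment could look like - no spec defines this marker, and an existing Turtle 1.1 parser would simply skip it as an ordinary comment.)

```turtle
# !turtle-version: 1.2     (hypothetical marker, not defined by any spec)
@prefix ex: <http://example.org/> .

ex:subject ex:predicate "an ordinary Turtle 1.1 triple" .
```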
@rat10 This is from my note on what I see it as: If we do add a magic-first-line but don't change the mime-type, then 1.2 parsers have a way to identify whether they are looking at a 1.2 or 1.x document, but previous parsers won't understand this and as a result will mark the document as invalid Turtle. Since there is no mechanism in Turtle to indicate the version or to dictate how parsers should check for version compatibility, any solution which relies purely on the addition of fields/info will break existing parsers. So AFAIK the options are:

1. keep the existing mime-type (with any version info carried in-band), or
2. introduce a new mime-type for Turtle 1.2.
Whether 1 or 2 is preferred depends on how one does the cost-value analysis. Both will require changing parsers anyway to handle 1.2. Approach 1 is better in that we don't need to make changes to the stack all the way from the content-negotiation / mime-handling bits upwards. Approach 2 is better in that it doesn't have a 'penalty' for not changing existing codebases, as they won't be given 1.2 with the same mime-type (and they will reject the new mime-type). My preference is for Approach 2, because even if we take Approach 1 right now, 5-10 years down the line we're going to have this same problem again when there is a version 1.3 which will again break the then-current parsers, etc. - so I would prefer not breaking any stuff and updating things now, rather than breaking them now and again at every version change. To do this, we would need a new mime-type, e.g. […]
I'm not claiming any authority w.r.t. this issue but let me just add that
I.e. such a comment would at least make failures less opaque. And failures seem to be unavoidable anyway. OTOH, the comment line by itself should not break anything.
In libraries like ANTLR, comments are usually ignored in the abstract syntax tree (AST) by defining them as HIDDEN tokens. This is because comments do not affect the language's syntax; including them would unnecessarily increase AST complexity, and removing them simplifies further code processing. In ANTLR, comments can be defined as hidden tokens:
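(A minimal ANTLR 4 sketch of the idea, assuming Turtle-style `#` comments; the rule names are illustrative only.)

```antlr
// Lexer rules: route comments and whitespace to the HIDDEN channel,
// so they never reach the parser rules or the resulting tree.
COMMENT : '#' ~[\r\n]* -> channel(HIDDEN) ;
WS      : [ \t\r\n]+   -> channel(HIDDEN) ;
```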
Summary
In rdfjs/N3.js#484, I learned that the specifications intend to redefine the set of valid documents under the `text/turtle` media type (and presumably others). Such a change might not be possible/desired, or should at least be acknowledged as a breaking change, with a resulting cost/benefit analysis.
Definitions
- `text/turtle` as the media type defined by https://www.w3.org/TR/turtle/
- `valid-turtle` as the (infinite) set of valid Turtle 1.1 documents
- `invalid-turtle` as the (infinite) set of documents that are not in `valid-turtle`
- a spec-compliant parser as one that, given a document in `valid-turtle`, produces the corresponding set of triples, and given a document in `invalid-turtle`, rejects it (possibly with details on the syntax error)

Note here that the above definition includes rejection; the 1.1 specification text does not, but its test cases do.
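(A small TypeScript rendering of the behavioral contract defined above; the type and function names are illustrative only.)

```typescript
// A spec-compliant text/turtle parser either yields triples (valid-turtle)
// or rejects the document with a syntax error (invalid-turtle);
// there is no defined third outcome for "unknown future syntax".
type Triple = { subject: string; predicate: string; object: string };

type ParseResult =
  | { ok: true; triples: Triple[] }      // input was in valid-turtle
  | { ok: false; error: SyntaxError };   // input was in invalid-turtle

declare function parseTurtle11(document: string): ParseResult;
```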
Potential problems
- Redefining `text/turtle` breaks existing spec-compliant Turtle parsers, as they will incorrectly label valid `text/turtle` documents as invalid.
- What should a spec-compliant parser do when given a `text/turtle` document and no other context?
- Given the definition of `text/turtle` in the Turtle 1.1 spec, any changes to that set (whether deletions or additions) would contradict the Turtle 1.1 spec itself / make it invalid.
- Consumers cannot know what they will receive: `Accept: text/turtle` does not tell them. Nor does `Content-Type: text/turtle` tell them whether their parser can handle the contents, and we could be 20 gigabytes in until we notice it doesn't.

Analysis
Unlike formats like HTML, Turtle 1.1 does not contain provisions for upgrading. The specification assumes a closed set of valid documents. We find further evidence in a number of bad test cases (https://www.w3.org/2013/TurtleTests/), which explicitly consider more permissive parsers to be non-compliant.
There is a note in the spec (but only a note, and thus explicitly non-normative):
but this non-normative statement is contradicted by the bad test cases, which parsers need to reject in order to produce a compliant report.
Although the considered changes for 1.2 are presumably not in contradiction with those bad cases, the test suite was not designed to be exhaustive. Rather, the 1.1 specification considers `text/turtle` to be a closed set, and the test cases consider a handful of examples to verify the set is indeed closed. In particular, no extension points were left open on purpose.
Therefore, the 1.1 spec is not only defining “Turtle 1.1”, but also strictly finalizing `text/turtle`.

(The IANA submission's reservation that "The W3C reserves change control over this specifications [sic]." does not change the above arguments.)
Potential solutions
A set of non-mutually exclusive solutions, each of which covers part or all of the problem space:
- Factual disagreements with the above.
- The introduction of a new media type.
- The introduction of a new profile on top of the existing `text/turtle` media type.
- A change to the Turtle 1.1 spec that adds extension points or otherwise opens the set of `text/turtle`.
- Syntactical support in Turtle 1.2 for extension and/or versioning.