-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple significant security vulnerabilities in the design of data integrity #272
Comments
I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs. Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism. We might want to add some examples to the specification that show that the information in documents can be expressed in one context and transformed into another. This could include showing an incoming document that is expressed using one or more contexts that the consumer does not understand, which can then be transformed using the JSON-LD API to a context that is trusted and understood. This would also help highlight the power of protection mechanisms that enable this kind of transformation. For example, a VC that includes terms that are commonly consumed across many countries and some that are region specific. By using the JSON-LD API, a consumer that only understands the global-only terms can apply such a context to ensure that the terms they understood will appear as desired and other region-specific terms are expressed as full URLs, even when they do not understand or trust the regional context. All of this can happen without losing the ability to check the authenticity of the document. We can also highlight that simpler consumers continue to be free to outright reject documents that are not already presented in the context that they trust and understand, no matter their authenticity. |
The fundamental point of digital signatures is to reduce the information that needs to be trusted prior to verification. Most modern technologies e.g SD-JWT, mDocs, JWT and COSE and JOSE at large do this successfully meaning a relying party only needs to trust a public key prior to attempting to verify the signature of an otherwise untrusted payload. If the signature check fails, the payload can be safely discarded without undue expense. The problem with data integrity is that this assumption is not the same. In essence the relying party doesn't just need the public key of the issuer/signer, but also all possible JSON-LD context entries that issuer may or may not use, if any of these are corrupted, manipulated or untrusted ones injected, the attacks highlighted in this issue become possible. Whether it is even possible to share these contexts appropriately at scale is another question, but these attacks demonstrate at a minimum that an entirely unique class of vulnerabilities exist because of this design choice.
The point im making is not about whether one should understand the context of a message it has received or not, its about when it should attempt to establish this context. Doing this prior to validating the signature is dangerous and leads to these vulnerabilities. For instance a JSON-LD document can be signed with a plain old JWS signature (like in JOSE COSE), once the signature is validated one can then process it as JSON-LD to understand the full context, if they so wish. The benefit of this approach is that if the JSON-LD context have been manipulated (e.g the context of the message), the relying party will have safely discarded the message before even reaching this point because the signature check will have failed. Data integrity on the other hand requires this context validation to happen as a part of signature verification thus leading to these issues. |
Another take on this is that Data Integrity signing methods that sign the canonicalized RDF derived from JSON-LD, rather than the JSON-LD itself, enable multiple different JSON-LD inputs to canonicalize to the same RDF. The JSON-LD itself isn't secured - only RDF values derived from it. If only the derived RDF values were used by code, it might not be a problem, but in practice, code uses the unsecured JSON-LD values - hence the vulnerabilities. |
In the example where the If they were trying to depend on a credential of a certain type that expressed a holder's first name and middle name, it would not be a good idea to miss a check like this. Don't accept properties that aren't well- Approaches that work:
Communities developing and using new credential type specifications benefit from defining a good |
schema.org and google knowledge graph both use https://developers.google.com/knowledge-graph The problem is not JSON-LD keywords in contexts, the problem is insecure processing of attacker controlled data. If you want to secure RDF, or JSON-LD, it is better to sign bytes and use media types. You can sign and verify application/n-quads and application/ld+json, in ways that are faster and safer. W3C is responsible for making the web safer, more accessible and more sustainable. Data integrity proofs are less safe, harder to understand, and require more CPU cycles and memory to produce and consume. They also create a culture problem for RDF and JSON-LD by attaching a valuable property which many people care deeply about (semantic precision and shared global vocabularies), with a security approach that is known to be problematic, and difficult to execute safely. These flaws cannot be corrected, and they don't need to be, because better alternatives already exist. W3C, please consider not publishing this document as a technical recommendation. |
2024-05-08 MATTR Responsible Disclosure AnalysisOn May 8th 2024, MATTR provided a responsible security disclosure to the Editor's of the W3C Data Integrity specifications. A private discussion ensued, with this analysis of the disclosure provided shortly after the disclosure and a public release date agreed to (after everyone was done with the conferences they were attending through May and June). The original response, without modification, is being included below (so language that speaks to "VC Data Model" could be interpreted as "VC Data Integrity" as the original intent was to file this issue against the VC Data Model specification). The disclosure suggested two separate flaws in the Data Integrity specification:
The Editors of the W3C Data Integrity specification have performed an analysis of the responsible security disclosure and provide the following preliminary finding: Both attacks are fundamentally the same attack, and the attack only appears successful because the attack model provided by MATTR presumes that verifiers will allow fields to be read from documents that use unrecognized That said, given that a credential technology company such as MATTR has gone so far as to report this as a vulnerability, further explanatory text could be added to the VC Data Model specification that normatively state that all processors should limit processing to known and trusted context identifiers and values, such that developers do not make the same mistake of treating documents with differing The rest of this document contains a more detailed preliminary analysis of the responsible disclosure. We thank MATTR for the time and attention put into describing their concerns via a responsible security disclosure. The thorough explanation made analysis of the concerns a fairly straightforward process. If we have made a mistake in our analysis, we invite MATTR and others to identify the flaws in our analysis such that we may revise our findings. Detailed AnalysisA JSON-LD consumer cannot presume to understand the meaning of fields in a JSON-LD document that uses a context that the consumer does not understand. The cases presented suggest the consumer is determining the meaning of fields based on their natural language names, but this is not how JSON-LD works, rather each field is mapped to an unambiguous URL using the JSON-LD context. This context MUST be understood by the consumer; it cannot be ignored. A verifier of a Verifiable Credential MUST ensure that the context used matches an exact well-known
The former can be done by using JSON schema to require a specific JSON-LD shape and specific context values. This can be done prior to passing a document to a data integrity implementation. If contexts are provided by reference, a document loader can be used that resolves each one as "already dereferenced" by returning the content based on installed context values instead of retrieving them from the Web. Alternatively, well-known cryptographic hashes for each context can be used and compared against documents retrieved by the document loader over the Web. For this approach, all other JSON-LD documents MUST be rejected if they do not abide by these rules. See Type-Specific Credential Processing for more details on this: https://www.w3.org/TR/vc-data-model-2.0/#type-specific-credential-processing. This former approach is less powerful than using the JSON-LD Compaction API because it requires more domain-specific knowledge to profile down. However, it is still in support of decentralized extensibility through use of the JSON-LD Applying these rules to each case presented, for case 1: A verifier that does not use the JSON-LD API and does not recognize the context URL, A verifier that does not use the JSON-LD API and does recognize the context URL, A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context, and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended. For case 2: A verifier that does not use the JSON-LD API and does not recognize the attacker-provided context URL, A verifier that does not use the JSON-LD API and does recognize the attacker-provided context URL, A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context (and optionally, Note: While the disclosure suggests that the JSON-LD Comparison to JSON SchemaThe scenarios described are identical in processing systems such as JSON Schema where document identifiers are used to express that two documents are different. A JSON document with differing Original document{"$schema": "https://example.com/original-meaning",
"firstName": "John"} New or modified document{"$schema": "https://example.com/new-meaning",
"firstName": "John"} Any document processor whether utilizing JSON Schema processing or not would rightly treat these two documents as distinct values and would seek to understand their values equivalence (or lack of it) prior to processing their contents. Even consuming a document that is recognized as authentic would be problematic if the Original meaning/schema{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/original-meaning",
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"description": "The name by which a person is generally called: 'given name'",
"type": "string"
}
}
} New meaning/schema{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/new-meaning",
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"description": "The name spoken first in Japan: typically a surname",
"type": "string"
}
}
} Demonstration of Proper ImplementationThe attack demonstration code provided adds the unknown modified/malicious contexts to the application code's trusted document loader. A valid application should not do this and removing these lines will cause the attack demonstrations to no longer pass: documentLoader.addStatic("https://my-example-context.com/", modifiedContext) To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/93c0ba17b25e500d72c1ad131fe7e869 documentLoader.addStatic("https://my-malicious-modified-context.com/", modifiedContext) To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/4fb032c422b77085ba550708b3615efe ConclusionWhile the mitigation for the misimplementation identified above is fairly straightforward, the more concerning thing, given that MATTR is knowledgeable in this area, is that they put together software that resulted in this sort of implementation failure. It demonstrates a gap between the text in the specification and the care that needs to be taken when building software to verify Verifiable Credentials. Additional text to the specification is needed, but may not result in preventing this sort of misimplementation in the future. As a result, the VCWG should probably add normative implementation text that will test for this form of mis-implementation via the test suite, such as injecting malicious contexts into certain VCs to ensure that verifiers detect and reject general malicious context usage.
|
If you consider the contexts part of source code, then this sort of attack requires source code access or misconfiguration. Validation of the attacker controlled content prior to running the data integrity suite, might provide mitigation, but at further implementation complexity cost. Which increases the probability of misconfiguration. A better solution is to verify the content before performing any JSON-LD (or other application specific) processing. After verifying, schema checks or additional business validation can be performed as needed with assurance that the information the issuer intended to secure has been authenticated. At a high level, this is what you want:
Most data integrity suites I have seen do this instead:
The proposed mitigations highlight, that these security issues are the result of a fundamental disagreement regarding authentication and integrity of data. Adding additional application processing prior to verification, gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, and schema version mismatching, etc... Any application processing that occurs prior to verification is a design flaw, doubling down on a design flaw is not an effective mitigation strategy. |
we are speaking about this pseudo-code
which is loop and simple string comparison. I don't see a reason for any of the exploits you have listed here except an implementer's incompetence. Please can you elaborate how those exploits could be performed and provide a calculation, an estimation, how much this adds to complexity? Thank you! |
@filip26, setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent". Your specific pseduo-code is in-sufficient for at least the following reasons:
(edited to put backticks around |
@tplooker setting aside you are putting words in my mouth that I have not said which is quite rude and disrespectful ... add 1. you are wrong, by ensuring data is processed with a context you accept (the URLs) you know what is behind those URLs, and how much you trust those URLs, and perhaps you have a static copy of the contexts. If you follow untrusted URLs then it's an implemters fault. Use a browser analogy. |
I was browsing through past issues related to this. This specific issue was raised to suggest adding @tplooker given these new findings, would you revise your support since this was a bad recommendation introducing a security concern according to your disclosure? |
The URL for a context doesn't actually matter... In fact some document loaders will follow redirects when resolving contexts over a network (technically another misconfiguration). Depending on the claims you sign, you may only detect a mismatch in the signature, when you attempt to sign a document that actually uses the differing part of the context. Contexts are just like any other part of source code... Every single line of source code is a potential problem. You often don't control what 3rd parties will consider the bytes of a context to be... It's a feature, that's been turned into a defect by where it was placed. "It verified for me, must be a problem in your document loader." "I thought I would be able to fix it in only a few hours, but it took me 2 days and delayed our release" "I finally figured out how data integrity proofs work, thanks for letting me spend all week on them" I've paired with devs and shown them how to step through data integrity proofs, dumping intermediate hex values and comparing against a "known good implementation", only later to learn the implementation had a bug... Misconfiguration is common in complex systems. I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order. Authentication of json -> json web signatures The essence of a recommendation, is that you believe there isn't a better alternative. |
@OR13 I'm sorry but don't see it. You mention two issues: misconfiguration and bugs. Well, we have tests, certification, etc. Those issues are endemic to any software applications but we don't call all the software vulnerable because of just a possibility that there might be a bug but after we find a bug.
I would really like to see the complexity estimated. I guess we are seeing a very different picture.
Please let's be factual, what experts, what was recommended, etc. In EU when press article starts with a title "American scientists have ... " everyone stops reading it (they add the American to make it credible ;) |
@PatStLouis you raise an excellent point regarding default vocabularies. It's never too late to change what's in a context (joke). This working group cannot prevent anyone else from adding a context that includes a vocab. You are reporting an architectural flaw, that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context. If json compatibility isn't a requirement, the working group can drop the vc-jose-cose spec and remove the vocab from the default context... This might even improve adoption of data integrity while clarifying that RDF is the claims format that W3C secures. I've argued this point previously. |
@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. (edited to put backticks around |
Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting Furthermore, if others in the WG knew about this issue, specifically that |
@tplooker wrote:
@tplooker wrote:
Please stop insinuating that people are acting in bad faith. Now might be a good time to remind everyone in this thread thread that W3C operates under a Code of Ethics and Professional Conduct that outlines unacceptable behaviour. Everyone engaging in this thread is expected to heed that advice in order to have a productive discussion that can bring this issue to a close. (edited to put backticks around |
From an implementer perspective maybe adding an example that "should fail" could be a good thing. Something like at #272 (comment) . As an implementation "case experience", I implemented in .NET something that produces a proof like at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6 the university crendetial and then also verifies it. It felt a bit tedious to find out what to canonicalize, hash and and sign to get a similar result. The code is or less private code still, but now that https://github.com/dotnetrdf/dotnetrdf/releases/tag/v3.2.0 and the canonicalization is publicly released, I might make something more public too. I still feel I need to go through this thread with more thought so I completely understand the issue at hand. |
@veikkoeeva wrote:
Yes, that is already the plan for the test suite in order to make sure that no conformant implementations can get through without ensuring that they refuse to generate a proof for something that drops terms, and/or, depending on the outcome of this thread, use That's a fairly easy thing that this WG could do to ensure that this sort of implementation mistake isn't made by implementers. Again, we'll need to see how this thread resolves to see what actions we can take with spec language and test suites to further clarify the protections that we expect implementations to perform by default. |
I'm not suggesting a change, my goal is to understand why this recommendation was suggested in the first place and removing it is now listed as a remediation step to a security concern raised from the very same parties who suggested it.
Correct, Many protocols have features that can be unsecured depending how you use them. This doesn't make the protocol inherently flawed.
Apologies if you misunderstood my statement, but my intention was not to report an architectural flaw.
Yes The Data Integrity spec provides hashes for their entries that verifiers can leverage while caching the content. AFAIK this is already a thing.
Thank you for pointing out to these issues, I enjoy looking back at historical data from before my time in the space. As pointed earlier, some of the parties that made that recommendation are now recommending removing it as a remediation to a security concerned that they raised. The use cases listed for this recommendation was for development purposes as described in #953. Furthermore, the private claims section of the jwt RFC reads as follows:
Enabling this by default does not sound like a good recommendation to me. It's easy to setup a context file, it takes 5 minutes and a github account. If you are doing development, you can just include an Regardless this was already discussed by the group and the decision has been made. The OWASP defines a class of vulnerabilities as Security Misconfigurations. This is where I would see this landing in. While valid, it's ultimately the implementers responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is unsecured because I can gain unauthorized access to my server, that doesn't align since the security flaw is not in the protocol in itself be in my security configuration. Yes it's a vulnerability, no it shouldn't be addressed by the underlying protocol. For concluding I find this disclosure valuable as I got to learn a bit more about json-ld and gives a great resource to demonstrate implementers how to properly conduct verification of credentials + issuers how to properly design a VC. (edited to put backticks around |
I would actually classify those attacks as "Data Integrity Signature Wrapping" (DISW) attacks. They share many similarities with XML Signature Wrapping Attacks (XSW) that occurred in the past. Also, note that it is possible to use XML Signatures securely if appropriate mitigations are implemented correctly. The same holds true for DI. The question is where we would add requirements for those additional mitigations for Data Integrity Proofs (DI). The VCDM uses |
It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in the Example: VC using a base data model for all driving licenses
Example: VC issued by DMV of Foo
Example: VC issued by DMV of Bar
When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar. Verifiers in the realm of Foo are primarily interested in the base properties of the Adopting the |
Great examples! Thanks! Some context on why I think why a test "should not happen" plus a less mentioned issue of having good examples. Related to #272 (comment): I'm not completely alien to this sort of work and indeed, when I implemented the "first pass sketch" of the code, I bit struggled with implications of this sort since I'm not so familiar with JSON-LD. So, I thought to "get back to with better time" and just not release anything before things are clearer (plus the library change not being public, though there's something already in the tests about this). Some part of that was if I have a document like at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6, how to pick apart the pieces for canonicalization, turning into bytes, hashing, signing and so on. For this "sketch" I was quite happy to have the same results with the keys as the example document, but I know I paid only passing thought for these sort of things. Partially because there's been related discussion earlier. I mention this about this example piece, since I think since good examples are perhaps more important than has been implied here. I naturally also think that a test of what not should happen are important -- and maybe add some notes of the sort to an example or two too. They're already something I've (we) been codifying to some tests. It's also a great way to document things. |
@awoie a verifier should not guess what's inside a context nor to try to anticipate if there is some agreement between context providers.
If a verfier recognizes both An ability to understand well know terms, e.g. defined by schema.org is a great feature but not in VCs eco-system where we don't want to guess but be sure.
It scales the same way as www does. None prevents you using other contexts, well known terms, etc and include all in your context. If there is a need, a good reason, to share parts between some parties, then the easiest, transparent, and scalable solution is this:
|
@filip26 wrote:
I didn't say it is not a solution. My point was that it is a solution which does not scale. A verifier from Foo might have never seen a @filip26 wrote:
No, it doesn't because the assumption of the |
It's up to an implementer how to allow to configure a verifier,. A static configuration has nothing to do with scalability. But I guess that you have meant that a verifier would not be able to accept a context which is not know - that's exactly what we want, and it does not mean that VCs do not scale, that there cannot be infinite number of different VC types, issuers, verifiers, etc. |
@filip26 wrote:
My point on scalability refers to an increase in operational costs, not necessarily performance. Performance might be another point but I cannot comment on that. @filip26 wrote:
If this is what we want, this sacrifices the open-world data model the VCDM promise as mentioned here. |
@awoie I'm sorry, I don't think we are on the same page and I'll let others to explain that it does not affect scalability of VCs eco-system nor open-world data model. |
@awoie My question is; if a verifier has no prior knowledge of foo or bar, why would they consider the extended data provided by those entities and how would this data lead to an exploit in their system? Verifiers will know what information they want to verify, they are not blindly verifying abstract data. As for the classification of this disclosure, while I can't really argue with your labeling, this is not a formal classification. If we take 2 examples of vulnerability disclosed around XML Signature Wrapping Attacks: Both of these affect a specific software and lead to 2 distinct CWE:
They are not addressed by a change to XML, but a security mitigation in the affected software. This is an important distinction to make and loops back to a Security Misconfiguration. It's hard for me to understand what exactly this disclosure tries to underline as the vulnerability
In seems the target of the vulnerability is being shifted around depending on the questions asked/comments made. |
All — Please review your comments above, and edit them to codefence all instances of All you need to do is wrap those Failure to do this has looped several GitHub users (e.g., This is often shrugged off, as, what's the big deal about a few irrelevant notifications? Well, in this thread, it's more like 133 (and counting) irrelevant notifications. In most spheres where I travel, that would be classed as spam and/or harassment. Please, let's be good gitizens. |
The issue was discussed in a meeting on 2024-07-10
View the transcript2.1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)See github issue vc-data-integrity#272.
Brent Zundel: concerns about security implications of data integrity signatures.
Brent Zundel: pending a P.R to address the issue. Gabe Cohen: spent a long time trying to understand this issue.
Gabe Cohen: The outcome of the proposal would be 2 or 3 issues to address these points and associated PRs. Manu Sporny: Agree that 0,1,3,4 parts of the proposal are good ideas.
Manu Sporny: 0 is easy, 1 will take some work.
Manu Sporny: will come back to 2, that one is controversial.
Manu Sporny: Telling issuers and verifiers to implement business logic to pass the test suites.
Brent Zundel: For number 2, the integrity of contexts. Is it sufficient to recommend people to use the related resource property. Joe Andrieu: When talking about context integrity, we are talking about some hash. Not that the context is signed over.
Manu Sporny: I think they are the same. We aren't talking about signing over the property and the value. We are talking about including the hash of every context in the Joe Andrieu: We do not currently sign over the Anil John: Is option 2 not a secure meta data distribution problem. Dave Longley: Let someone else speak to aniltj.
Dave Longley: even if contexts never changed and were integrity protected, you can still make these mistakes if you don't know the context. Gabe Cohen: responding to aniltj, yes secure meta data is part of it. David Chadwick: commenting on the idea of using the related resource property, that is signed over. Noting to stop issuer adding relatedResource property to any URI.
David Chadwick: Cannot stop an issuer from using this. Brent Zundel: question is do we want to encourage people to use this as a means of addressing this issue.
Greg Bernstein: part of this issue from a security point of view comes down to understanding who controls what inputs.
Greg Bernstein: or do we accept that the context is under the control of the verifier. E.g. the verifier idicates the context values they accepts.
Manu Sporny: brent said could we just use relatedResource. We could, but how normative would we want to get with that.
Manu Sporny: Things to consider. This can be viewed as a meta data distribution problem.
Manu Sporny: never have to go to the internet. You should not be doing that. Should have a local copy. Brent Zundel: Need to start transition to what the next steps are. Joe Andrieu: conflating issues between the integrity of the property. Not convinced data integrity does not secure the value of the context.
Anil John: not a tech provider or tool user. I am an user of this technology.
Anil John: Struggling to reconcile what is important and relevant to what we are trying to do.
Ted Thibodeau Jr.: First, data integrity signs over quads which is the result of applying the context to the json being signed over.
Ted Thibodeau Jr.: This is open world, VCs can go anywhere at any time, I've never verified a VC from issuer X... I need to get context file that's relevant for their VCs -- this is not the /default/ context, this is the Verifier X context, in addition -- I have to retrieve it and make use of it and have assurance that context file that I retrieve is the same as the context file that was used when generating the VC. Dave Longley: to respond to TallTed, I think it is the question around secure meta data delivery.
Dave Longley: These quads are secured. Manu Sporny: happy to raise PRs for 0,1,3,4. Everything except context integrity protection. Ivan Herman: related to manu around the PRs to come. Gabe Cohen: question about the use of JCS.
Gabe Cohen: What about nested contexts. Contexts that reference other contexts. How do we secure those.
Dmitri Zagidulin: Main question is doesn't the relatedResource mechanism address a lot of these concerns.
Brent Zundel: relatedResource exists inside of the data model. Would work only when data integrity is signing VC data models that include related resource.
Manu Sporny: ivan asked if we should add language around providing hashes of finalized contexts.
Manu Sporny: We need to think about the use cases deeply before saying MUST around providing context hashes.
Manu Sporny: A known bad context was included in the verifier as a good context. The end.
Manu Sporny: We can clean it up and clarify.
Joe Andrieu: I don't agree, I think today people defining contexts can create a URL for a context that has a hash in it.
Brent Zundel: we have a path forward. Thanks for that. look forward to seeing those PRs.
|
As an implementer of wallets, verifiers and issuers, and both DI and JWS libraries, I want to add my thoughts to this crucial discussion.
|
FWIW, the issue of adding SRI to a Cc @BigBlueHat |
@dmitrizagidulin could you clarify what you mean by "hash protected" here and how this protection would work? For example do you mean protected by the issuers signature like what has been suggested by @decentralgabe, myself and others? Or instead leaving the protection reliant on an out of band step similar to the current specification guidance and hence optional in practise when it comes to validating a DI proof. |
At present, I think we should go with (and recommend to devs) the 'digest hashes signed over by issuer' like you and Gabe suggested. Specifically, the Would I prefer adding the digest hashes to the URLs, inline? (So that links would look like this: {
"@context": [ "https://www.w3.org/ns/credentials/v2#digest=12345"],
// ...
"evidence": [{
"id": "https://example.com/files/my-dissertation.pdf#digest=4567"
}]
} Yes, I'd prefer that mechanism, in the ideal world (if we had time before CR to publish it as a standalone spec, AND if there was political will in the community to add support for this to JSON-LD libraries (or at least on the HTTP client level that those libraries use) in various programming languages). It's a somewhat better mechanism because a) It could be used in embedded contexts (where you add a link to another context inside a context, like examples/v1 did). and b) It's a useful general purpose JSON mechanism. BUT, given the current landscape, it's more realistic to use |
@iherman Oh, that's really cool! Thanks for the link. I really hope the future json-ld WG does get around to standardizing that mechanism! |
I think I like what Dmitri's suggesting, but to clarify:
Having to use |
Noting actions to raise PRs and issues taken on the last telecon.
PRs have been raised for this item: w3c/vc-data-model#1520, w3c/vc-data-model#1524, and w3c/vc-data-model#1524
Now being tracked in w3c/vc-data-model#1529 A PR has been raised for this item: w3c/vc-data-model#1535
Two PRs have been raised for this item: w3c/vc-data-model#1537 w3c/vc-data-model#1567
Now being tracked in #288 A PR has been raised for this item: #291
Now being tracked in w3c/vc-data-model-2.0-test-suite#88 and w3c/vc-data-model-2.0-test-suite#88 with the expectation of PRs for discussion within a few weeks. |
@dlongley The bit about display which you picked up on is peripheral to the main text. We can easily omit it so that the proposed text becomes "DI signature verifiers MUST trust all the |
The issue was discussed in a meeting on 2024-07-31
View the transcript2.1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)See github issue vc-data-integrity#272. Brent Zundel: Data Integrity: we have one aspect of one key issue left.
Manu Sporny: 272 has 143 comments now, been discussed extensively, Gabe made a proposal many weeks ago that had broad support for a number of the items, can follow your nose to what Gabe suggested and how those things went into PRs.
Manu Sporny: all doing this tells you is that you are using the same content hash that the issuer used when they created it, this is insufficient to protect against the kind of attack in 272.
Brent Zundel: would it be possible to add a recommendation that folks do resource integrity? would that be a means of satisfying the folks who have raised this issue?
David Chadwick: my suggestion would be weaker, just to add a note saying that if the issuer wishes they can integrity protect the context with related resource.
Gabe Cohen: +1 to DavidC's proposal. Brent Zundel: would anyone be opposed to adding this note? anyone want to argue that the context integrity stuff absolutely must happen? Manu Sporny: I can raise this PR. Brent Zundel: we raise the PR, we go into the issue and list the PRs that have been raised, we as a working group believe this addresses the issue, then close the issue - is that right? Manu Sporny: sounds good, one thing we can always do is strengthen requirements, if you are working in an ecosystem that needs this, in their spec they can mandate that verifiers must check this, at the base layer we are saying that you can use this feature, but in another spec using this you can make it a must.
Manu Sporny: it'. |
The issue was discussed in a meeting on 2024-09-05
View the transcript1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)See github issue vc-data-integrity#272. Brent Zundel: Welcome everyone to the special topic call - topic today is Data Integrity issue 272. Tobias Looker: Thanks for holding the special topic call. I won't spend too much time going back through the issue... 144 comments, I think we're all well versed into conversation that has gone in attempt to resolve the issue. Brent Zundel: I'm happy to presume that everyone is up to speed on this discussion.
Dave Longley: I think SHOULD is strong enough, we have a number of use cases and things we've discussed on that issue. Reasons to allow different contexts to be used that are different from issuer wants to use. IWe've talked about design around flexibility on those cases. Individual uses on the spec in different areas, different verifiers, can profile the spec and require related resource to be used if it suits that ecosystem. At base layer of spec, we should have ability of other uses to work. Brent Zundel: An assertion was made by Tobias that, specifically, the set of use cases that would require additional context switching would still be possible and should be done after issuer context as provided has been part of verification process. Can you talk about those lines? Dave Longley: This doesn't actually solve the problem, the recipient has to understand what context is, even if it is from a context that is trusted, terms definitions, doesn't fundamentally solve the problem. Tobias Looker: What I was trying to say is -- from a selective disclosure perspective, it still works, but some of the context values don't expand to terms. I don't think SD is prohibited. The same with CBOR-LD. One of the overarching points is issuers fundamentally comfortable w/ data transformations after data is signed that might misconstrue or mis-convey information that was originally signed. Most document issuers of major credentials don't want that level of manipulation to occur. They don't want holders to manipulate credentials to misconstrue credential, come to rely on information. It's issuers reputation on the line at end of day, just like w/ physical documents today, we don't let holders allow changes. It's issuers reputation on the line, they should have a say on how much document can be modified after issuance. Leaving document to be manipulated leads to issues. Tradeoff is difficult for issuers to be able to adopt the technology if it is allowed to persist. Dave Longley: That we're having a philosophical debate on extend of three party model and who gets to have control indicates that this is what profiles are for. Spec has flexibility, that is good, I don't think we should be too prescriptive. We should allow separate ecosytems, let the ecosystems have the choices they'd like to make. Anil John: Speaking on behalf of a high value credential issuer, resonate with some of the things Tobias noted, issuers have a perspective on credentials that we issue. I also think that there has to be a balance. We as an issuer are interested in providing ability to verify information that we provide and also allow credentials to be used via CBOR-LD, rendering mechanisms, and the like. We as an issuer won't be silent on it, spec as it is written, if we feel strongly about certain aspects of it, we can profile the standard itself to make sure other credentials we issue follow a path that conforms to our belief.
Brent Zundel: Short answer is no, you're not prohibited from doing that. Tobias Looker: That is a good point, the text today is optional.
Tobias Looker: If I'm an issuer, and VCs today under this assumption, these Manu Sporny: to respond to one comment.
Dmitri Zagidulin: The conversation on the call so far, Anil mentioned that while requiring context verification might be too far, he'd like to see ability for issuer to say "verifiers MUST do this". At the moment, relatedResource - we don't have normative language if related resource is present, they must be enforced, if they fail, the VC verification fails. Adding the normative language, making relatedResource normative and ensuring issuers enforce context integrity. Brent Zundel: I got on the queue to suggest something along similar lines. We have two conformance classes, conforming producer and conforming consumer. If I'm understanding use cases property, we do not want to prohibit the producer to do things that require context integrity protection, what if we do what Dmitri just said -- it is a MAY to include context integrity protection using Anil John: I am curious to hear answer to your question. I wanted to correct one thing that Dmitri said -- verification and issuance side, my point is that I, as an issuer, can control my issuance infrastructure. Profiling allows me to say whether or not to use
Anil John: Ultimately, verifiers don't have to follow our guidance. They might have a different risk tolerance. We issue multifactor smartcards, 2nd factor pins, but there are many cases where people look at photo and verify as valid.
Michael Jones: What puzzles me is that we're talking about this as if it's a garden variety optional feature that people can do it or not, given that it's a security feature, and it will be abused ... should we even give people that option. Tobias Looker: A similar point, we have to see that this is not an integrity protected portion of the VC, at best, not like unprotected headers, effects entire part of processing of VC. It effects everything that's not integrity protected. Dave Longley: I largely don't agree with the analogy, a lot of this is coming from , we're talking about even if you do it and don't understand form and terms, your application is invalid. It fundamentally does not matter, if you receive a context you don't understand, you need to reject.
Dave Longley: All of that being said, I don't think I'd have a problem saying that if Brent Zundel: Seems like proposal on the table is that it will not be required, MAY or SHOULD, some issuer MAY use context integrity protection, however, if that has been done, verifier MUST fail verification if that fails. Manu Sporny: I'm trying to think through that as applied to selective disclosure and the BBS use cases--unlinkability.
Manu Sporny: there's a great deal of new complications if we do this.
Manu Sporny: JOSE/COSE doesn't provide those protections.
Brent Zundel: I'm all for requiring people to do stuff.
Manu Sporny: yeah...funny, but that's not what I was saying.
Michael Jones: I will object to the proposal because it fails to secure the context, if you don't secure it, context substitution doesn't occur, it needs to be compulsory on both sides. Knowing contexts, doesn't prevent attack, it can happen w/ other contexts that you can accept.
Manu Sporny: That's not how it works selfissued.
Joe Andrieu: I agree with Manu, I dont' think a successful attack has been proven. I'm not understanding where this attack shows up.
Joe Andrieu: I think we're also conflating verification and validation.
Joe Andrieu: People are talking about securing context, and string and context is secured. Historical notion might be what is being confused, context that is returned as a resource should be different, but should be different any time... context doesn't have hash to do it, trying to cobble one after the fact might be useful, but doesn't have credibility for verification. Dmitri Zagidulin: At this point, the
Tobias Looker: I'll try and reply a practical attack when you follow all the guidance, malicious party, RP validates a government identity credential and I know schema and ontology, corrupt RP in some way to manipulate credential presentations. Problem is known context values are not issuer specific, implementations have one document load, contexts I trust, not related to issuers, so if I can get a context value trusted for a different VC that somehow changes or substitutes ontology I can use trusted context to. Manu Sporny: -1 nope, signature will fail.
Tobias Looker: These could be conflicting, general purpose bucket of context values, as RP, if I know what context values you trust, I might be able to manipulate those in a way that allows me to do the attack. Manu Sporny: tplooker that is not how a secure implementation is going to work.
Joe Andrieu: At any point we have to dereference, we're creating a security concern... relatedResources, don't dereference if you need to... we can't mandate it.
Dmitri Zagidulin: At the moment we don't have capability to fulfill Anil's requirement, issuer flag what resources must be secured. My proposal is true/false to related resource for signal from issuer.
Brent Zundel: Thank you, this was a great conversation, appreciate the civility, consideration of different viewpoints. This conversation will go into the issue. We continue to explore the possibilities that are on the table. Hpefully we can move this forward in a way that is amenable to folks and find some consensus. |
The issue was discussed in a meeting on 2024-09-26 List of resolutions:
View the transcript3.2. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)See github issue vc-data-integrity#272. Manu Sporny: Currently there are 145 comments regarding security vulnerabilities.
Manu Sporny: We have made adjustments to the core specification to Brent Zundel: You said something during the break, Manu, that the changes that have already been made mean that when an issuer proves a context, the verifier needs to already be aware of the context prior to receiving the content. Kevin Dean: An immediate concern with verifier receiving content... there may be contexts that are extensions of those that are already known. I could receive something that builds on top of existing context, relies on underlying context when I'm processing. Dave Longley: A context URL is either well-known or opaque. A well-known URL will appear as a constant in source code. You've written your application against the terms defined in the context. Otherwise, you have an opaque URL. Your application cannot process against that as it's not known to the application. Manu Sporny: An attack has not been demonstrated. There's no code that demonstrates an attack that the specification defines for context management. Dmitri Zagidulin: I don't take issue with there being no attack demonstrated. It would not go amiss if we required cryptographic verification by the verifier of the terms in a related context. We have this resource, we have cryptographic protection, so why not make it required by the verifier even if there is no demonstrated attack? Dave Longley: If an opaque URL shows up, you must transform that document to something you can understand before you can consume it. On top of this, the cryptosuites protect the document, so verification must take place first. Only after that does the context define how the document must be processed.
Dmitri Zagidulin: Adding that requirement to verifiers has additional benefits. It prevents developer confusion if the context changed. I agree that hash checking isn't required but adds value to the verification.
Brent Zundel: The solution we are discussing and that has been discussed in the past is the proposal that if an issuer elects to protect the integrity of the context through the related resource property, we should add a requirement that a verifier must verify the integrity of the context. Benjamin Young: I think we need to separate the "why" because this is not enhancing the security and is adding processing cost. Manu Sporny: +1 to what Benjamin said. As long as it's optional for issuers for contexts, that seems like it would be acceptable to verifiers. Ivan Herman: The only thing I could add to that is that I think we should do it, and we should draw attention to the fact that issuers should treat this feature with care because of the high cost to verifiers. Dave Longley: Something that would make a lot of sense to me to propose that if your application reads the related resource field, it must check the hash.
Benjamin Young: It's possible now. You can do this now. Your implementer can use related resource. We added language saying that you can do that. Brent Zundel: The part of it that was clarified for me is that, if you are planning to dereference the URL, you MUST check it. I think that's a very sane thing to add to the spec.
Michael Jones: As I see it, related resource is a security feature. It makes no sense, if it is present, to not require it as a security feature. Brent Zundel: I think we're at the point where we can entertain language for the proposal. Manu Sporny: Proposed language. Brent Zundel: Proposal is to add this language to the specification to address issue 272, raise as PR and close 272. Michael Jones: Not OK with the beginning part. I think the way it should be worded is that, if something is listed in the related resources list, its integrity must be checked. Brent Zundel: What if it says, "If it makes use of a resource..."? Manu Sporny: e.g., What happens if I like to a 5GB video? I don't need to check it unless I actually access it.
Manu Sporny: I'm concerned about VCs that link to lots of other documents. That could create a DoS attack on verifiers. Dmitri Zagidulin: Agreed with Manu. Another nuance is that the other reason to not say that if it's there it must be verified. If it's cached, it should not need to verify it again as it already has a local copy. Michael Jones: I get the 5GB video example. That's not the motivating case. What we're talking about are the contexts that are going to be a few kB. To partially address the issue, we should say that if the related resource is a context issue, it must be checked. Benjamin Young: We could use language like "activate". For example, the subresource spec for HTML doesn't process them until they're used. Kevin Dean: Back to the question of there being multiple context files that are not known to the verifier.
Kevin Dean: It's not that you're going to process them because you don't understand them, those context files are irrelevant for the verifier, no need to download them. I don't see any issue with the idea that if you're not going to use it, you're not going to check it. Gabe Cohen: I'm struggling to understand the use case where it's there but not going to be used.
Michael Jones: I'm fine with if you're verifying something you already have cached you don't need to verify again. If something is listed in a context, are you free to not use the context? I thought that there are context things that have non-local effects so you have to look at all contexts. Benjamin Young: To the question about JSON-LD, in broader JSON-LD land, there are moments and times where you need to get all context files, cached or not.
Michael Jones: I have tried to bring all of this to Tobias to weigh in again. Brent Zundel: I feel we are on the cusp of a decision.
Denken Chen: I would like to propose that libraries be updated to check all the information in the VC. We are at the early stage of VC adoption. If we don't enforce that in standard libraries... I propose discussing the boundary of safety boundary. To me, vc standard should be self-consistent in terms of chekcimg integrity of all information within the vc itself. That’s a great security improvement. Kevin Dean: In answer to question about why have resources you don't reference... gernreric credential could have image, you're not interested in image, jut want name and address.
Gabe Cohen: My question is, should we support issuing credentials that some verifiers may not be able to verify? Should we allow verifiers to not verify related resources? I will not block resolution. Brent Zundel: Current proposal is as above.
Brent Zundel: Other language includes "if the verifier makes use of the resource...".
Benjamin Young: Revision proposed.
Brent Zundel: If there's language that would change your -1 vote, let us know.
Manu Sporny: I will raise a PR for this. Brent Zundel: It's lunchtime. Manu Sporny: There are three proposals related to data integrity to look at tomorrow.
|
Based on the discussion at W3C TPAC, the final PR to address this issue has been raised here: w3c/vc-data-model#1567 This issue will be closed once that PR has been merged. |
I was unable to attend the TPAC call on this topic, however for the record I don't believe #1567 solves the issue, because it keeps this critical security feature as optional for an issuer to implement. It is better then having nothing at all, but it is not a proper fix. |
@tplooker wrote:
All of the PRs related to specifications that this group is working on listed in #272 (comment) have been merged. Per the discussion at W3C TPAC, as well as during the last WG call, the WG has applied every change request related to this issue that it has been able to achieve consensus on and does not believe there are any further PRs, based on the exhaustive discussion in this issue and during WG meetings, that would achieve consensus. Closing. |
The following issue outlines two significant security vulnerabilities in data integrity.
For convenience in reviewing the below content here is a google slides version outlining the same information.
At a high level summary both vulnerabilities exploit the "Transform Data" phase in data integrity in different ways, a process that is unique to cryptographic representation formats that involve processes such as canonicalisation/normalisation.
In effect both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without the signature being invalidated. For example as the attached presentation shows with the worked examples, an attacker could swap their first and middle name and employment and over18 status without invalidating the issuers signature.
The first vulnerability is called the unprotected term redefinition vulnerability. In general this vulnerability exploits a design issue with JSON-LD where the term protection feature offered by the
@protected
keyword doesn't cover terms that are defined using the@vocab
and@base
keywords. This means any terms defined using@vocab
and@base
are vulnerable to term redefinition.The second vulnerability exploits the fact that a document signed with data integrity has critical portions of the document which are unsigned, namely the
@context
element of the JSON-LD document. The fact that the@context
element is unsigned in data integrity combined with the fact that it plays a critical part in the proof generation and proof verification procedure, is a critical flaw leaving data integrity documents open to many forms of manipulation that are not detectable through validating the issuers signature.Please see the attached presentation for resolutions to this issue we have explored.
In my opinion the only solution I see that will provide the most adequate protection against these forms of attacks is to fundamentally change the design of data integrity to integrity protect the
@context
element. I recognise this would be a significant change in design, however I do not see an alternative that would prevent variants of this attack continuing to appear over time.I'm also happy to present this analysis to the WG if required.
(edited to put backticks around
@words
)The text was updated successfully, but these errors were encountered: