Clean up herd privacy issues #539
I agree with @jandrieu analysis and suggestions but there's an even bigger risk than herd privacy if DID Core allows things that could be done in DID methods. Governments, employers, and manufacturers have the power to decide which DID methods they will accept. They are the sovereigns by virtue of their asymmetric power in most cases. We are familiar with US services that ask for your social security number or with Indian Aadhaar identifiers being required for access to non-government services. When we weaken the privacy protections in DID core we are guaranteeing that methods will use that feature to serve the business interests associated with those methods. I believe we are already seeing this as unspoken substrate to this issue. If we allow this to happen, it threatens the very foundation of our work on self-sovereign identity and it risks wasting the huge investment we have all made in this work. DID core must not tacitly encourage privacy compromises. The differentiating factors among DID methods must be highly visible and clearly linked to the DID method. Consider, for example, the way renters are protected by standardized rental agreements in most large cities. Nothing prevents the landlord from amending the standard rental contract but those amendments stand out when they are added in longhand and that triggers discussion. Loan agreements have similar protections that require certain standard features like the effective interest rate to be in unmistakably bold type. I think of DID core as the equivalent to that standard rental agreement. For consumer protection and the very principles of self-sovereignty that we're working towards, every deviation from privacy by default (as nicely described by @jandrieu above and with respect to various issues) must stand out as prominently as we can make it. If we don't our good work could turn out to be overtaken by unintended consequences. |
Speaking up on Herd Privacy
I've been meaning for some months to speak up and express my serious concerns with the approaching CR (Candidate Recommendation) version of the DID specification due to issues of lack of focus on some privacy security fundamentals, of which this issue of herd privacy is only part of the story. I believe supporting herd privacy is critical for DIDs. Unfortunately, it is now endangered by features added to DID Core that make DID Documents slightly different, and thus more correlatable. Instead, I need to reiterate my strong support for herd privacy by placing features of this type into DID Methods or in "off-DID" approaches leveraging Verifiable Credentials and/or zcaps, other protocols, and not have them as part of DID Core. By doing so we can ensure that DIDs can serve vulnerable populations for whom that privacy can literally be a matter of life and death.
The Definition of Herd Privacy
Defining herd privacy is simple. It can be found in section 10.5 of the specification, which says that "When a DID subject is indistinguishable from others in the herd, privacy is available. When the act of engaging privately with another party is by itself a recognizable flag, privacy is greatly diminished." That same section further states the importance of herd privacy, stating: "DIDs and DID methods need to work to improve herd privacy, particularly for those who legitimately need it most. Choose technologies and human interfaces that default to preserving anonymity and pseudonymity." This is not a new element of the specification. It was part of the DID Implementers Draft 0.1, which I co-authored at Rebooting the Web of Trust 3, in late 2016, which was effectively the first public version of the DID specification.
The Importance of Herd Privacy
I believe that herd privacy continues to be a critical element of the Core specification that we need to respect accordingly. This is not a philosophical objection.
Although DIDs will serve many purposes, some commercial and some not, one of the defining commitments that came out of our work with ID2020 at RWOT in early 2016 that led to DIDs was the need to serve vulnerable populations. As I said, privacy can be a matter of life and death to them, and herd privacy is how the DID spec achieves that. These might be people in China who are in a disagreement with the government; they might be whistleblowers in the United States who are speaking out against abuses of the government or a corporation; or they might be citizens in Africa who could be vulnerable to warlords or other extra-legal forces. DIDs give them a way to authenticate themselves with verifiable credentials, allowing them to prove who or what they are for various networked activities. But if they are not protected with a cloak of privacy, they will either be forced out of the electronic networking of the twenty-first century or else endangered by becoming a part of it. Herd privacy isn't only a W3C requirement, as IETF RFC 6973 - Privacy Considerations states:
My History with Herd Privacy
This also isn't a new topic for me. I've actively advocated for herd privacy in the past and seen the success of including it in a major internet spec. One of my main concerns when co-authoring the SSL/TLS standard was to ensure that all traffic was indistinguishable: that "herd privacy" could be achieved if sufficient numbers of people used it correctly. Other early competing protocols to SSL/TLS were specific to the web or didn't protect metadata, but our SSL/TLS architecture was agnostic. We were chartered to focus on the web; I instead focused on an architecture to secure all transports (which sometimes got me in trouble). This architecture helped make it the world's most widely deployed security standard. One of my proudest moments (and the inspiration for our first Rebooting Web of Trust meeting) was when I heard in 2015 that more than 50% of email to and from Google was being secured by SSL/TLS. The architecture worked! It also showed that herd privacy wasn't solely a question of serving a vulnerable population, but also something that could lead to the overall success of a specification. Unfortunately, SSL/TLS has also shown us how privacy protections of this sort can be eroded over time. Across two decades of practical use, a variety of identifier and correlation attacks, as well as architectural challenges such as the dependence on DNS and certificates, have made the herd privacy of SSL/TLS less powerful. We will face those same challenges as DID grows in usage and popularity in the decades to come. We thus need to ensure that our initial release of the specification serves the needs of herd privacy to the greatest degree possible, without introducing susceptibilities such as identifiable DID Documents. These cracks in our privacy model would grow over time.
Current Issues with Herd Privacy
I believe that the privacy vulnerabilities now being considered as part of the upcoming CR are the result of us becoming too focused on the financial opportunities of LESS (Legally Enabled Self-Sovereign) Identity, and not spending enough time working on the trust-minimized version, which we describe in our own spec as the support for "those who legitimately need it most". We need to stop being a big tent for all purposes and instead first focus on fit-for-purpose, in particular for oppressed groups, who were one of the main communities for whom we created DIDs in the first place. Making features in the core DID spec "optional" is not sufficient: we "SHOULD" offer herd privacy and better defaults at the DID Core level for version 1.0. I might even argue for "MUST". I know that normative language was pulled from the 2016 DID Implementers Draft 0.1 since we can't "prove" conformance, but I do still believe that it is a requirement.
Alternate Solutions for Those Other Needs
This is not to say that we can't serve those financial opportunities of LESS. We certainly want DIDs to become commercially successful to ensure widespread adoption, just like my work with SSL/TLS. However, we already addressed that in our preliminary work on DIDs way back at Rebooting Web of Trust 2, when we first linked up with the ID2020 community. That's when we created the compromise that split DIDs up between DID Core and DID Methods. DID Core should be conservative, especially in regard to potentially existential dangers such as impairing herd privacy. A trust-minimized version as a minimal architectural specification fits in with that conservative view. Innovation in DIDs is welcome, but I believe it must occur at the DID Method level.
My Final Thoughts
I have demonstrated my commitment toward the completion of the DID standard for which I'm credited as a co-author.
I am the founder of Rebooting the Web of Trust, where DIDs were first incubated and were iterated through its first primordial requirements and specification. I was co-chair of the W3C Credentials CG for 4 years where the DID spec continued to be incubated toward a draft that could become the basis for a W3C Working Group. Now I continue to contribute as an invited expert to both the W3C DID and VC Working Groups. My views on the formal specification of DIDs are colored by my co-authorship of the TLS specification, one of the most successful and widely adopted security protocols on the internet, and more recently by Bitcoin, one of the most secure modern cryptographic protocols. Unfortunately, my experience, based on the original goals of DIDs, the needs of a new internet standard, and the evolutionary growth that occurs following adoption, suggests that some of the current issues with DIDs could invalidate all of our hard work. We must reemphasize the need for herd privacy, and privacy in general, in the CR of our DID specification, and we must do so by removing elements that endanger it, such as correlatable parts of DID Documents. This can be done by minimizing the elements in the DID Core spec itself, with the expectation that they can be accommodated in DID Methods or other protocols. I am available for discussing this topic with others, in order to support a more minimal core spec. -- Christopher Allen, Principal Architect & Executive Director, Blockchain Commons |
Looking at the three comments above (from @jandrieu, @ChristopherA, and @agropper), I find all these arguments compelling. But I am not a privacy expert, so I am looking for something more tangible. What are the features that should not be in the specification? Can we have some list of features that we should remove either from the normative part of the specification or from the specification altogether? This is the only way to move forward at this stage. If we were in the early stage of design, I would opt to remove the concept of DID URLs altogether. I feel that it would resolve most of the problems listed in this issue, and the only thing we would have to formally define instead is how to identify a specific verification method within a DID Document to be able to cross-reference it. (That is the only internal reliance on DID URLs I can see). But I realize that this is a nuclear option, and it is probably not viable at this point: too many implementations and uses of DID URLs out there already. I just raise it to motivate the creation of a more specific list of features to be removed in order to move ahead... |
@iherman, the privacy issue is not DID dereferencing. It's the expectation that a DID document has added stuff that I cannot remove because the verifiers expect it to be there as a convenience to them. The "papers please" problem arises when we combine the features that enable control and verification of a DID Document with features that enable linkage of the identifier with attributes of a subject. We wisely decided that the linkage of attributes to an identifier would be standardized as a VC. We realized the importance of separating the bundles of attributes so that an identity we call a 'holder' could choose which attributes to present in which circumstance. Every single SSI wallet I have ever seen stresses this ability for the holder to choose which attributes about themselves to present in which context. If DID-core specifies the introduction of one or more subject attributes in the DID Document we are turning that document into a certificate (of identity) that does not have an obvious means of control over which attributes are presented when asked for "papers please". Let's say I use DIDs to sign-in to Twitter by proving to Twitter control over some key material. At the border with Elbonia, I am asked to unlock my Twitter account for inspection. I have taken the trouble to keep two separate Twitter identities under two separate DIDs. Or, maybe Twitter has taken the trouble to allow me to choose which DMs are displayed to the border guard or not depending on which of two DIDs I hand over to the Elbonian verifier. Before looking through my Twitter persona, the verifier looks through the DID document I use to control that persona. How does the verifier know that I am the subject of that Twitter account? Is there anything in the DID document that links me to the contents of the Twitter account other than the DID itself? Maybe yes, maybe no. Whether or not the DID document contains some such attributes depends on the DID method. 
Who decided what DID method would be associated with a Twitter account? Does Twitter say I must not use did:key or did:peer because they have a Real Name policy? Or do I decide which DID method to use to open a Twitter account and then, if I choose, I post that DID in the account itself along with some selfies for all to see? I don't see DID dereferencing as the real issue. If the Elbonian border guard can scan a DID I present and see my Twitter feed, I can still have two Twitter accounts. I can't prevent any sovereign from saying that all Twitter accounts must be "verified" or otherwise attached to a real name. They do that by specifying the acceptable DID method. But I can make it obvious that they are forcing me to use a verified DID, in which case I would self-censor what I post on Twitter and take my real communications somewhere else. If the method has something in the DID document that I want to remove, how do I do that? The privacy issue is not DID dereferencing. It's the expectation that a DID document has added stuff that I cannot remove because the verifiers expect it to be there as a convenience to them. And now, that DID document is no longer under my control. |
@agropper, forgive me, but I try to be very much down-to-Earth here, with an eye on our plan to publish a CR soon…
You use conditionals here. Is there any specific DID core term that is allowed on a DID Document that violates this? Do we have to add some specific extra constraints to the specification to avoid these issues? Because if the answer to both of these questions is 'no', then I am not sure what we are discussing here.
Right. And the DID Core specification does not say too much about the DID method in this respect. Should we formulate a more restrictive view on methods in the Core specification? Should we have criteria that affect what methods we accept in the registry and what methods are not? Do we miss something that should be added to the DID Rubric document?
Again, I am not saying what we discuss here is not important. Obviously it is, very much so. But we should concentrate on the DID Core specification at this point, hence my questions. |
@iherman This issue, and the special topic call, are for clarifying the scope and intention of the group with regard to herd privacy. "Privacy" has been mentioned in 41 different issues, including this one. Arguments in many of those are based, essentially on herd privacy. Seven issues explicitly mention herd privacy and in the most recent discussion, both on voice calls and on github @talltree has dismissed privacy concerns asserting that herd privacy doesn't apply to all DIDs. I'd like to establish a consensus notion of herd privacy that will provide guidance for specific text changes. Once we have that, I will recommend specific PRs that address this consensus (although if the consensus is to remove the language of herd privacy, I'll defer that to someone else). It would be premature in this conversation to propose specific text changes, as the problem is that in multiple issues and PRs, there is a significant disconnect on this topic that has led more to arguments than productive debate. If we can establish a common framework, we can raise PRs that will bring the spec into alignment. |
@jandrieu, I know this is an emotional issue for you and others on this thread, but do you honestly believe it is fair to characterize me as "dismissing privacy concerns" when I have been the #1 advocate for Privacy by Design with DIDs since the very first line of this spec was ever written? I have never "dismissed privacy concerns" with regard to DIDs of any kinds. What I disagree with is the thesis that the "herd" when it comes to herd privacy for DIDs must be 100% of all DIDs. I have summarized the rationale in this short Google Slides deck that I prepared for the special topic call tomorrow and also attached as a PDF for those who cannot access Google Slides. It is not a complex argument. To argue that all DIDs must be designed to support herd privacy is to argue that all DIDs must appear in a context that supports anonymous or pseudonymous relationships. That ignores all the contexts where exactly the opposite is true: the DID must be well-known. This is summarized in this diagram from my slides: I look forward to discussing this in the special topic call tomorrow (note that I have developed a conflict for the early part of the call, so unfortunately I may be late). |
I am extremely uncomfortable with this image. It is entirely out of proportion. In deployments in the near future, these public DID ovals will fill almost the entire box, with just a little white space for the "all other DIDs". But these are the ones that need protection by being indistinguishable and non-correlatable. This would be like saying in SSL/TLS "we'll only secure payments", which means many parties could censor your shopping, versus having the entire website secured so that no one can know if you are getting news, looking at a catalog, price shopping, or making a payment. I was similarly uncomfortable with the latest TLS 1.3 and how it supports traffic monitoring and ESNI. Early proposals for TLS 1.3 eliminated support for traffic monitoring entirely for herd privacy reasons, but it was added back in because of demands of enterprise to monitor traffic. The result is as I predicted: China is censoring TLS 1.3 that doesn't support traffic monitoring or that does support ESNI. True herd privacy demands that the private be indistinguishable from the public. |
@talltree I think it is completely fair to say that you dismissed privacy concerns. In multiple threads, including the latest with "resource" and definitely in the debate on "type", you dismissed my concerns over herd privacy, claiming it doesn't apply to all DIDs. Hence, this issue and tomorrow's special topic call. |
@jandrieu Saying that "herd privacy does not apply to all DIDs"—which I am—and "dismissing privacy concerns" are two different things. But I guess we'll just have to agree to disagree about that. However, since this topic is about privacy concerns, what I find most concerning is this statement from @ChristopherA:
There is a specific reason the diagram shows peer DIDs taking up so much room. To my knowledge, the vast majority of actual production deployments of verifiable credentials (VCs) where the subject is a human being will NOT use public DIDs for the subject due to—you guessed it—privacy concerns. Not only does using a public DID for the subject make it trivial to correlate across all presentations, but so does the signature over the credential presentation. In short, using public DIDs for individuals in VCs are perfect tracking beacons. What's worse, writing a public DID whose subject is a human being to an immutable blockchain is such a clear challenge for the GDPR right of erasure ("right to be forgotten") that the practice is still not allowed under the Sovrin Governance Framework. I don't say that lightly. The Sovrin Foundation spent a year working with attorneys and GDPR experts trying to find a clear path for individuals to register public DIDs on the Sovrin ledger without creating irresolvable GDPR conflicts. You can read the resulting analysis in this paper. We were never able to find a satisfactory answer. Therefore every implementation of verifiable credentials that use the Hyperledger Indy/Ursa/Aries stack that involves issuing VCs for human subjects use peer DIDs and ZKP credential formats. So I find it ironic that you are contending that all DIDs need herd privacy when in fact the strongest privacy is provided by using peer DIDs which do not need to be public at all. |
@talltree Your fundamental argument, as illustrated in your diagram, dismisses the privacy concerns of DIDs for individuals because, to you, Peer DIDs and "public" DIDs matter more. A more collaborative engagement would be to say "I recognize your concerns. Let's find a way to address them." |
@jandrieu -- In your initial post, creating this issue, you said --
I'm pretty sure you meant, "...a party who controls an identifier of the Subject..." because there's no such thing as the identifier of a Subject, because anyone can create a new identifier of any Subject at any time for their own purposes -- especially in the universe of DIDs -- and this is vitally important for any semblance of the kind of privacy you're trying to build. But even with that correction, your assertion is incorrect. The Holder signing a VP only has such control over the VC they're packaging as a VP that a Holder might have -- which doesn't necessarily include control over any identifier of the Subject. That Holder is consenting to the use of the VP for some purpose under some terms. This does not necessarily extend to use of any identifier of any entity, VC Subject or otherwise. Even the Issuer of a VC doesn't necessarily have control over any identifier of the Subject! The Issuer has control over some identifier(s) of the VC itself -- but that's it! Running into these basic inaccuracies in the first post in this thread does not bode well for the rest of it holding together. I submit that "privacy of DID Subjects" might be considered as one of the many axes of consideration in the DID Rubric, but it should not (I daresay, can not) be thought of as something which should (or can!) be built in and universally true, to any depth between zero and perfect, for all DID Subjects of all DIDs in all DID Methods. Existing in the world is imperfectly private. Perfect privacy is impossible. Regrettably, by including a number of absolutes, the GDPR is an over-reaching piece of legislation which will eventually be shown to have forbidden many technologies which would have improved the situations the GDPR was meant to address, and instead the GDPR is solidifying those aspects of the extremely un-private world at their current levels, and may even be making them worse. |
@TallTed I agree with your nuanced critique of the notion of "the" identifier, although I would clarify by saying the VP only provides proof for the identifier that is in the subject (assuming a single-subject VP and that the VP is signed with cryptographic material provably linked to the VC Subject). That is the "the" I was referring to. However, your second point misses the best practice of using proof-of-control both before issuance of a VC--which proves to the issuer that the soon-to-be holder is, in fact, in control of that identifier--AND proof-of-control of the identifier in that VC upon presentation, in the form of signing a challenge string in the VP using the same cryptography as that of the cryptographic identifier that is the Subject of the VC. Together, these two proofs demonstrate that the presenter has access to the same cryptographic secret(s) that the initial recipient does. This practice depends on proof that the presenter does, in fact, control the identifier in that VC. My other clarification: because of our extensibility model, we can't prevent DID Methods from violating herd privacy, and I'm not arguing for that. Rather, I'm arguing that DID Core should not define features that promote violations of herd privacy. DID Methods are the proper place for such innovations. |
The issue was discussed in a meeting on 2021-01-14 List of resolutions:
1. Herd Privacy
See github issue #539.
Manu Sporny: we can review joe's position, then discuss a fallback consensus position. Christopher Allen: I thought i can recap history or talk about pre-DIDs while we're waiting for Drummond. Daniel Hardman: I'm aware of the herd privacy topic and I'm pretty sure I'd say anything drummond would say. Joe Andrieu: I think herd privacy is vital to the DID Core spec, and we need to update it to address herd privacy while ensuring that DID method implementers are free to innovate.
Daniel Hardman: the first part feels comfortable, the second part needs some nuance. Orie Steele: to note two points related the concept of herd privacy being fundamental.
Manu Sporny: I feel like I understand mostly where people are coming from. Christopher Allen: I'm coming from a history of great intentions that result in regrets and there's a lot of people in ietf that have turned around and said we need to be radical. Adrian Gropper: to daniel's point, and I think drummond's, I don't believe that DIDs for things and documents where those things and documents are associated with people are not part of herd privacy.
Joe Andrieu: one of the reasons for this call is that I'm fed up of arguing with drummond and having concern for the privacy of individuals dismissed because of features for things that aren't individuals. Drummond Reed: I am upset with Joe using the term that I've been dismissing privacy.
Drummond Reed: which is I'm a huge supporter of herd privacy for the context in which it's needed.
Ted Thibodeau Jr.: names can be used in the anonymous sphere without disrupting the anonymity of those who prefer not to assert some known identity. Dave Longley: I was about to say something very similar to what ted said. Manu Sporny: a concrete question I have for joe and christopher, it feels like we're all on the same page with respect to trying to protect people when it comes to herd privacy.
Joe Andrieu: to manu's question, it reveals the nature of the subject.
Christopher Allen: I have no problem with there being some type of feature where you can request is this a Daniel Buchner: use cases [use cases] [[use cases]]. Markus Sabadello: comment about DID methods.
Markus Sabadello: I think it would be a big mistake for a did method that can only identify things, only identify github users. Manu Sporny: questions for each side.
Drummond Reed: christopher said something I find helpful. We're talking about layering. Ivan Herman: oh yes. Drummond Reed: that distinction is so important.
Christopher Allen: I want to respond to daniel's thing of having this universal way of differentiating this is a software file.
Joe Andrieu: slightly different answer to dbuc. Adrian Gropper: [??] from my perspective the efficiency of the process is not the issue at the DID Core level.
Manu Sporny:
Daniel Buchner: types being generalized in the sense of there being a concept in the top level DID spec is good. There are drawbacks that come with it. If it's generalized across the methods they can interoperate. If it doesn't happen that's cool, but different methods might pick different types. Grant Noble: the DID method I'm working on is great for long term DIDs that should not be controlled by other entities, and humans who want maximum privacy.
Joe Andrieu: I feel like this was ramrodded in and doesn't solve the underlying use case.
Joe Andrieu: you can take a DID, resolve to a DID doc, and dereference according to the method, and get a resource.
Drummond Reed: I strongly disagree with what joe just said.
Ivan Herman: when I looked through the slides, there was the reference to this long appendix which seems to be also hanging on the same discussion.
Manu Sporny: thank you drummond, that is helpful to know evernym is not going to object.
Manu Sporny: The challenge I have with this entire discussion, and the same thing with type, this is a criticism of the arguments to not put this in did core, if these are useful features people are going to use them anyway.
Dave Longley: all of this comes down to philosophy around what goes into DID core and what doesn't.
Dave Longley: If there are sets of things that people feel like will cause problems with privacy and security, we have a place to put those sorts of features, and that is in the registries. Ivan Herman: how do the editors feel? do they have something to go away with, or not?
Joe Andrieu: this call was not to be about
Manu Sporny: two proposals for each point of view. I'm trying to focus on
Christopher Allen: I think
Ivan Herman: I have the impression that this will not make it.
Manu Sporny: we'll need to talk again, editors.
Ivan Herman: closing remark: I would like to see where this full discussion leads as far as the whole spec is concerned. I have heard several times that |
@talltree Your argument continues to dismiss the concerns of what you see as a minority of usage. We can speculate all we want about which DIDs are going to be used more; that's a distraction. The DID Core specification MUST be seen through the light of all DIDs. To do anything less is to dismiss the concerns of those public DID Methods that are used for referring to individuals. Just because your systems have made assumptions about which DIDs to use for which situations does not mean that your design choices should be imposed on DID Core and thereby encouraged for use by DID Methods. On the contrary, when we realize that Methods are choosing potentially dangerous features, they should be highlighted in the section on privacy and security concerns and in the DID Spec Registries if added there. They most certainly should not be enshrined in DID Core. I'll reiterate my position, as expressed on that special topic call:
|
I disagree with the notion that "herd privacy doesn't need to apply to all DIDs". The very definition of herd privacy is that it applies to everything in the herd. The moment anything in the herd is excluded, privacy diminishes for everything in the herd. Responding to @ChristopherA:
and @jandrieu:
What we could potentially do is to add normative language to Methods > Privacy Requirements to say something like:
(And, I assume only conforming DID method specs are (or will be eventually) admitted to the Registries. So any method specs without these sections would get immediately flagged on review.) |
Two specific PRs have been merged for this issue (including #616 created just now). Please mark the issue as deferred. FWIW, there is still some confusion among implementers about how best to handle herd privacy. It may be appropriate to add some language to an implementation guide. For now, I understand we won't have an opportunity to get additional PRs in for the imminent Candidate Recommendation, but maybe the implementation guide is the right vehicle for further discussion. |
@jandrieu can this be closed? Happy to discuss on an upcoming call if it has not been fully addressed. |
This was discussed during the #did meeting on 06 December 2024.
w3c/did-core#539
manu: +1 to closing, the current process is to raise an issue, discuss in WG, then Pull Request, then merge if consensus.
<denkeni> +1 for it
decentralgabe: The section on herd privacy, what it means, how it applies to DIDs, good discussion here...
decentralgabe: Seems like we should discuss with Joe. Let's ask Joe if this could be closed. |
Continuing a conversation from PR #480
Section 10.5 Herd Privacy https://w3c.github.io/did-core/#herd-privacy says
However, there is some debate about what herd privacy means and how to apply it to DIDs.
This issue is for developing a consensus definition, after which one or more PRs are expected to be proposed to provide corrections to the DID specifications.
Herd privacy works exactly to the extent that a given feature applies to all DIDs and DID Subjects. That's the herd. When a feature applies to just certain DID Subjects, that enables privacy penetration that would not be possible if true herd privacy exists. When the herd is separable into distinguishable segments, it is possible to discern unintended details about each segment, causing privacy leaks.
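To make the segmentation point concrete, here is a rough sketch that treats the herd as an anonymity set and measures how small the smallest distinguishable group becomes once some feature is visible. The herd, the `subject_type` feature, and the function name are all hypothetical and chosen only for illustration; nothing here comes from DID Core.

```python
from collections import Counter

def anonymity_set_sizes(herd, visible_features):
    """For each member, count how many herd members share its visible feature values."""
    def key(member):
        return tuple(member.get(feature) for feature in visible_features)

    counts = Counter(key(member) for member in herd)
    return [counts[key(member)] for member in herd]

# Hypothetical herd of 1000 identifiers; only 10 of them refer to people.
herd = [
    {"id": i, "subject_type": "person" if i < 10 else "device"}
    for i in range(1000)
]

# No distinguishing feature is visible: everyone hides among all 1000.
uniform = anonymity_set_sizes(herd, [])

# A (hypothetical) subject-type feature is visible: the herd splits in two.
segmented = anonymity_set_sizes(herd, ["subject_type"])

print(min(uniform))    # 1000
print(min(segmented))  # 10
```

In this toy setup, exposing a single two-valued feature shrinks the worst-case anonymity set from 1000 to 10, even though most members of the herd barely notice the change.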
For comparison, consider IPv4 addresses, which have excellent herd privacy. Every public IP address is structured and interpreted the same way and reveals nothing about who owns the machine responding to that IP address, what applications are running on that IP, or what type of hardware or network is running at the endpoint. In fact, you can't even tell if that IP address is just a first hop in a forwarding process that ends up with some other machine (with a different IP address) actually doing the work of composing a response.
The exceptions to herd privacy with IP addresses are informative.
First, there are a few special, functionally unique address ranges that do behave differently: the private subnets like 192.168.*, link-local addresses like 169.254.*, and localhost (127.0.0.1). If the IP address is one of those exceptions, you know something about the initial destination (respectively: it's on a private subnet, it's likely on a network without DHCP or an assigned IP, or it's running on the same machine), but NOTHING else. You still don't know who owns it, what machine is running there, or what applications might be running on it.
Second, the process of assigning IP addresses has a historical legacy that makes it possible to guess, with some level of accuracy, who owns a given IP, or at least who secured that IP address from IANA, and often what region of the world that party is from. This was originally done to simplify IP-based routing and the bureaucratic overhead of issuing IP numbers. These optimizations are not part of the IP spec, but rather a management decision that has ongoing consequences.
Third, the inevitable visibility of IP addresses on the network (you can't route an IP packet without looking at the destination header) means that network analysis can, to some level of accuracy, identify the geographic destination of IP addresses. As a result, there are numerous directory services that provide exactly this functionality. They don't always get it right, especially when Network Address Translation is used by large aggregators, but it is a privacy leak inherent in the design.
VPNs, Tor, NATs, and other approaches help mitigate these problems. However, IP addresses can be, and sometimes ARE, used to de-anonymize parties and, in some cases, must be treated as PII or personal data.
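As a small illustration of the first exception above, the sketch below uses Python's standard `ipaddress` module to show how little an address's syntax gives away: at most which special-purpose range it falls in. The `visible_class` helper and the sample addresses are assumptions made for this example.

```python
import ipaddress

def visible_class(address: str) -> str:
    """Return the only thing the address itself reveals: its special-purpose range, if any."""
    ip = ipaddress.ip_address(address)
    # Check the narrower ranges first: loopback and link-local addresses
    # are also reported as private by the ipaddress module.
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "public"

for sample in ["192.168.1.10", "169.254.7.7", "127.0.0.1", "8.8.8.8"]:
    print(sample, "->", visible_class(sample))
# 192.168.1.10 -> private
# 169.254.7.7 -> link-local
# 127.0.0.1 -> loopback
# 8.8.8.8 -> public
```

Nothing in this classification says who owns the address or what runs behind it. The geographic and ownership leaks described in the second and third exceptions come from allocation records and network observation, not from the address format.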
How does this apply to DIDs?
For DIDs to be a privacy-respecting technology, it is imperative that DIDs for different Subjects remain indistinguishable from each other, modulo the functional mechanisms necessary to establish proof of control over the unique identifiers themselves. You should not be able to tell, by looking at a DID, a DID-URL, or a DID Document, that the Subject of that DID is a particular person or organization, nor what type of entity it is: a human, a corporation, or an inanimate object.
The community behind DIDs already established a privacy-respecting way for assertions about Subjects to be made and managed, Verifiable Credentials (VCs). VCs allow anyone to say anything about any Subject and, thanks to Verifiable Presentations (VPs), there is, in the specification, a concrete mechanism to ensure that reliance upon a given assertion is consented to by the holder. When the holder signs a VP, it establishes that a party who controls the identifier of the Subject consents to its use for some purpose under some terms.
DIDs by their nature don't have that. DID resolution, like IP addresses, requires that DIDs, DID-URLs, and DID Documents be visible and resolvable by anyone. Yes, you can, in theory, add a privacy layer for "private" DIDs whose DID Document is only retrievable after some form of authentication, authorization, or consent protocol. However, those very mechanisms cannot be validated externally; they create a dependency on a trusted endpoint for deciding who does or does not get the cryptographic material associated with a given identifier. In order for DIDs to realize their goal of decentralization, it MUST be possible to retrieve the cryptographic material that secures subsequent interactions WITHOUT reliance on a trusted third party.
Yes, one could look at did:peer, did:key, and did:schema as counter examples, but those are all mechanisms that are either not publicly resolvable (and therefore unsuitable for cross-context verification like VCs), not updatable (without explicit communication to all parties using the identifier, which may be impossible), or they don't provide a means to demonstrate proof-of-control at all. Like the exceptions to public IP addresses, these methods demonstrate ways that you can work within the DID spec to recreate existing paradigms like PGP, public/private keypairs, and CIDs, rather than innate capabilities that are necessary and applicable to all DID Methods.
Which is fine. If you want your DID Method to be for Subjects of a particular nature, such as a DID Method that ALWAYS represents cars, go for it. DID Methods are free to violate herd privacy in this manner. But DID Core should not.
Returning to the IP example, there are services that will, based on publicly available information, map an IP address to a specific geographic location with some level of accuracy and precision.
These services highlight a flaw in IP addresses' herd privacy that could have been avoided. This is a lesson we must take to heart.
What can be scraped, will be.
What can be observed on the network, will be.
These threat vectors MUST be included in the privacy analysis of everything that goes into DID Core.
What can be provided at another layer, should be.
We have a moral obligation to avoid these sorts of privacy problems whenever possible. Herd privacy is how we do that. Herd privacy is violated in direct proportion to the variability that can be used to separate the herd.
As such, we must bring the DID Spec into alignment with principles of herd privacy, while allowing DID Methods to innovate and extend.
DID Core is a meta-standard that enables unbounded extensibility. The burden is on us to make sure that the foundation is privacy respecting. Only those properties and features which are necessary and appropriate should be enshrined in DID Core. Because DID Methods have the freedom to innovate, THAT is where potentially risky or harmful ideas are best tried out. In contrast, properties in DID Core literally define commonality and best practices, which, even when "optional", encourage new DID Methods to use those features. This amplification effect is why it is so imperative that we minimize the potential harms from features of DID Core. If we do not, we will actively encourage DID Method designers to adopt harmful practices.
Finally, I want to directly refute a claim made by one of the DID Core spec editors @talltree:
#480 (comment)
This reflects a fundamental misunderstanding of how herd privacy works. Herd privacy ONLY works when it applies to all DIDs equally. Any differentiation between classes of DIDs directly from features defined in DID Core undermines herd privacy.
Discussion welcome.