Purdue University
Pages: pp. 3-6
Abstract—Secure and privacy-preserving digital identity management is a key requirement for secure use of the Internet and other online environments. However, the landscape of digital identity management is quite complex, with several different stakeholders. Here, the author discusses critical issues that must be addressed for the large-scale and effective deployment of digital identity solutions.
Keywords—authentication, privacy, trust
The problem of security and trust in cyberspace is long-standing and is perhaps one reason why the Internet and other online environments haven't been used to their full potential. Among the many aspects of this problem, one major challenge is the reliable and convenient authentication of users, devices, and other parties in online environments. We've all heard about "digital identity theft," which generally refers to malicious parties stealing individuals' passwords or credentials for financial gain or other purposes. Even though identity theft existed long before the Internet was established, the problem has grown enormously in cyberspace.
To address digital identity theft and provide a more secure cyberspace, the US government unveiled plans for a National Strategy for Trusted Identities in Cyberspace (NSTIC; www.nist.gov/nstic) earlier this year. Having a strategy in place is certainly important and provides an initial understanding of the many dimensions of digital identity management.
Digital identity is indeed a complex notion, and many definitions exist. We can define digital identity as the digital representation of information known about an individual or party. Such information, referred to as identity attributes, encompasses not only attributive information, such as social security number, date of birth, and country of origin, but also biometrics, such as iris or fingerprint features, and information about user activities, including Web searches and e-shopping transactions.
Another example is the ITU's definition, which defines identity as "information about an entity that is sufficient to identify that entity in a particular context." This definition includes identifiers such as login names and pseudonyms. The specific set of identity attributes and identifiers used to carry on a specific transaction in cyberspace can vary considerably. In the digital identity ecosystem that the NSTIC plan envisions, just providing an identity attribute, such as age or a pseudonym, might be sufficient in some cases to allow access to protected resources or obtain a service; in other cases, several identity attributes and contextual information—such as an individual's current location—might be required.
Yet another (complementary) definition of digital identity is that it's a claim a party makes about itself or some other party. The term "claim" refers to an assertion about the truth of something—typically, a truth that's disputed or in doubt. This definition points out that digital identities must be verified through an authentication process. Authentication has many forms, ranging from passwords to smart cards or tokens to biometric verification. In this context, as the NSTIC plan discusses, the term "credential" refers to an information object used during a transaction to provide evidence about an identity claim. The required strength of the authentication process depends on the transaction that a party is trying to carry out—gaining access to sensitive data requires a much more thorough authentication process than does buying a book online.
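The idea that the required authentication strength depends on the transaction can be sketched as a small policy check. The Python below is only an illustration under stated assumptions: the assurance levels, the transaction names, and the `is_sufficient` function are hypothetical and are not taken from the NSTIC plan.

```python
from enum import IntEnum


class Assurance(IntEnum):
    """Hypothetical ordering of authentication strengths (illustrative only)."""
    PASSWORD = 1
    TOKEN = 2                  # for example, a smart card or hardware token
    TOKEN_PLUS_BIOMETRIC = 3


# Hypothetical mapping from a transaction to the minimum assurance it requires.
REQUIRED_ASSURANCE = {
    "buy_book": Assurance.PASSWORD,
    "access_sensitive_data": Assurance.TOKEN_PLUS_BIOMETRIC,
}


def is_sufficient(transaction: str, presented: Assurance) -> bool:
    """Return True if the presented authentication meets the transaction's minimum."""
    # Unknown transactions fall back to the strongest requirement (fail closed).
    required = REQUIRED_ASSURANCE.get(transaction, max(Assurance))
    return presented >= required
```

In this sketch a password is enough to buy a book but not to reach sensitive data, which mirrors the contrast drawn above.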
This already complex identity landscape is further complicated by the need for identity solutions that are easy to use, cost-effective, and interoperable. In particular, NSTIC emphasizes the idea that users today must remember myriad passwords, which are often weak; one benefit of the envisioned digital identity ecosystem will be to provide users with a small set of secure credentials that lets them seamlessly access various services.
Given the numerous requirements concerning identity management (such as security, privacy, ease of use, and low costs) and the many parties involved (for instance, users, vendors, governmental organizations, and service providers), we must address several major challenges for the vision outlined in the NSTIC plan to become reality.
One major problem is that we still have very insecure software systems. Merely adopting techniques based on cryptographic tokens for authenticating users won't be sufficient. All systems involved in managing identities must be secure. Today's operating systems and applications are largely insecure, and attacks come from many different sources, including insiders. Unfortunately, the problem of trusted identities can't be decoupled from the problem of secure software systems and data protection. Consequently, even the term "trusted identities" must be used with care. We must always keep in mind that if we have difficulty claiming that a system is secure, we will likewise have difficulty claiming that an identity is trusted.
Rather, trusted identities should clearly refer to approaches by which organizations can indicate the organizational processes, technical tools, and background information that have been used to issue a credential. The party that must verify the credential can then decide based on this information whether to accept the identity claim the credential supports. So, trusted identity really refers to comprehensive, articulated, and flexible processes for verifying identity claims by parties in cyberspace. In this respect, an important technical requirement is that credentials (such as smart cards, certificates, and cryptographic tokens) include indications about the organizational process used to issue them. When such processes are well understood and standardized, including such an indication is straightforward.
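A credential that carries an indication of its issuance process, and a verifier that decides on that basis, might look roughly like the following. This is a minimal sketch, not a real credential format: the `ProcessTaggedCredential` class, the process identifiers, and the two verifier policies are all hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class ProcessTaggedCredential:
    subject: str
    issuer: str
    # Hypothetical identifier of the (ideally standardized) issuance process.
    issuance_process: str


def accepts(credential: ProcessTaggedCredential,
            trusted_processes: AbstractSet[str]) -> bool:
    """The verifier decides whether the stated issuance process is good enough."""
    return credential.issuance_process in trusted_processes


# Two hypothetical verifiers with different tolerance for weak issuance processes.
bank_policy = {"in_person_document_check"}
forum_policy = {"in_person_document_check", "email_confirmation"}

cred = ProcessTaggedCredential("bob", "example-issuer", "email_confirmation")
```

Here the same credential would be accepted by the forum and rejected by the bank, so "trusted" is a judgment each verifier makes rather than a property of the credential itself.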
Along with this indication, the provenance of digital identity is crucial. This notion refers to the set of information and other credentials used to issue a credential to a party. A typical example would be that, to obtain a driver's license in a given state, you must provide a passport and proof of residence for that state. These requirements are what I call credential issuance policies. The specific passport and proof of residence used to issue a driver's license to Bob Smith would be the provenance of Bob Smith's driver's license.
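The driver's license example can be expressed as a small data structure in which the issuance policy names the kinds of credentials required and the provenance records the specific ones presented. The sketch below is illustrative only; the `Credential` class, the `ISSUANCE_POLICY` table, and the `issue` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Credential:
    kind: str      # for example, "passport"
    holder: str
    # Provenance: the specific credentials presented to obtain this one.
    provenance: Tuple["Credential", ...] = ()


# Hypothetical issuance policy: credential kinds required for each issued kind.
ISSUANCE_POLICY = {
    "drivers_license": frozenset({"passport", "proof_of_residence"}),
}


def issue(kind: str, holder: str, presented: Tuple[Credential, ...]) -> Credential:
    """Issue a credential only if the policy's required kinds were presented."""
    required = ISSUANCE_POLICY[kind]
    presented_kinds = {c.kind for c in presented if c.holder == holder}
    missing = required - presented_kinds
    if missing:
        raise ValueError(f"missing required credentials: {sorted(missing)}")
    # Record exactly which credentials were used, not just their kinds.
    return Credential(kind, holder, provenance=tuple(presented))


passport = Credential("passport", "Bob Smith")
residence = Credential("proof_of_residence", "Bob Smith")
bob_license = issue("drivers_license", "Bob Smith", (passport, residence))
```

The policy is shared by every applicant, whereas `bob_license.provenance` is specific to Bob Smith's license.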
Finally, because identity attributes and corresponding credentials are data, data quality is a crucial concern; we must have assurance that this data is error-free and up to date. Sometimes, having credentials with wrong information can create as many problems as identity theft does. For users, verifying what's recorded in credentials might not be easy, especially when these credentials are cryptographic tokens.
Privacy represents another, even more critical issue. In fact, the envisioned digital identity ecosystem emphasizes privacy as a crucial requirement. However, as the Electronic Privacy Information Center's (EPIC's) formal response to NSTIC (http://epic.org/privacy/nstic.html) articulates well, the parties that will provision identity attributes and credentials (that is, the digital identity issuers) could be in a position to gather information about how those attributes and credentials are being used. Such information could be a gold mine for organizations that want to, for example, create shopping profiles of customers. Failing either to prevent digital identity issuers from collecting this information or to protect it, once collected, against misuse or theft will further undermine privacy.
Even when such information is anonymized, privacy isn't guaranteed. Researchers have extensively investigated data anonymization techniques over the past 20 years; however, to date we can't claim that such techniques are strong. A quick look at the literature in this field shows that whenever someone proposes a new anonymization approach (for example, k-anonymization, l-diversity, t-closeness, or differential privacy-based techniques), methods for breaching the approach appear soon after. Using pseudonyms doesn't help much either—by combining information about one party's transactions and activities in cyberspace with additional information available from different sources, an adversary could determine the real individual with whom the pseudonym is associated. EPIC's response to the NSTIC plan clearly points out the need for "a clear plan for privacy protection" and "a strategy for the protection of private communications by fair information practices."
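The weakness of pseudonyms against combined data sources can be illustrated with a toy linkage attack. Everything in the sketch below is made up for illustration: the records, the names, the choice of ZIP code and birth year as quasi-identifiers, and the `link` function.

```python
# Toy, made-up records: a pseudonymous activity log with quasi-identifiers.
pseudonymous_log = [
    {"pseudonym": "user_7f3a", "zip": "47906", "birth_year": 1980, "purchase": "book"},
    {"pseudonym": "user_19c2", "zip": "47906", "birth_year": 1975, "purchase": "camera"},
]

# Toy, made-up auxiliary data an adversary might obtain from a different source.
public_directory = [
    {"name": "Alice Example", "zip": "47906", "birth_year": 1980},
    {"name": "Carol Example", "zip": "47906", "birth_year": 1975},
    {"name": "Dave Example", "zip": "47901", "birth_year": 1975},
]

QUASI_IDENTIFIERS = ("zip", "birth_year")


def link(log, directory, keys=QUASI_IDENTIFIERS):
    """Map each pseudonym to a name when exactly one directory entry matches."""
    linked = {}
    for record in log:
        matches = [person["name"] for person in directory
                   if all(person[k] == record[k] for k in keys)]
        if len(matches) == 1:  # a unique match re-identifies the pseudonym
            linked[record["pseudonym"]] = matches[0]
    return linked


reidentified = link(pseudonymous_log, public_directory)
```

Neither dataset alone names the person behind a pseudonym; the join on shared attributes does, which is the combination risk described above.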
When talking about civil rights, EPIC also highlights the need for "assurance that Internet users can continue to create, control, and own Web content." Given that today's Internet and Web play a key role as free media enabling universal open access, we want to ensure that this tradition continues and that the envisioned identity management solutions don't endanger users' freedom to post and share content, create associations, or perform other collaborative activities. A crucial issue here is, of course, how to reconcile the freedom of "activities" in cyberspace with its secure use. This is a difficult issue that must be fully understood and addressed via proper policies, which in turn should drive the development and deployment of proper technical and organizational solutions.
A last consideration concerns user choice with respect to which identity credentials and providers to use. On one hand, the NSTIC plan emphasizes interoperability as a guiding requirement. On the other hand, interoperability can be very difficult to achieve in practice. We can't solve the interoperability issue simply by providing mechanisms for translating between different identity credential formats (for example, translating from a SAML assertion to an X.509 certificate). Interoperability also requires reconciling protocols that take different steps, have different interactions, and are based on different trust assumptions; reconciling the different processes used to issue identity credentials; and semantically matching identity attributes. Such matching might require mappings among different namespaces or ontologies. Even though a large body of research exists in the Semantic Web and database communities that addresses issues related to integrating semantically heterogeneous data, applying such research to very large-scale environments might prove problematic.
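The simplest form of a mapping among namespaces is a per-provider renaming into a shared vocabulary. The sketch below assumes two hypothetical providers and hypothetical attribute names; it is not drawn from SAML, X.509, or any real schema.

```python
# Hypothetical attribute names used by two identity providers, each mapped
# to a hypothetical shared vocabulary.
MAPPINGS = {
    "provider_a": {"dob": "birth_date", "given": "first_name"},
    "provider_b": {"dateOfBirth": "birth_date", "firstName": "first_name"},
}


def normalize(provider: str, attributes: dict) -> dict:
    """Rename a provider's attributes into the shared vocabulary.

    Attributes with no known mapping raise an error instead of being dropped
    silently, because an unmapped attribute is where a mismatch would hide.
    """
    mapping = MAPPINGS[provider]
    normalized, unmapped = {}, []
    for name, value in attributes.items():
        if name in mapping:
            normalized[mapping[name]] = value
        else:
            unmapped.append(name)
    if unmapped:
        raise KeyError(f"no mapping for attributes: {sorted(unmapped)}")
    return normalized


from_a = normalize("provider_a", {"dob": "1980-01-01"})
from_b = normalize("provider_b", {"dateOfBirth": "1980-01-01"})
```

Renaming alone does not reconcile differing value formats or meanings, and a hand-maintained table like `MAPPINGS` grows with every provider pair, which hints at why such matching becomes hard at very large scale.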
So, what can we say in conclusion about NSTIC? The plan is ambitious, and achieving its vision entails addressing several major challenges, many of which aren't merely technical or organizational. However, even if the plan won't fully achieve its vision, it certainly represents an important step toward an open and broader discussion of how to provide an open, universal, privacy-preserving, and secure cyberspace.