Protecting Respondents' Identities in Microdata Release
November/December 2001 (vol. 13 no. 6)
pp. 1010-1027

Abstract: Today's globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call today for the release of specific data (microdata). In order to protect the anonymity of the entities (called respondents) to which information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. Deidentifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to reidentify respondents and infer information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of the respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided without compromising the integrity (or truthfulness) of the information released by using generalization and suppression techniques. We introduce the concept of minimal generalization that captures the property of the release process not to distort the data more than needed to achieve k-anonymity, and present an algorithm for the computation of such a generalization. We also discuss possible preference policies to choose among different minimal generalizations.
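As an illustration of the definition above, the following Python sketch checks k-anonymity on a toy table and applies one generalization step and one suppression step. It is not the algorithm presented in the paper. The table contents, the choice of linkable attributes (race, birth year, sex, ZIP code), the ZIP-masking rule, and the function names `is_k_anonymous`, `generalize_zip`, and `suppress_rare` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy table. Each row holds only the attributes assumed to be
# linkable to external data: (race, birth_year, sex, zip_code).
ROWS = [
    ("asian", 1964, "F", "94139"),
    ("asian", 1964, "F", "94138"),
    ("black", 1965, "M", "94141"),
    ("black", 1965, "M", "94142"),
    ("white", 1964, "M", "94139"),
]


def is_k_anonymous(rows, k):
    """Return True if every combination of values occurs in at least k rows."""
    counts = Counter(rows)
    return all(count >= k for count in counts.values())


def generalize_zip(rows, digits_masked):
    """Generalize ZIP codes by replacing the last `digits_masked` digits with '*'.

    This masking rule is an assumed generalization hierarchy for illustration.
    """
    generalized = []
    for race, year, sex, zip_code in rows:
        kept = len(zip_code) - digits_masked
        generalized.append((race, year, sex, zip_code[:kept] + "*" * digits_masked))
    return generalized


def suppress_rare(rows, k):
    """Suppress (drop) rows whose value combination occurs fewer than k times."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] >= k]


if __name__ == "__main__":
    step1 = generalize_zip(ROWS, 1)   # coarsen ZIP codes by one digit
    step2 = suppress_rare(step1, 2)   # drop the remaining outlier row
    print(is_k_anonymous(ROWS, 2), is_k_anonymous(step1, 2), is_k_anonymous(step2, 2))
```

In this toy table, masking one ZIP digit merges four rows into two groups of two, and suppressing the single remaining outlier yields a 2-anonymous release. The released values stay truthful because they are only coarsened or withheld, never altered.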


Citation:
P. Samarati, "Protecting Respondents' Identities in Microdata Release," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010-1027, Nov./Dec. 2001, doi:10.1109/69.971193