δ-Presence without Complete World Knowledge
June 2010 (vol. 22 no. 6)
pp. 868-883
Mehmet Ercan Nergiz, Sabanci University, Istanbul
Christopher Clifton, Purdue University, West Lafayette
Advances in information technology, and its use in research, are increasing both the need for anonymized data and the risks of poor anonymization. In [1], we presented a new privacy metric, δ-presence, that clearly links the quality of anonymization to the risk posed by inadequate anonymization. It was shown that existing anonymization techniques are inappropriate for situations where δ-presence is a good metric (specifically, where knowing an individual is in the database poses a privacy risk). This article addresses a practical problem with [1], extending it to situations where the data anonymizer is not assumed to have complete world knowledge. The algorithms are evaluated in the context of a real-world scenario, demonstrating the practical applicability of the approach.

[1] M.E. Nergiz, M. Atzori, and C. Clifton, "Hiding the Presence of Individuals in Shared Databases," Proc. ACM SIGMOD '07, June 2007.
[2] "Standard for Privacy of Individually Identifiable Health Information," Fed. Register, vol. 66, no. 40, Feb. 2001.
[3] P. Samarati, "Protecting Respondent's Privacy in Microdata Release," IEEE Trans. Knowledge and Data Eng., vol. 13, no. 6, pp. 1010-1027, Nov./Dec. 2001.
[4] P. Samarati and L. Sweeney, "Generalizing Data to Provide Anonymity when Disclosing Information (abstract)," Proc. 17th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems (PODS '98), p. 188, 1998.
[5] A. Ohrn and L. Ohno-Machado, "Using Boolean Reasoning to Anonymize Databases," Artificial Intelligence in Medicine, vol. 15, no. 3, pp. 235-254, Mar. 1999.
[6] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "$\ell$-Diversity: Privacy beyond $k$-Anonymity," Proc. 22nd IEEE Int'l Conf. Data Eng. (ICDE '06), Apr. 2006.
[7] X. Xiao and Y. Tao, "Anatomy: Simple and Effective Privacy Preservation," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06), Sept. 2006.
[8] Nat'l Inst. of Diabetes and Digestive and Kidney Diseases, "National Diabetes Statistics Fact Sheet: General Information and National Estimates on Diabetes in the United States," Technical Report NIH Publication No. 06-3892, U.S. Dept. of Health and Human Services, Nat'l Inst. of Health, Nov. 2005.
[9] American Diabetes Association, "Direct and Indirect Costs of Diabetes in the United States," http://www.diabetes.org/diabetes-statistics/cost-of-diabetes-in-us.jsp, 2006.
[10] A. Asuncion and D.J. Newman, "UCI Machine Learning Repository," School of Information and Computer Science, Univ. of California, Irvine, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
[11] N. Li and T. Li, "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity," Proc. 23rd IEEE Int'l Conf. Data Eng. (ICDE '07), Apr. 2007.
[12] R.C.-W. Wong, J. Li, A.W.-C. Fu, and K. Wang, "($\alpha$, k)-Anonymity: An Enhanced k-Anonymity Model for Privacy Preserving Data Publishing," Proc. 12th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '06), pp. 754-759, 2006.
[13] Q. Zhang, N. Koudas, D. Srivastava, and T. Yu, "Aggregate Query Answering on Anonymized Tables," Proc. 23rd IEEE Int'l Conf. Data Eng. (ICDE '07), pp. 116-125, Apr. 2007.
[14] K. LeFevre, D.J. DeWitt, and R. Ramakrishnan, "Incognito: Efficient Full-Domain k-Anonymity," Proc. ACM SIGMOD '05, pp. 49-60, 2005.
[15] K. LeFevre, D.J. DeWitt, and R. Ramakrishnan, "Mondrian Multidimensional k-Anonymity," Proc. 22nd IEEE Int'l Conf. Data Eng. (ICDE '06), pp. 25-35, Apr. 2006.
[16] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1. Wiley, 1968.
[17] S.N. Lahiri, A. Chatterjee, and T. Maiti, "Normal Approximation to the Hypergeometric Distribution in Nonstandard Cases and a Sub-Gaussian Berry-Esseen Theorem," J. Statistical Planning and Inference, vol. 137, no. 11, pp. 3570-3590, Nov. 2007.
[18] V.S. Iyengar, "Transforming Data to Satisfy Privacy Constraints," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 279-288, 2002.
[19] R.J. Bayardo and R. Agrawal, "Data Privacy through Optimal k-Anonymization," Proc. 21st IEEE Int'l Conf. Data Eng. (ICDE '05), pp. 217-228, 2005.
[20] G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu, "Achieving Anonymity via Clustering," Proc. 25th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS '06), pp. 153-162, June 2006.
[21] Y. Tao, X. Xiao, J. Li, and D. Zhang, "On Anti-Corruption Privacy Preserving Publication," Proc. 24th IEEE Int'l Conf. Data Eng. (ICDE), pp. 725-734, 2008.
[22] M.E. Nergiz, S. Cetintas, and F. Akova, "Generalizations with Probability Distributions for Data Anonymization," Technical Report TR-08-001, Dept. of Computer Sciences, Purdue Univ., 2008.
[23] C.C. Aggarwal, "On k-Anonymity and the Curse of Dimensionality," Proc. 31st Int'l Conf. Very Large Data Bases (VLDB '05), pp. 901-909, 2005.

Index Terms:
k-Anonymity, privacy, delta presence, medical databases.
Citation:
Mehmet Ercan Nergiz, Christopher Clifton, "δ-Presence without Complete World Knowledge," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 868-883, May 2010, doi:10.1109/TKDE.2009.125