Issue No.06 - June (2010 vol.22)
pp: 868-883
Mehmet Ercan Nergiz, Sabanci University, Istanbul
Christopher Clifton, Purdue University, West Lafayette
Advances in information technology, and its use in research, are increasing both the need for anonymized data and the risks of poor anonymization. In [1], we presented a new privacy metric, δ-presence, that clearly links the quality of anonymization to the risk posed by inadequate anonymization. It was shown that existing anonymization techniques are inappropriate for situations where δ-presence is a good metric (specifically, where knowing an individual is in the database poses a privacy risk). This article addresses a practical problem with [1], extending to situations where the data anonymizer is not assumed to have complete world knowledge. The algorithms are evaluated in the context of a real-world scenario, demonstrating practical applicability of the approach.
k-Anonymity, privacy, delta presence, medical databases.
Mehmet Ercan Nergiz, Christopher Clifton, "δ-Presence without Complete World Knowledge", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 6, pp. 868-883, June 2010, doi:10.1109/TKDE.2009.125
[1] M.E. Nergiz, M. Atzori, and C. Clifton, "Hiding the Presence of Individuals in Shared Databases," Proc. ACM SIGMOD '07, June 2007.
[2] "Standard for Privacy of Individually Identifiable Health Information," Fed. Register, vol. 66, no. 40, Feb. 2001.
[3] P. Samarati, "Protecting Respondent's Privacy in Microdata Release," IEEE Trans. Knowledge and Data Eng., vol. 13, no. 6, pp. 1010-1027, Nov./Dec. 2001.
[4] P. Samarati and L. Sweeney, "Generalizing Data to Provide Anonymity when Disclosing Information (abstract)," Proc. 17th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems (PODS '98), p. 188, 1998.
[5] A. Ohrn and L. Ohno-Machado, "Using Boolean Reasoning to Anonymize Databases," Artificial Intelligence in Medicine, vol. 15, no. 3, pp. 235-254, Mar. 1999.
[6] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "ℓ-Diversity: Privacy beyond k-Anonymity," Proc. 22nd IEEE Int'l Conf. Data Eng. (ICDE '06), Apr. 2006.
[7] X. Xiao and Y. Tao, "Anatomy: Simple and Effective Privacy Preservation," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06), Sept. 2006.
[8] Nat'l Inst. of Diabetes and Digestive and Kidney Diseases, "National Diabetes Statistics Fact Sheet: General Information and National Estimates on Diabetes in the United States," Technical Report NIH Publication No. 06-3892, U.S. Dept. of Health and Human Services, Nat'l Inst. of Health, Nov. 2005.
[9] American Diabetes Association, "Direct and Indirect Costs of Diabetes in the United States," cost-of-diabetes-in-us.jsp, 2006.
[10] A. Asuncion and D.J. Newman, "UCI Machine Learning Repository," School of Information and Computer Science, Univ. of California, Irvine, 2007.
[11] N. Li, T. Li, and S. Venkatasubramanian, "t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity," Proc. 23rd IEEE Int'l Conf. Data Eng. (ICDE '07), Apr. 2007.
[12] R.C.-W. Wong, J. Li, A.W.-C. Fu, and K. Wang, "(α, k)-Anonymity: An Enhanced k-Anonymity Model for Privacy Preserving Data Publishing," Proc. 12th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '06), pp. 754-759, 2006.
[13] Q. Zhang, N. Koudas, D. Srivastava, and T. Yu, "Aggregate Query Answering on Anonymized Tables," Proc. 23rd IEEE Int'l Conf. Data Eng. (ICDE '07), pp. 116-125, Apr. 2007.
[14] K. LeFevre, D.J. DeWitt, and R. Ramakrishnan, "Incognito: Efficient Full-Domain k-Anonymity," Proc. ACM SIGMOD '05, pp. 49-60, 2005.
[15] K. LeFevre, D.J. DeWitt, and R. Ramakrishnan, "Mondrian Multidimensional k-Anonymity," Proc. 22nd IEEE Int'l Conf. Data Eng. (ICDE '06), pp. 25-35, Apr. 2006.
[16] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1. Wiley, 1968.
[17] S.N. Lahiri, A. Chatterjee, and T. Maiti, "Normal Approximation to the Hypergeometric Distribution in Nonstandard Cases and a Sub-Gaussian Berry-Esseen Theorem," J. Statistical Planning and Inference, vol. 137, no. 11, pp. 3570-3590, Nov. 2007.
[18] V.S. Iyengar, "Transforming Data to Satisfy Privacy Constraints," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 279-288, 2002.
[19] R.J. Bayardo and R. Agrawal, "Data Privacy through Optimal k-Anonymization," Proc. 21st IEEE Int'l Conf. Data Eng. (ICDE '05), pp. 217-228, 2005.
[20] G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu, "Achieving Anonymity via Clustering," Proc. 25th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS '06), pp. 153-162, June 2006.
[21] Y. Tao, X. Xiao, J. Li, and D. Zhang, "On Anti-Corruption Privacy Preserving Publication," Proc. 24th IEEE Int'l Conf. Data Eng. (ICDE), pp. 725-734, 2008.
[22] M.E. Nergiz, S. Cetintas, and F. Akova, "Generalizations with Probability Distributions for Data Anonymization," Technical Report TR-08-001, Dept. of Computer Sciences, Purdue Univ., 2008.
[23] C.C. Aggarwal, "On k-Anonymity and the Curse of Dimensionality," Proc. 31st Int'l Conf. Very Large Data Bases (VLDB '05), pp. 901-909, 2005.