Closeness: A New Privacy Measure for Data Publishing
July 2010 (vol. 22 no. 7)
pp. 943-956
Ninghui Li, Purdue University, West Lafayette
Tiancheng Li, Purdue University, West Lafayette
Suresh Venkatasubramanian, University of Utah, Salt Lake City
The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented (as defined in Section 2) values for each sensitive attribute. In this paper, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called “closeness.” We first present the base model, t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n,t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
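
The two requirements described above, k-anonymity and t-closeness, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not name its two distance measures, so the sketch assumes total variation distance as a stand-in. The function names, the `zip` and `disease` attributes, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical probability distribution of a list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumption: used here only as a simple stand-in distance; the
    paper defines its own distance measures.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def equivalence_classes(records, quasi_identifiers):
    """Group records that agree on every quasi-identifier attribute."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        classes[key].append(record)
    return classes


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(members) >= k for members in classes.values())


def max_class_distance(records, quasi_identifiers, sensitive):
    """Largest distance between any class's sensitive-attribute
    distribution and the distribution over the whole table."""
    overall = distribution([r[sensitive] for r in records])
    classes = equivalence_classes(records, quasi_identifiers)
    return max(
        total_variation(distribution([r[sensitive] for r in members]), overall)
        for members in classes.values()
    )


def is_t_close(records, quasi_identifiers, sensitive, t):
    """True if no equivalence class is farther than t from the table."""
    return max_class_distance(records, quasi_identifiers, sensitive) <= t


# Hypothetical generalized table: two equivalence classes of three records.
records = [
    {"zip": "476**", "disease": "flu"},
    {"zip": "476**", "disease": "flu"},
    {"zip": "476**", "disease": "flu"},
    {"zip": "479**", "disease": "cancer"},
    {"zip": "479**", "disease": "cancer"},
    {"zip": "479**", "disease": "flu"},
]

print(is_k_anonymous(records, ["zip"], k=3))
print(max_class_distance(records, ["zip"], "disease"))
print(is_t_close(records, ["zip"], "disease", t=0.2))
```

In this toy table every class has three records, so it is 3-anonymous, yet an observer who places someone in the first class learns that person's disease with certainty. The closeness check flags this because that class's distribution is far from the table-wide one.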

Index Terms:
Privacy preservation, data anonymization, data publishing, data security.
Citation:
Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian, "Closeness: A New Privacy Measure for Data Publishing," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 7, pp. 943-956, July 2010, doi:10.1109/TKDE.2009.139