Issue No.07 - July (2010 vol.22)

pp: 943-956

Tiancheng Li, Purdue University, West Lafayette

Ninghui Li, Purdue University, West Lafayette

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.139

ABSTRACT

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented (as defined in Section 2) values for each sensitive attribute. In this paper, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called “closeness.” We first present the base model t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n,t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
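
The two requirements described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy table, the attribute names (`zip`, `age`, `disease`), and all function names are hypothetical, and the distance used here is the variational (total variation) distance, chosen only as a simple stand-in because the abstract does not name the paper's two distance measures.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that agree on every quasi-identifier attribute."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        classes[key].append(record)
    return classes


def distribution(records, sensitive):
    """Empirical distribution of the sensitive attribute over `records`."""
    counts = Counter(record[sensitive] for record in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Stand-in distance (an assumption): half the L1 distance between p and q."""
    values = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in values)


def is_k_anonymous(records, quasi_identifiers, k):
    """Every equivalence class must contain at least k records."""
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(len(group) >= k for group in groups)


def is_t_close(records, quasi_identifiers, sensitive, t):
    """Every class distribution must be within distance t of the overall one."""
    overall = distribution(records, sensitive)
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(
        variational_distance(distribution(group, sensitive), overall) <= t
        for group in groups
    )


# Hypothetical generalized table: two equivalence classes of three records each.
table = [
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
]

print(is_k_anonymous(table, ["zip", "age"], k=3))
print(is_t_close(table, ["zip", "age"], "disease", t=0.5))
print(is_t_close(table, ["zip", "age"], "disease", t=0.2))
```

In this toy table the first class is 3-anonymous yet every record in it shares the same sensitive value, which is the kind of attribute disclosure that k-anonymity alone does not prevent. Under the stand-in distance, each class lies at distance 1/3 from the overall distribution, so the table passes the check for t = 0.5 and fails it for t = 0.2.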

INDEX TERMS

Privacy preservation, data anonymization, data publishing, data security.

CITATION

Tiancheng Li, Ninghui Li, "Closeness: A New Privacy Measure for Data Publishing",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 22, no. 7, pp. 943-956, July 2010, doi:10.1109/TKDE.2009.139