Issue No.02 - February (2009 vol.21)
pp: 206-219
Aristides Gionis , Yahoo Research, Barcelona
Tamir Tassa , The Open University of Israel, Ra'anana
ABSTRACT
The technique of k-anonymization allows databases that contain personal information to be released while ensuring some degree of individual privacy. Anonymization is usually performed by generalizing database entries. We formally study the concept of generalization and propose three information-theoretic measures for capturing the amount of information that is lost during the anonymization process. The proposed measures are more general and more accurate than those proposed by Meyerson and Williams [23] and Aggarwal et al. [1]. We study the problem of achieving k-anonymity with minimal loss of information. We prove that it is NP-hard and study polynomial approximations for the optimal solution. Our first algorithm gives an approximation guarantee of $O(\ln k)$ for two of our measures as well as for the previously studied measures. This improves the best-known $O(k)$-approximation in [1]. While the previous approximation algorithms relied on the graph representation framework, our algorithm relies on a novel hypergraph representation that enables the improvement in the approximation ratio from $O(k)$ to $O(\ln k)$. As the running time of the algorithm is $O(n^{2k})$, we also show how to adapt the algorithm in [1] in order to obtain an $O(k)$-approximation algorithm that is polynomial in both $n$ and $k$.
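To make the notions of generalization and information loss concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes each generalized entry is a set of admissible values, checks whether a generalized table is k-anonymous, and scores it with an entropy-style loss (the conditional entropy of each attribute given its generalized subset, estimated from the table itself). The function names `is_k_anonymous` and `entropy_loss` and the toy table are hypothetical, and the measure may differ in detail from the definitions in the paper.

```python
from collections import Counter
from math import log2


def is_k_anonymous(generalized, k):
    """Return True if every generalized record occurs at least k times."""
    counts = Counter(generalized)
    return all(c >= k for c in counts.values())


def entropy_loss(original, generalized):
    """Illustrative entropy-style loss (an assumption, not the paper's exact measure).

    For every generalized entry B in attribute j, add the conditional
    entropy H(X_j | X_j in B), using value frequencies from `original`.
    """
    n_attrs = len(original[0])
    # Empirical frequency of each value, per attribute.
    freqs = [Counter(row[j] for row in original) for j in range(n_attrs)]
    total = 0.0
    for gen_row in generalized:
        for j, subset in enumerate(gen_row):
            weights = [freqs[j][v] for v in subset if freqs[j][v] > 0]
            mass = sum(weights)
            if mass == 0:
                continue  # subset contains no observed value; nothing to add
            total -= sum(w / mass * log2(w / mass) for w in weights)
    return total


# Toy table with two attributes (age, sex); all four records are distinct.
original = [(30, "M"), (32, "M"), (30, "F"), (32, "F")]

# No generalization: every entry is a singleton set.
identity = [tuple(frozenset({v}) for v in row) for row in original]

# Generalize age to the set {30, 32}; keep sex exact.
ages = frozenset({30, 32})
generalized = [(ages, frozenset({sex})) for _, sex in original]

print(is_k_anonymous(identity, 2))
print(is_k_anonymous(generalized, 2))
print(entropy_loss(original, generalized))
```

In this toy table the ungeneralized records are all distinct, so the table is not 2-anonymous. Merging the two ages makes it 2-anonymous at a cost of one bit per generalized age entry, which illustrates the trade-off between anonymity and information loss that the paper seeks to optimize.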
INDEX TERMS
Privacy-preserving data mining, k-anonymization, approximation algorithms for NP-hard problems.
CITATION
Aristides Gionis, Tamir Tassa, "k-Anonymization with Minimal Loss of Information", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 2, pp. 206-219, February 2009, doi:10.1109/TKDE.2008.129
REFERENCES
[1] G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu, “Approximation Algorithms for k-Anonymity,” Proc. 10th Int'l Conf. Database Theory (ICDT), 2005.
[2] G. Aggarwal, N. Mishra, and B. Pinkas, “Secure Computation of the $k$th-Ranked Element,” Proc. Advances in Cryptology Int'l Conf. Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2004.
[3] D. Agrawal and C. Aggarwal, “On the Design and Quantification of Privacy Preserving Data Mining Algorithms,” Proc. 20th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2001.
[4] R. Agrawal and R. Srikant, “Privacy-Preserving Data Mining,” Proc. 19th ACM SIGMOD Int'l Conf. Management of Data (ICMD), 2000.
[5] R. Agrawal, R. Srikant, and D. Thomas, “Privacy Preserving OLAP,” Proc. 25th ACM SIGMOD Int'l Conf. Management of Data (ICMD), 2005.
[6] A. Blum, C. Dwork, F. McSherry, and K. Nissim, “Practical Privacy: The SuLQ Framework,” Proc. 24th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2005.
[7] S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee, “Toward Privacy in Public Databases,” Proc. Second Theory of Cryptography Conf. (TCC), 2005.
[8] V. Chvatal, “A Greedy Heuristic for the Set-Covering Problem,” Math. of Operations Research, vol. 4, no. 3, pp. 233-235, 1979.
[9] T. Dalenius, “Towards a Methodology for Statistical Disclosure Control,” Statistik Tidskrift, vol. 15, pp. 429-444, 1977.
[10] A.G. DeWaal and L.C.R.J. Willenborg, “Information Loss through Global Recoding and Local Suppression,” Netherlands Official Statistics, special issue on SDC, vol. 14, pp. 17-20, 1999.
[11] I. Dinur and K. Nissim, “Revealing Information while Preserving Privacy,” Proc. 22nd ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2003.
[12] C. Dwork, “Differential Privacy,” Proc. 33rd Int'l Colloquium Automata, Languages and Programming (ICALP '06), part II, pp. 1-12, 2006.
[13] C. Dwork and K. Nissim, “Privacy-Preserving Data Mining on Vertically Partitioned Databases,” Proc. Advances in Cryptology Int'l Conf. Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2004.
[14] A. Evfimievski, J. Gehrke, and R. Srikant, “Limiting Privacy Breaches in Privacy Preserving Data Mining,” Proc. 22nd ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2003.
[15] M. Freedman, K. Nissim, and B. Pinkas, “Efficient Private Matching and Set Intersection,” Proc. Advances in Cryptology Int'l Conf. Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2004.
[16] J. Goldberger and T. Tassa, Learning-Enhancing Methods for Anonymization of Tables, submitted for publication, 2008.
[17] O. Goldreich, S. Micali, and A. Wigderson, “How to Play Any Mental Game or a Completeness Theorem for Protocols with Honest Majority,” Proc. 19th Ann. ACM Symp. Theory of Computing (STOC), 1987.
[18] S. Goldwasser and S. Micali, “Probabilistic Encryption,” J. Computer and System Sciences, vol. 28, pp. 270-299, 1984.
[19] D.S. Johnson, “Approximation Algorithms for Combinatorial Problems,” J. Computer and System Sciences, vol. 9, pp. 256-278, 1974.
[20] K. Kenthapadi, N. Mishra, and K. Nissim, “Simulatable Auditing,” Proc. 24th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2005.
[21] J. Kleinberg, C. Papadimitriou, and P. Raghavan, “Auditing Boolean Attributes,” J. Computer and System Sciences, vol. 6, pp. 244-253, 2003.
[22] Y. Lindell and B. Pinkas, “Privacy Preserving Data Mining,” J. Cryptology, vol. 15, no. 3, pp. 177-206, 2002.
[23] A. Meyerson and R. Williams, “On the Complexity of Optimal $k$-Anonymity,” Proc. 23rd ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2004.
[24] P. Samarati, “Protecting Respondent's Privacy in Microdata Release,” IEEE Trans. Knowledge and Data Eng., vol. 13, pp. 1010-1027, 2001.
[25] P. Samarati and L. Sweeney, “Generalizing Data to Provide Anonymity when Disclosing Information (Abstract),” Proc. 17th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 1998.
[26] L. Sweeney, “$k$-Anonymity: A Model for Protecting Privacy,” Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[27] L. Willenborg and T. DeWaal, Elements of Statistical Disclosure Control. Springer-Verlag, 2001.
[28] A. Yao, “How to Generate and Exchange Secrets,” Proc. 27th IEEE Symp. Foundations of Computer Science (FOCS), 1986.