Issue No.03 - March (2010 vol.22)
pp: 392-403
Dimitris Sacharidis, Institute for the Management of Information, Athens and Hong Kong University of Science and Technology, Hong Kong
Kyriakos Mouratidis, Singapore Management University, Singapore
Dimitris Papadias, Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
The concept of k-anonymity has received considerable attention because many organizations need to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process. Specifically, existing generalization algorithms create anonymous tables using only the microdata table (MT) to be published, independently of the external knowledge available. This omission leads to high information loss. Motivated by this observation, we first introduce the concept of k-join-anonymity (KJA), which permits more effective generalization and thereby reduces information loss. Briefly, KJA anonymizes a superset of MT that includes selected records from PD. We propose two methodologies for adapting k-anonymity algorithms to their KJA counterparts. The first generalizes the combination of MT and PD under the constraint that each group must contain at least one tuple of MT (otherwise, the group is useless and discarded). The second anonymizes MT and then refines the resulting groups using PD. Finally, we assess the effectiveness of our contributions through extensive experiments on real and synthetic data sets.
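As a rough illustration of the first methodology (grouping the combination of MT and PD and discarding groups with no MT tuple), the following toy sketch may help. It is not the algorithm from the paper: the single numeric quasi-identifier, the fixed-size chunking of the sorted union, the record layout (age, in_mt), and the function name kja_groups_1d are all assumptions made only for this example.

```python
from typing import List, Tuple

# Hypothetical record layout (an assumption, not from the paper):
# (age, in_mt), where in_mt is True for tuples that belong to MT
# and False for tuples that appear only in PD.
Record = Tuple[int, bool]


def kja_groups_1d(records: List[Record], k: int) -> List[Tuple[int, int, int]]:
    """Toy one-attribute grouping over the union of MT and PD.

    Returns one (low, high, mt_count) triple per kept group, where
    [low, high] is the generalized age range of the group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    ordered = sorted(records, key=lambda r: r[0])

    # Cut the sorted union into consecutive chunks of k tuples.
    chunks = [ordered[i:i + k] for i in range(0, len(ordered), k)]

    # Fold a short trailing chunk into its predecessor so that
    # every remaining chunk holds at least k tuples.
    if len(chunks) > 1 and len(chunks[-1]) < k:
        tail = chunks.pop()
        chunks[-1].extend(tail)

    groups = []
    for chunk in chunks:
        mt_count = sum(1 for _, in_mt in chunk if in_mt)
        # A group smaller than k, or one without any MT tuple,
        # is of no use for publication and is dropped.
        if len(chunk) < k or mt_count == 0:
            continue
        groups.append((chunk[0][0], chunk[-1][0], mt_count))
    return groups


if __name__ == "__main__":
    union = [
        (21, True), (23, False), (25, False),
        (40, False), (41, False), (43, False),
        (60, True), (62, False), (65, False),
    ]
    for low, high, mt_count in kja_groups_1d(union, k=3):
        print(f"age in [{low}, {high}]: {mt_count} MT tuple(s)")
```

In this sketch the PD-only tuples tighten the published age ranges around each MT tuple, and the middle group is dropped because it contains no MT tuple. A real KJA algorithm would handle multiple quasi-identifiers and a proper information-loss metric.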
INDEX TERMS
Privacy, k-anonymity.
CITATION
Dimitris Sacharidis, Kyriakos Mouratidis, Dimitris Papadias, "k-Anonymity in the Presence of External Databases", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 3, pp. 392-403, March 2010, doi:10.1109/TKDE.2009.120