Issue No.09 - September (2008 vol.20)
pp: 1181-1194
Raymond Chi-Wing Wong, The Chinese University of Hong Kong, Hong Kong
Ada Wai-Chee Fu, The Chinese University of Hong Kong, Hong Kong
Jian Pei, Simon Fraser University, Burnaby
ABSTRACT
Individual privacy will be at risk if a published data set is not properly de-identified. k-anonymity is a major technique to de-identify a data set. Among a number of k-anonymisation schemes, local recoding methods are promising for minimising the distortion of a k-anonymity view. This paper addresses two major issues in local recoding k-anonymisation in attribute hierarchical taxonomies. Firstly, we define a proper distance metric to achieve local recoding generalisation with small distortion. Secondly, we propose a means to control the inconsistency of attribute domains in a generalised view by local recoding. We show experimentally that our proposed local recoding method based on the proposed distance metric produces higher quality k-anonymity tables in three quality measures than a global recoding anonymisation method, Incognito, and a multidimensional recoding anonymisation method, Multi. The proposed inconsistency handling method is able to balance distortion and consistency of a generalised view.
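As a rough illustration of the ideas named in the abstract, the following minimal sketch checks k-anonymity on a single quasi-identifier and applies a local recoding step that generalises only the records that need it, using a small attribute taxonomy. The taxonomy, the sample values, and all function names are hypothetical and chosen for illustration; this is not the distance metric or the algorithm proposed in the paper.

```python
from collections import Counter

# Hypothetical attribute taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "Beijing": "China", "Shanghai": "China",
    "Toronto": "Canada", "Vancouver": "Canada",
    "China": "Any", "Canada": "Any",
}

def ancestors(value):
    """Return the path from value up to the taxonomy root, inclusive."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lowest_common_ancestor(a, b):
    """Return the most specific taxonomy node that covers both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("values share no ancestor")

def is_k_anonymous(values, k):
    """True if every distinct value occurs at least k times."""
    return all(count >= k for count in Counter(values).values())

def generalise_group(values):
    """Replace every value in a group by the group's lowest common ancestor."""
    lca = values[0]
    for v in values[1:]:
        lca = lowest_common_ancestor(lca, v)
    return [lca] * len(values)

table = ["Beijing", "Shanghai", "Toronto", "Toronto"]

# Local recoding: only the first two records are generalised;
# the Toronto records already form a group of size 2 and stay untouched.
local = generalise_group(table[:2]) + table[2:]

print(is_k_anonymous(table, 2))  # False
print(local)                     # ['China', 'China', 'Toronto', 'Toronto']
print(is_k_anonymous(local, 2))  # True
```

In this sketch the recoded column mixes a country-level value with a city-level value. That mix of generalisation levels is one simple instance of the attribute-domain inconsistency the abstract refers to; a global recoding scheme would instead lift every record to the same level of the taxonomy.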
INDEX TERMS
Security and Privacy Protection, Data mining
CITATION
Raymond Chi-Wing Wong, Ada Wai-Chee Fu, Jian Pei, "Anonymization by Local Recoding in Data with Attribute Hierarchical Taxonomies", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 9, pp. 1181-1194, September 2008, doi:10.1109/TKDE.2008.52