ANGEL: Enhancing the Utility of Generalization for Privacy Preserving Publication
July 2009 (vol. 21 no. 7)
pp. 1073-1087
Yufei Tao, Chinese University of Hong Kong, Hong Kong
Hekang Chen, Fudan University, Shanghai
Xiaokui Xiao, Chinese University of Hong Kong, Hong Kong
Shuigeng Zhou, Fudan University, Shanghai
Donghui Zhang, Northeastern University, Boston
Generalization is a well-known method for privacy preserving data publication. Despite its popularity, it has several drawbacks, including heavy information loss and difficulty supporting marginal publication. To overcome these drawbacks, we develop ANGEL, a new anonymization technique that is as effective as generalization in privacy protection but retains significantly more information in the microdata. ANGEL is applicable to any monotonic principle (e.g., l-diversity, t-closeness), and its superiority in correlation preservation is especially evident when tight privacy control must be enforced. We show that ANGEL lends itself elegantly to the hard problem of marginal publication. In particular, unlike generalization, which can release only restricted marginals, our technique can easily be used to publish any marginals with strong privacy guarantees.
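
The abstract does not spell out what makes a principle monotonic. One common reading, assumed here, is that whenever two groups of tuples each satisfy the principle, their union satisfies it as well. The sketch below illustrates that reading with a frequency-based form of l-diversity (no sensitive value accounts for more than 1/l of a group). The function name `is_l_diverse` and the sample diagnoses are hypothetical and not taken from the paper, and the code does not implement ANGEL itself.

```python
from collections import Counter
from typing import Sequence


def is_l_diverse(sensitive_values: Sequence[str], l: int) -> bool:
    """Assumed frequency-based l-diversity check (illustrative only):
    the most frequent sensitive value may cover at most 1/l of the group."""
    if not sensitive_values:
        return False
    counts = Counter(sensitive_values)
    # Integer comparison avoids floating-point rounding issues.
    return max(counts.values()) * l <= len(sensitive_values)


# Two hypothetical groups of sensitive values (e.g., diagnoses).
group_a = ["flu", "bronchitis", "gastritis", "flu"]
group_b = ["pneumonia", "flu"]

# Each group satisfies the check on its own, and so does their union,
# which is the behavior the assumed notion of monotonicity describes.
print(is_l_diverse(group_a, 2))
print(is_l_diverse(group_b, 2))
print(is_l_diverse(group_a + group_b, 2))
```

Under this reading the union always passes: if a value occupies at most 1/l of each group, its total count is at most 1/l of the combined size.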

Index Terms:
Privacy, generalization, ANGEL.
Citation:
Yufei Tao, Hekang Chen, Xiaokui Xiao, Shuigeng Zhou, Donghui Zhang, "ANGEL: Enhancing the Utility of Generalization for Privacy Preserving Publication," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 7, pp. 1073-1087, July 2009, doi:10.1109/TKDE.2009.65