Issue No.09 - Sept. (2012 vol.24)
pp: 1598-1612
Yaping Li , The Chinese University of Hong Kong, Hong Kong
Minghua Chen , The Chinese University of Hong Kong, Hong Kong
Qiwei Li , Rice University, Houston
Wei Zhang , The Chinese University of Hong Kong, Hong Kong
ABSTRACT
Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. A widely studied perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data are published. Previous solutions under this approach are limited by their tacit assumption of single-level trust in data miners. In this work, we relax this assumption and expand the scope of perturbation-based PPDM to Multilevel Trust (MLT-PPDM). In our setting, the more trusted a data miner is, the less perturbed the copy of the data it can access. Under this setting, a malicious data miner may have access to differently perturbed copies of the same data through various means, and may combine these diverse copies to jointly infer additional information about the original data that the data owner does not intend to release. Preventing such diversity attacks is the key challenge of providing MLT-PPDM services. We address this challenge by properly correlating perturbation across copies at different trust levels. We prove that our solution is robust against diversity attacks with respect to our privacy goal. That is, for data miners who have access to an arbitrary collection of the perturbed copies, our solution prevents them from jointly reconstructing the original data more accurately than the best effort using any individual copy in the collection. Our solution allows a data owner to generate perturbed copies of its data for arbitrary trust levels on demand. This feature offers data owners maximum flexibility.
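
The diversity attack and the idea of correlating perturbation across copies can be illustrated with a small simulation. The abstract does not spell out the construction, so the sketch below rests on assumptions: additive zero-mean Gaussian noise, and a "nested" scheme in which each noisier copy is produced by adding extra noise to the next cleaner copy. The function names (`perturb_independent`, `perturb_nested`, `combine`, `demo`) and the noise levels are hypothetical, and the inverse-variance average is only one possible attack, so this is an illustration rather than the paper's algorithm or its proof.

```python
import random

def perturb_independent(x, sigmas, rng):
    """One copy per trust level, each with fresh, independent Gaussian noise."""
    return [[v + rng.gauss(0.0, s) for v in x] for s in sigmas]

def perturb_nested(x, sigmas, rng):
    """Copies with correlated noise (an assumed construction): each copy is the
    previous, less noisy copy plus extra noise, so copy i still has total
    noise standard deviation sigmas[i]. sigmas must be non-decreasing."""
    copies, prev, prev_var = [], list(x), 0.0
    for s in sigmas:
        extra_std = (s * s - prev_var) ** 0.5
        prev = [v + rng.gauss(0.0, extra_std) for v in prev]
        prev_var = s * s
        copies.append(prev)
    return copies

def combine(copies, sigmas):
    """Attacker's inverse-variance weighted average of all copies."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    return [sum(w * c[i] for w, c in zip(weights, copies)) / total
            for i in range(len(copies[0]))]

def mse(estimate, x):
    """Mean squared error between an estimate and the original data."""
    return sum((e - v) ** 2 for e, v in zip(estimate, x)) / len(x)

def demo(n=20000, sigmas=(1.0, 2.0), seed=0):
    """Compare the best single copy against the combined estimate."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 100.0) for _ in range(n)]
    results = {}
    for name, scheme in (("independent", perturb_independent),
                         ("nested", perturb_nested)):
        copies = scheme(x, sigmas, rng)
        results[name] = {
            "best_single": min(mse(c, x) for c in copies),
            "combined": mse(combine(copies, sigmas), x),
        }
    return results

if __name__ == "__main__":
    for name, r in demo().items():
        print(name, r)
```

With independent noise, averaging the copies yields a lower reconstruction error than the least perturbed copy alone, which is the kind of leak a diversity attack exploits. With the nested noise assumed here, the noisier copy is just the cleaner copy plus further noise, so this averaging attack does not improve on the cleaner copy.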
INDEX TERMS
Covariance matrix, Noise, Data privacy, Privacy, Estimation, Random variables, random perturbation, Privacy preserving data mining, multilevel trust
CITATION
Yaping Li, Minghua Chen, Qiwei Li, Wei Zhang, "Enabling Multilevel Trust in Privacy Preserving Data Mining", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 9, pp. 1598-1612, Sept. 2012, doi:10.1109/TKDE.2011.124