Privacy-Preserving Multiparty Collaborative Mining with Geometric Data Perturbation
December 2009 (vol. 20 no. 12)
pp. 1764-1776
Keke Chen, Wright State University, Dayton
Ling Liu, Georgia Institute of Technology, Atlanta
In multiparty collaborative data mining, participants contribute their own data sets and hope to collaboratively mine a comprehensive model from the pooled data set. The major challenge is how to efficiently mine a quality model without breaching each party's privacy. In this paper, we propose an approach based on geometric data perturbation and a service-oriented data mining framework. The key problem in applying geometric data perturbation to multiparty collaborative mining is securely unifying the multiple geometric perturbations preferred by the different parties. We have developed three protocols for perturbation unification. Our approach has three unique features compared to existing approaches: 1) with geometric data perturbation, these protocols work with many popular existing data mining algorithms, while most other approaches are designed for only one particular mining algorithm; 2) the two major factors, data utility and privacy guarantee, are both well preserved compared to other perturbation-based approaches; and 3) two of the three proposed protocols also scale well with the number of participants, while many existing cryptographic approaches consider only two or a few more participants. We also study the different features of the three protocols and show the advantages of each protocol in experiments.

Index Terms:
Privacy-preserving data mining, distributed computing, collaborative computing, geometric data perturbation.
Keke Chen, Ling Liu, "Privacy-Preserving Multiparty Collaborative Mining with Geometric Data Perturbation," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 12, pp. 1764-1776, Dec. 2009, doi:10.1109/TPDS.2009.26