Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining
January 2006 (vol. 18 no. 1)
pp. 92-106
This paper explores the possibility of using multiplicative random projection matrices for privacy-preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy-sensitive data possibly owned by multiple parties. This class of problems is directly related to many other data mining problems such as clustering, principal component analysis, and classification. The paper makes two primary contributions. First, it explores Independent Component Analysis as a possible tool for breaching privacy in deterministic multiplicative perturbation-based models such as random orthogonal transformation and random rotation. Then, it proposes an approximate random projection-based technique to improve the level of privacy protection while still preserving certain statistical characteristics of the data. The paper presents extensive theoretical analysis and experimental results. Experiments demonstrate that the proposed technique is effective and can be successfully used for different types of privacy-preserving data mining applications.
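
The idea of preserving inner products and Euclidean distances under a random projection can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (random_projection_matrix, project), the choice of Gaussian entries, and the scaling factor 1 / (sqrt(k) * sigma) are assumptions made here for illustration. With zero-mean i.i.d. entries of variance sigma^2, that scaling makes the projected inner product equal to the original one in expectation, with an error that shrinks as the projected dimension k grows.

```python
import math
import random


def random_projection_matrix(k, m, sigma=1.0, seed=0):
    """Return a k x m matrix with i.i.d. N(0, sigma^2) entries (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(k)]


def project(R, x, sigma=1.0):
    """Map an m-dimensional vector x to k dimensions: (1 / (sqrt(k) * sigma)) * R x."""
    k = len(R)
    scale = 1.0 / (math.sqrt(k) * sigma)
    return [scale * sum(r * v for r, v in zip(row, x)) for row in R]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    m, k = 20, 3000                      # illustrative sizes, not from the paper
    x = [1.0] * m
    y = [1.0] * 10 + [0.0] * 10
    R = random_projection_matrix(k, m)   # the secret the data owner would keep private
    u, v = project(R, x), project(R, y)  # perturbed records that would be released
    print("inner product: original", dot(x, y), "projected", dot(u, v))
    print("distance:      original", euclidean(x, y), "projected", euclidean(u, v))
```

In this sketch a large k is used only to make the approximation visibly tight; choosing k smaller than m reduces dimensionality and makes the original data harder to recover, at the cost of a noisier estimate.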

Index Terms: Random projection, multiplicative data perturbation, privacy preserving data mining.
Kun Liu, Hillol Kargupta, Jessica Ryan, "Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 92-106, Jan. 2006, doi:10.1109/TKDE.2006.14