This Article 
 Bibliographic References 
 Add to: 
Privacy-Preserving Computation of Bayesian Networks on Vertically Partitioned Data
September 2006 (vol. 18 no. 9)
pp. 1253-1264
Traditionally, many data mining techniques have been designed in the centralized model in which all data is collected and available in one central site. However, as more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by business, governments, and other parties increases. Different parties often wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent the parties from sharing their data. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data need not be revealed. In this paper, we present privacy-preserving protocols for a particular data mining task: learning a Bayesian network from a database vertically partitioned among two parties. In this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other. We present an efficient and privacy-preserving protocol to construct a Bayesian network on the parties' joint data.

[1] D. Agrawal and C. Aggarwal, “On the Design and Quantification of Privacy Preserving Data Mining Algorithms,” Proc. 20th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems, pp. 247-255, 2001.
[2] R. Agrawal, A. Evfimievski, and R. Srikant, “Information Sharing across Private Databases,” Proc. 2003 ACM SIGMOD Int'l Conf. Management of Data, pp. 86-97, 2003.
[3] R. Agrawal and R. Srikant, “Privacy-Preserving Data Mining,” Proc. 2000 ACM SIGMOD Int'l Conf. Management of Data, pp. 439-450, May 2000.
[4] M. Atallah and W. Du, “Secure Multi-Party Computational Geometry,” Proc. Seventh Int'l Workshop Algorithms and Data Structures, pp. 165-179, 2001.
[5] R. Canetti, “Security and Composition of Multiparty Cryptographic Protocols,” J. Cryptology, vol. 13, no. 1, pp. 143-202, 2000.
[6] R. Canetti, Y. Ishai, R. Kumar, M. Reiter, R. Rubinfeld, and R.N. Wright, “Selective Private Function Evaluation with Applications to Private Statistics,” Proc. 20th Ann. ACM Symp. Principles of Distributed Computing, pp. 293-304, 2001.
[7] J. Canny, “Collaborative Filtering with Privacy,” Proc. 2002 IEEE Symp. Security and Privacy, pp. 45-57, 2002.
[8] R. Chen, K. Sivakumar, and H. Kargupta, “Learning Bayesian Network Structure from Distributed Data,” Proc. SIAM Int'l Data Mining Conf., pp. 284-288, 2003.
[9] R. Chen, K. Sivakumar, and H. Kargupta, “Collective Mining of Bayesian Networks from Distributed Heterogeneous Data,” Knowledge Information Syststems, vol. 6, no. 2, pp. 164-187, 2004.
[10] D.M. Chickering, “Learning Bayesian Networks is NP-Complete,” Learning from Data: Artificial Intelligence and Statistics V, pp. 121-130, 1996.
[11] G. Cooper and E. Herskovits, “A Bayesian Method for the Induction of Probabilistic Networks from Data,” Machine Learning, vol. 9, no. 4, pp. 309-347, 1992.
[12] J. Daemen and V. Rijmen, The Design of Rijndael: AES— The Advanced Encryption Standard. Springer-Verlag, 2002.
[13] V. Estivill-Castro and L. Brankovic, “Balancing Privacy against Precision in Mining for Logic Rules,” Proc. First Int'l Data Warehousing and Knowledge Discovery, pp. 389-398, 1999.
[14] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke, “Privacy Preserving Mining of Association Rules,“ Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 217-228, 2002.
[15] M. Freedman, K. Nissim, and B. Pinkas, “Efficient Private Matching and Set Intersection,” Advances in Cryptology— Proc. EUROCRYPT 2004, pp. 1-19, Springer-Verlag, 2004.
[16] B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen, “On Private Scalar Product Computation for Privacy-Preserving Data Mining,” Information Security and Cryptology— Proc. ICISC, vol. 3506, pp. 104-120, 2004.
[17] O. Goldreich, S. Micali, and A. Wigderson, “How to Play ANY Mental Game,” Proc. 19th Ann. ACM Conf. Theory of Computing, pp. 218-229, 1987.
[18] O. Goldreich, Foundations of Cryptography, Volume II: Basic Applications. Cambridge Univ. Press, 2004.
[19] The Health Insurance Portability and Accountability Act of 1996, /www.cms.hhs.govhipaa, 1996.
[20] G. Jagannathan, K. Pillaipakkamnatt, and R.N. Wright, “A New Privacy-Preserving Distributed $k{\hbox{-}}{\rm{Clustering}}$ Algorithm,” Proc. Sixth SIAM Int'l Conf. Data Mining, 2006.
[21] G. Jagannathan and R.N. Wright, “Privacy-Preserving Distributed $k{\hbox{-}}{\rm{Means}}$ Clustering over Arbitrarily Partitioned Data,” Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 593-599, 2005.
[22] E. Johnson and H. Kargupta, “Collective, Hierarchical Clustering from Distributed, Heterogeneous Data,” Lecture Notes in Computer Science, vol. 1759, pp. 221-244, 1999.
[23] M. Kantarcioglu and C. Clifton, “Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data,” Proc. ACM SIGMOD Workshop Research Issues on Data Mining and Knowledge Discovery (DMKD '02), pp. 24-31, June 2002.
[24] M. Kantarcioglu and J. Vaidya, “Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data,” Proc. IEEE Workshop Privacy Preserving Data Mining, 2003.
[25] M. Kantarcioglu, J. Jin, and C. Clifton, “When Do Data Mining Results Violate Privacy?” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 599-604, 2004.
[26] O. Kardes, R.S. Ryger, R.N. Wright, and J. Feigenbaum, “Implementing Privacy-Preserving Bayesian-Net Discovery for Vertically Partitioned Data,” Proc. Int'l Conf. Data Mining Workshop Privacy and Security Aspects of Data Mining, 2005.
[27] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, “On the Privacy Preserving Properties of Random Data Perturbation Techniques,” Proc. Third IEEE Int'l Conf. Data Mining, pp. 99-106, 2003.
[28] H. Kargupta, B. Park, D. Hershberger, and E. Johnson, “Collective Data Mining: A New Perspective towards Distributed Data Mining,” Advances in Distributed and Parallel Knowledge Discovery, AAAI/MIT Press, 2000.
[29] Y. Lindell and B. Pinkas, “Privacy Preserving Data Mining,” J. Cryptology, vol. 15, no. 3, pp. 177-206, 2002.
[30] K. Liu, H. Kargupta, and J. Ryan, “Multiplicative Noise, Random Projection, and Privacy Preserving Data Mining from Distributed Multi-Party Data,” Technical Report TR-CS-03-24, Computer Science and Electrical Eng. Dept., Univ. of Maryland, Baltimore County, 2003.
[31] D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella, “Fairplay— A Secure Two-Party Computation System,” Proc. 13th Usenix Security Symp., pp. 287-302, 2004.
[32] D. Meng, K. Sivakumar, and H. Kargupta, “Privacy-Sensitive Bayesian Network Parameter Learning,” Proc. Fourth IEEE Int'l Conf. Data Mining, pp. 487-490, 2004.
[33] D.E. O'Leary, “Some Privacy Issues in Knowledge Discovery: The OECD Personal Privacy Guidelines,” IEEE Expert, vol. 10, no. 2, pp. 48-52, 1995.
[34] P. Paillier, “Public-Key Cryptosystems Based on Composite Degree Residue Classes,” Advances in Cryptography— Proc. EUROCRYPT '99, pp. 223-238, 1999.
[35] European Parliament, “Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the Protection of Individuals with Regard to the Processing of Personal Data and on the Free Movement of Such Data,” Official J. European Communities, p. 31 1995.
[36] European Parliament, “Directive 97/66/EC of the European Parliament and of the Council of 15 December 1997 Concerning the Processing of Personal Data and the Protection of Privacy in the Telecommunications Sector,” Official J. European Communities, pp. 1-8, 1998.
[37] S. Rizvi and J. Haritsa, “Maintaining Data Privacy in Association Rule Mining,” Proc. 28th Very Large Data Bases Conf., pp. 682-693, 2002.
[38] S. Stolfo, A. Prodromidis, S. Tselepis, W. Lee, D. Fan, and P. Chan, “JAM: Java Agents for Meta-Learning over Distributed Databases,” Knowledge Discovery and Data Mining, pp. 74-81, 1997.
[39] H. Subramaniam, R.N. Wright, and Z. Yang, “Experimental Analysis of Privacy-Preserving Statistics Computation,” Proc. Very Large Data Bases Worshop Secure Data Management, pp. 55-66, Aug. 2004.
[40] J. Vaidya and C. Clifton, “Privacy Preserving Association Rule Mining in Vertically Partitioned Data,” Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 639-644, 2002.
[41] J. Vaidya and C. Clifton, “Privacy-Preserving k-Means Clustering over Vertically Partitioned Data,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 206-215, 2003.
[42] J. Vaidya and C. Clifton, “Privacy Preserving Naive Bayes Classifier on Vertically Partitioned Data,” Proc. 2004 SIAM Int'l Conf. Data Mining, 2004.
[43] R.N. Wright and Z. Yang, “Privacy-Preserving Bayesian Network Structure Computation on Distributed Heterogeneous Data,” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 713-718, 2004.
[44] K. Yamanishi, “Distributed Cooperative Bayesian Learning Strategies,” Information and Computation, vol. 150, no. 1, pp. 22-56, 1999.
[45] Z. Yang and R.N. Wright, “Improved Privacy-Preserving Bayesian Network Parameter Learning on Vertically Partitioned Data,” Proc. Int'l Conf. Data Eng. Int'l Workshop Privacy Data Management, Apr. 2005.
[46] A. Yao, “How to Generate and Exchange Secrets,” Proc. 27th IEEE Symp. Foundations of Computer Science, pp. 162-167, 1986.

Index Terms:
Data privacy, Bayesian networks, privacy-preserving data mining.
Zhiqiang Yang, Rebecca N. Wright, "Privacy-Preserving Computation of Bayesian Networks on Vertically Partitioned Data," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, pp. 1253-1264, Sept. 2006, doi:10.1109/TKDE.2006.147
Usage of this product signifies your acceptance of the Terms of Use.