Issue No. 01 - Jan.-Feb. 2014 (vol. 11)

pp. 59-71

Noman Mohammed, McGill University, Montreal

Dima Alhadidi, Concordia University, Montreal

Benjamin C.M. Fung, Concordia University, Montreal

Mourad Debbabi, Concordia University, Montreal

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TDSC.2013.22

ABSTRACT

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, $\epsilon$-differential privacy provides one of the strongest privacy guarantees. In this paper, we address the problem of private data publishing, where different attributes for the same set of individuals are held by two parties. In particular, we present an algorithm for differentially private data release for vertically partitioned data between two parties in the semihonest adversary model. To achieve this, we first present a two-party protocol for the exponential mechanism. This protocol can be used as a subprotocol by any other algorithm that requires the exponential mechanism in a distributed setting. Furthermore, we propose a two-party algorithm that releases differentially private data in a secure way according to the definition of secure multiparty computation. Experimental results on real-life data suggest that the proposed algorithm can effectively preserve information for a data mining task.
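For readers unfamiliar with the exponential mechanism named in the abstract, the following is a minimal single-party sketch of the standard mechanism, in which a candidate is chosen with probability proportional to $\exp(\epsilon \cdot u / (2\Delta u))$. It is not the two-party protocol proposed in the paper, and it involves no cryptography. The function name `exponential_mechanism` and the parameters `score`, `sensitivity`, and `rng` are illustrative assumptions, not names from the paper.

```python
import math
import random


def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0, rng=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * score(candidate) / (2 * sensitivity)).

    Hypothetical helper for illustration only: a centralized sketch,
    not the distributed two-party protocol described in the abstract.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()

    scores = [score(c) for c in candidates]
    # Subtracting the maximum score leaves the sampling distribution
    # unchanged but keeps exp() from overflowing for large scores.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Toy utility scores for three hypothetical attributes.
    utility = {"age": 3.0, "job": 7.0, "sex": 1.0}
    pick = exponential_mechanism(list(utility), utility.get, epsilon=1.0)
    print(pick)
```

A smaller `epsilon` flattens the sampling distribution toward uniform (stronger privacy, lower utility), while a larger `epsilon` concentrates it on the highest-scoring candidate.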

INDEX TERMS

Classification analysis, differential privacy, secure data integration

CITATION

Noman Mohammed, Dima Alhadidi, Benjamin C.M. Fung, Mourad Debbabi, "Secure Two-Party Differentially Private Data Release for Vertically Partitioned Data",

*IEEE Transactions on Dependable and Secure Computing*, vol. 11, no. 1, pp. 59-71, Jan.-Feb. 2014, doi:10.1109/TDSC.2013.22