Issue No.05 - Sept.-Oct. (2012 vol.9)
pp: 684-698
Ali Inan, Isik University, Istanbul
Murat Kantarcioglu, University of Texas at Dallas, Richardson
Gabriel Ghinita, Purdue University, West Lafayette
Elisa Bertino, Purdue University, West Lafayette
Real-world entities are not always represented by the same set of features in different data sets. Matching records of the same real-world entity across these data sets is therefore a challenging task, and it becomes even more difficult when the data sets contain private information. Existing solutions to this problem generally follow one of two approaches: sanitization techniques or cryptographic techniques. We propose a hybrid technique that combines the two and lets users trade off privacy, accuracy, and cost. Our main contribution is a blocking phase that operates over sanitized data and filters out, in a privacy-preserving manner, pairs of records that do not satisfy the matching condition. We also provide a formal definition of privacy and prove that the participants of our protocols learn nothing beyond their share of the result and what can be inferred from that share, their own input, and the sanitized views of the input data sets (which are considered public information). Our method incurs considerably lower costs than cryptographic techniques and yields significantly more accurate matching results than sanitization techniques, even when privacy requirements are high.
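As a rough illustration of the blocking idea described in the abstract, the sketch below filters record pairs using only sanitized views. It is not the authors' protocol. It assumes, purely for illustration, that each record has one numeric attribute, that its sanitized view is an interval (lo, hi) containing the true value, and that two records match when their values differ by at most a threshold. The names distance_lower_bound and block are hypothetical.

```python
from itertools import product

def distance_lower_bound(a, b):
    """Smallest possible |x - y| for x in interval a and y in interval b.

    Assumption: a and b are (lo, hi) tuples standing in for sanitized views.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    # The gap between the intervals, or 0 if they overlap.
    return max(0, a_lo - b_hi, b_lo - a_hi)

def block(left, right, threshold):
    """Split all cross pairs into filtered-out non-matches and candidates.

    A pair is filtered out when even the closest values consistent with
    the two intervals are farther apart than the threshold. Every other
    pair stays a candidate and would be passed to a costlier
    (for example cryptographic) comparison, which is not modeled here.
    """
    filtered, candidates = [], []
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        if distance_lower_bound(a, b) > threshold:
            filtered.append((i, j))
        else:
            candidates.append((i, j))
    return filtered, candidates

# Hypothetical sanitized views held by two parties.
left_views = [(20, 30), (40, 50)]
right_views = [(22, 28), (60, 70)]
filtered, candidates = block(left_views, right_views, threshold=10)
print("filtered out:", filtered)
print("still to compare:", candidates)
```

In this toy setting, a wider interval hides more about the true value but lets fewer pairs be ruled out, so more pairs remain for the expensive step. That is one way to picture the privacy, accuracy, and cost trade-off the abstract mentions.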
Protocols, Privacy, Data privacy, Cryptography, Accuracy, Databases, differential privacy, security, record matching, anonymization
Ali Inan, Murat Kantarcioglu, Gabriel Ghinita, Elisa Bertino, "A Hybrid Approach to Private Record Matching", IEEE Transactions on Dependable and Secure Computing, vol.9, no. 5, pp. 684-698, Sept.-Oct. 2012, doi:10.1109/TDSC.2012.46