Issue No. 1, Jan. 2013 (Vol. 25)
pp. 131-144
A. U. Asuncion, Dept. of Comput. Sci., Univ. of California, Irvine, Irvine, CA, USA
M. T. Goodrich, Dept. of Comput. Sci., Univ. of California, Irvine, Irvine, CA, USA
ABSTRACT
In this paper, we study sparsity-exploiting Mastermind algorithms for attacking the privacy of an entire database of character strings or vectors, such as DNA strings, movie ratings, or social network friendship data. Based on reductions to nonadaptive group testing, our methods can take advantage of minimal amounts of privacy leakage, such as that contained in a single bit indicating whether two people in a medical database have any common genetic mutations, or whether two people have any common friends in an online social network. We analyze our Mastermind attack algorithms both through theoretical characterizations, which provide sublinear bounds on the number of queries needed to clone the database, and through experimental tests on genomic information, collaborative filtering data, and online social networks. By exploiting the generally sparse nature of these real-world databases and modulating a parameter that controls query sparsity, we demonstrate that relatively few nonadaptive queries are needed to recover a large majority of each database.
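The core idea above, that one leaked bit per query is enough once the attack is cast as nonadaptive group testing, can be illustrated with a minimal Python sketch. It assumes a hypothetical oracle `leak_bit` that reveals only whether a query vector and a secret binary vector (for example, one person's mutation indicators or friend list) share a 1. All queries are drawn in advance, each position included independently with a hypothetical probability `p` standing in for the query-sparsity parameter, and the answers are decoded with the textbook COMP rule from group testing. The names, the random query design, and the decoder are illustrative assumptions, not the authors' exact algorithm.

```python
import random

def leak_bit(query, secret):
    # Hypothetical one-bit oracle: True iff the query and the secret
    # binary vector have a 1 in at least one common position.
    return any(q and s for q, s in zip(query, secret))

def random_queries(n, t, p, rng):
    # All t queries are fixed up front (nonadaptive); each position is
    # included independently with probability p, the assumed sparsity knob.
    return [[rng.random() < p for _ in range(n)] for _ in range(t)]

def comp_decode(queries, answers, n):
    # COMP decoding: a position covered by any negative query must be 0;
    # every position that is never ruled out is reported as a 1.
    candidate = [True] * n
    for query, answer in zip(queries, answers):
        if not answer:
            for i, included in enumerate(query):
                if included:
                    candidate[i] = False
    return {i for i in range(n) if candidate[i]}

def clone_vector(secret, t, p, seed=0):
    # Issue the queries, collect the leaked bits, and decode them.
    rng = random.Random(seed)
    queries = random_queries(len(secret), t, p, rng)
    answers = [leak_bit(q, secret) for q in queries]
    return comp_decode(queries, answers, len(secret))

if __name__ == "__main__":
    n, d = 1000, 10  # vector length and number of 1s (sparse secret)
    rng = random.Random(1)
    ones = set(rng.sample(range(n), d))
    secret = [i in ones for i in range(n)]
    recovered = clone_vector(secret, t=400, p=1 / d)
    print("true ones:", sorted(ones))
    print("recovered:", sorted(recovered))
```

Because a true 1 can never appear in a query that answers False, COMP produces no false negatives; extra queries only prune false positives. Setting `p` near 1/d is a common heuristic in this sketch: much denser queries almost always answer True and eliminate little, while much sparser ones cover too few positions per query.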
INDEX TERMS
string matching, collaborative filtering, data privacy, medical administrative data processing, query processing, social networking (online), nonadaptive queries, nonadaptive Mastermind algorithms, vector databases, sparsity-exploiting Mastermind algorithms, database privacy attack, character string database, nonadaptive group testing, privacy leakage, medical database, genetic mutations, online social network, Mastermind attack algorithms, sublinear bounds, database querying, genomic information, collaborative filtering data, query sparsity control, Databases, Data privacy, Social network services, Testing, Protocols, Cloning, nonadaptive attacks, Mastermind algorithms, privacy leaks, data cloning, combinatorial group testing
CITATION
A. U. Asuncion, M. T. Goodrich, "Nonadaptive Mastermind Algorithms for String and Vector Databases, with Case Studies", IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 1, pp. 131-144, Jan. 2013, doi:10.1109/TKDE.2011.147