Issue No. 03, March 2012 (vol. 24)
pp. 520-532
Michaela Götz, Cornell University, Ithaca
Ashwin Machanavajjhala, Yahoo! Research, Silicon Valley
Guozhang Wang, Cornell University, Ithaca
Xiaokui Xiao, Nanyang Technological University, Singapore
Johannes Gehrke, Cornell University, Ithaca
ABSTRACT
Search engine companies collect the "database of intentions," the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs because doing so may disclose sensitive information. In this paper, we analyze algorithms for publishing frequent keywords, queries, and clicks of a search log. We first show that methods achieving variants of k-anonymity are vulnerable to active attacks. We then demonstrate that the stronger guarantee of ε-differential privacy unfortunately provides no utility for this problem. Next, we propose an algorithm, ZEALOUS, and show how to set its parameters to achieve (ε,δ)-probabilistic privacy. We also contrast our analysis of ZEALOUS with an analysis by Korolova et al. [17] that achieves (ε′,δ′)-indistinguishability. The paper concludes with a large experimental study on real applications in which we compare ZEALOUS with previous work that achieves k-anonymity in search log publishing. Our results show that ZEALOUS yields utility comparable to k-anonymity while achieving much stronger privacy guarantees.
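The abstract names ZEALOUS without spelling out its steps. To make the idea concrete, the following is a minimal, illustrative sketch of a two-threshold noisy-histogram release of the kind ZEALOUS is built on, written under stated assumptions: each user contributes at most m distinct items, items seen by fewer than tau users are dropped, Laplace noise of scale lam is added to the surviving counts, and only items whose noisy count exceeds tau_prime are published. The function name zealous_sketch, the parameter names, and the example values are hypothetical; in particular, the values below are placeholders and not the parameter settings the paper derives for (ε,δ)-probabilistic privacy.

```python
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def zealous_sketch(logs, m, tau, tau_prime, lam, seed=None):
    """Hypothetical sketch of a threshold/noise/threshold histogram release.

    logs: dict mapping a user id to that user's list of items
          (keywords, queries, or clicks).
    Returns a dict mapping each published item to its noisy user count.
    """
    rng = random.Random(seed)
    counts = Counter()
    for items in logs.values():
        # Sort so that sampling is reproducible for a fixed seed.
        distinct = sorted(set(items))
        # Keep at most m distinct items per user, bounding any one user's influence.
        chosen = rng.sample(distinct, min(m, len(distinct)))
        counts.update(chosen)  # each item counted once per user
    # First threshold on exact counts: rare items never reach the noise step.
    survivors = {item: c for item, c in counts.items() if c >= tau}
    released = {}
    for item, c in survivors.items():
        noisy = c + laplace(lam, rng)
        # Second threshold on the noisy count decides what is published.
        if noisy > tau_prime:
            released[item] = noisy
    return released


if __name__ == "__main__":
    logs = {
        "u1": ["weather", "news", "weather"],
        "u2": ["weather", "news"],
        "u3": ["weather", "rare medical query"],
    }
    print(zealous_sketch(logs, m=2, tau=2, tau_prime=2.5, lam=0.5, seed=0))
```

In this toy log, the query issued by only one user is removed by the first threshold no matter how the noise falls, while whether "weather" and "news" are published depends on the random draw. Capping each user at m items is what keeps the amount of noise needed for a given privacy level bounded, since no single user can shift more than m counts.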
INDEX TERMS
Security, integrity, and protection, general, database management, information technology and systems, web search, general, information storage and retrieval, information technology and systems.
CITATION
Michaela Götz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, Johannes Gehrke, "Publishing Search Logs—A Comparative Study of Privacy Guarantees", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 3, pp. 520-532, March 2012, doi:10.1109/TKDE.2011.26
REFERENCES
[1] E. Adar, "User 4xxxxx9: Anonymizing Query Logs," Proc. World Wide Web (WWW) Workshop Query Log Analysis, 2007.
[2] R. Baeza-Yates, "Web Usage Mining in Search Engines," Web Mining: Applications and Techniques, Idea Group, 2004.
[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar, "Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release," Proc. ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), 2007.
[4] M. Barbaro and T. Zeller, "A Face Is Exposed for AOL Searcher No. 4417749," New York Times, http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000&en=f6f61949c6da4d38&ei=5090, 2006.
[5] A. Blum, K. Ligett, and A. Roth, "A Learning Theory Approach to Non-Interactive Database Privacy," Proc. 40th Ann. ACM Symp. Theory of Computing (STOC), pp. 609-618, 2008.
[6] J. Brickell and V. Shmatikov, "The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2008.
[7] S. Chakrabarti, R. Khanna, U. Sawant, and C. Bhattacharyya, "Structured Learning for Non-Smooth Ranking Losses," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 88-96, 2008.
[8] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our Data, Ourselves: Privacy via Distributed Noise Generation," Proc. Ann. Int'l Conf. Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2006.
[9] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating Noise to Sensitivity in Private Data Analysis," Proc. Theory of Cryptography Conf. (TCC), 2006.
[10] M. Götz, A. Machanavajjhala, G. Wang, X. Xiao, and J. Gehrke, "Privacy in Search Logs," CoRR, abs/0904.0682v2, 2009.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques, first ed. Morgan Kaufmann, Sept. 2000.
[12] Y. He and J.F. Naughton, "Anonymization of Set-Valued Data via Top-Down, Local Generalization," Proc. VLDB Endowment, vol. 2, no. 1, pp. 934-945, 2009.
[13] Y. Hong, X. He, J. Vaidya, N. Adam, and V. Atluri, "Effective Anonymization of Query Logs," Proc. ACM Conf. Information and Knowledge Management (CIKM), 2009.
[14] R. Jones, R. Kumar, B. Pang, and A. Tomkins, "I Know What You Did Last Summer: Query Logs and User Privacy," Proc. ACM Conf. Information and Knowledge Management (CIKM), 2007.
[15] R. Jones, B. Rey, O. Madani, and W. Greiner, "Generating Query Substitutions," Proc. 15th Int'l Conf. World Wide Web (WWW), 2006.
[16] S.P. Kasiviswanathan, H.K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, "What Can We Learn Privately?" Proc. 49th Ann. IEEE Symp. Foundations of Computer Science (FOCS), pp. 531-540, 2008.
[17] A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas, "Releasing Search Queries and Clicks Privately," Proc. 18th Int'l Conf. World Wide Web (WWW), 2009.
[18] R. Kumar, J. Novak, B. Pang, and A. Tomkins, "On Anonymizing Query Logs via Token-Based Hashing," Proc. Int'l Conf. World Wide Web (WWW), 2007.
[19] Y. Luo, Y. Zhao, and J. Le, "A Survey on the Privacy Preserving Algorithm of Association Rule Mining," Proc. Int'l Symp. Electronic Commerce and Security, vol. 1, pp. 241-245, 2009.
[20] A. Machanavajjhala, D. Kifer, J.M. Abowd, J. Gehrke, and L. Vilhuber, "Privacy: Theory Meets Practice on the Map," Proc. Int'l Conf. Data Eng. (ICDE), 2008.
[21] R. Motwani and S. Nabar, "Anonymizing Unstructured Data," CoRR, abs/0810.5582, 2008.
[22] K. Nissim, S. Raskhodnikova, and A. Smith, "Smooth Sensitivity and Sampling in Private Data Analysis," Proc. Ann. ACM Symp. Theory of Computing (STOC), 2007.
[23] P. Samarati, "Protecting Respondents' Identities in Microdata Release," IEEE Trans. Knowledge and Data Eng., vol. 13, no. 6, pp. 1010-1027, Nov./Dec. 2001.