Query Expansion by Mining User Logs
July/August 2003 (vol. 15 no. 4)
pp. 829-839

Abstract: Queries to search engines on the Web are usually short, so they do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed query expansion to deal with this problem. However, expansion terms are usually determined from term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.
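
As an illustration of the central idea only, the following is a minimal sketch of how correlations between query terms and document terms might be estimated from click-through sessions and then used to rank expansion terms. It is not the authors' implementation. The session format, the function names `term_correlations` and `expand`, the length-normalized term frequency, and the `log1p` score combination are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math


def term_correlations(sessions, doc_terms):
    """Estimate P(document term | query term) from click-through sessions.

    sessions:  iterable of (query_terms, clicked_doc_ids) pairs (assumed log format)
    doc_terms: dict mapping doc_id -> list of terms in that document
    """
    # Count how often each document was clicked for queries containing a term.
    clicks = defaultdict(Counter)
    for query_terms, clicked in sessions:
        for q in set(query_terms):
            for d in clicked:
                clicks[q][d] += 1

    corr = defaultdict(lambda: defaultdict(float))
    for q, doc_counts in clicks.items():
        total = sum(doc_counts.values())
        for d, n in doc_counts.items():
            p_doc = n / total  # share of clicks for q that went to d
            tf = Counter(doc_terms.get(d, []))
            length = sum(tf.values())
            for t, c in tf.items():
                # Assumed weighting: length-normalized term frequency.
                corr[q][t] += p_doc * c / length
    return corr


def expand(query_terms, corr, k=3):
    """Return the k highest-scoring terms not already in the query."""
    scores = defaultdict(float)
    for q in query_terms:
        for t, p in corr.get(q, {}).items():
            if t not in query_terms:
                # Assumed combination rule: sum of log(1 + p) over query terms.
                scores[t] += math.log1p(p)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Toy data, invented for this example.
sessions = [
    (["java"], ["d1"]),
    (["java", "island"], ["d2"]),
    (["java"], ["d1"]),
]
doc_terms = {
    "d1": ["programming", "language", "java"],
    "d2": ["indonesia", "island"],
}
corr = term_correlations(sessions, doc_terms)
expansion = expand(["java"], corr, k=2)
```

In this toy log, most clicks for "java" land on the programming document, so its terms outrank those of the less frequently clicked document. This is the sense in which clicks act as implicit user judgments.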

Index Terms:
Query expansion, user log, probabilistic model, information retrieval, search engine.
Citation:
Hang Cui, Ji-Rong Wen, Jian-Yun Nie, Wei-Ying Ma, "Query Expansion by Mining User Logs," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 829-839, July-Aug. 2003, doi:10.1109/TKDE.2003.1209002