Issue No. 12 - Dec. 2012 (vol. 24)
pp. 2260-2273
Pawan Goyal, University of Ulster, Londonderry
Laxmidhar Behera, Indian Institute of Technology, Kanpur and University of Ulster, Londonderry
Thomas Martin McGinnity, University of Ulster, Londonderry
ABSTRACT
A user query for information retrieval (IR) applications may not contain the terms (words) that best express what the user actually intends. This is usually referred to as the term mismatch problem and is a crucial research issue in IR. Using the notion of relevance, we provide a comprehensive theoretical analysis of a parametric query vector, which is assumed to represent the information needs of the user. A lexical association function is derived analytically using the system relevance criteria. The derivation is further justified using empirical evidence from the user relevance criteria. The analytical derivation presented in this paper provides a proper mathematical framework for query expansion techniques, which have largely been heuristic in the existing literature. By using the generalized retrieval framework, the proposed query representation model is equally applicable to the vector space model (VSM), Okapi best matching 25 (Okapi BM25), and the Language Model (LM). Experiments over various data sets from TREC show that the proposed query representation gives statistically significant improvements over the baseline Okapi BM25 and LM as well as other well-known global query expansion techniques. Empirical results, along with the theoretical foundations of the query representation, confirm that the proposed model extends the state of the art in global query expansion.
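To make the idea of global query expansion through lexical association concrete, the following is a minimal sketch only. The abstract does not state the derived association function, so this example substitutes a plain cosine-normalised document co-occurrence measure as a stand-in; it is not the authors' function. The names `build_association` and `expand_query`, the `top_k` parameter, and the toy collection are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_association(docs):
    """Map each ordered term pair to a co-occurrence weight in [0, 1].

    ASSUMPTION: cosine-normalised document co-occurrence is used here
    as a stand-in; it is not the association function of the paper.
    """
    doc_freq = Counter()   # number of documents containing each term
    co_freq = Counter()    # number of documents containing both terms
    for doc in docs:
        terms = set(doc.lower().split())
        doc_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co_freq[(a, b)] += 1
    assoc = {}
    for (a, b), n in co_freq.items():
        weight = n / math.sqrt(doc_freq[a] * doc_freq[b])
        assoc[(a, b)] = weight
        assoc[(b, a)] = weight  # the measure is symmetric
    return assoc


def expand_query(query_terms, assoc, top_k=2):
    """Return a weighted query vector as a dict of term -> weight.

    Original terms keep weight 1.0. The top_k most associated
    collection terms are added, weighted by their mean association
    with the query terms.
    """
    query_terms = set(query_terms)
    scores = Counter()
    for (a, b), weight in assoc.items():
        if a in query_terms and b not in query_terms:
            scores[b] += weight
    vector = {term: 1.0 for term in query_terms}
    for term, total in scores.most_common(top_k):
        vector[term] = total / len(query_terms)
    return vector


# Hypothetical toy collection, for illustration only.
toy_docs = [
    "car engine",
    "car automobile",
    "automobile car dealer",
    "banana fruit",
]
toy_assoc = build_association(toy_docs)
expanded = expand_query({"car"}, toy_assoc, top_k=1)
```

Because the association weights are computed once over the whole collection rather than from the top-ranked documents of an initial retrieval run, this is a global expansion scheme. The resulting term-weight dictionary could then be passed to whichever scoring function is in use, for example a VSM or BM25 scorer.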
INDEX TERMS
Mathematical model, Equations, Correlation, Information retrieval, Context, Markov processes, Indexes, language model, lexical association, query expansion
CITATION
Pawan Goyal, Laxmidhar Behera, Thomas Martin McGinnity, "Query Representation through Lexical Association for Information Retrieval", IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. 12, pp. 2260-2273, Dec. 2012, doi:10.1109/TKDE.2011.171