Topic Signature Language Models for Ad hoc Retrieval
September 2007 (vol. 19 no. 9)
pp. 1276-1287
Semantic smoothing, which incorporates synonym and sense information into language models, is an effective and potentially significant way to improve retrieval performance. Previously implemented semantic smoothing models, such as the translation model, have shown good experimental results. However, these models are unable to incorporate contextual information. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document into a set of weighted context-sensitive topic signatures and then translates those topic signatures into query terms. The language model with such context-sensitive semantic smoothing is referred to as the topic signature language model. In detail, we implement two types of topic signatures, depending on whether an ontology exists in the application domain: the ontology-based concept and the multiword phrase. The translation probabilities from each topic signature to individual terms are estimated through the EM algorithm. Document models based on topic signature translation are then derived. The new smoothing method is evaluated on the TREC 2004/2005 Genomics Track with ontology-based concepts and on the TREC Ad hoc Track (Disks 1, 2, and 3) with multiword phrases. Both experiments show significant improvements over the two-stage language model as well as over the language model with context-insensitive semantic smoothing.
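The two estimation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the EM step is written as a two-component mixture in which each term occurrence in the documents containing a topic signature is generated either by the signature or by a background collection model, and the document model is a linear interpolation of a maximum-likelihood term model with the translated topic signatures. The names `estimate_translation` and `smoothed_doc_model`, the coefficients `alpha` and `lam`, and the toy counts are all hypothetical.

```python
def estimate_translation(counts, background, alpha=0.5, iters=50):
    """Estimate p(w | signature) by EM (assumed mixture form).

    counts:     term counts over the documents that contain the signature
    background: collection model p(w | C); must be > 0 for every counted term
    alpha:      assumed weight of the background component
    """
    total = sum(counts.values())
    # Initialise with the maximum-likelihood estimate.
    p_wt = {w: c / total for w, c in counts.items()}
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the signature.
        resp = {}
        for w in counts:
            from_signature = (1 - alpha) * p_wt[w]
            resp[w] = from_signature / (from_signature + alpha * background[w])
        # M-step: renormalise the expected signature-generated counts.
        norm = sum(counts[w] * resp[w] for w in counts)
        p_wt = {w: counts[w] * resp[w] / norm for w in counts}
    return p_wt


def smoothed_doc_model(word, doc_words, doc_signatures, translations, lam=0.3):
    """p(w|d) = (1 - lam) * p_ml(w|d) + lam * sum_k p(w|t_k) * p(t_k|d)."""
    p_ml = doc_words.get(word, 0) / sum(doc_words.values())
    sig_total = sum(doc_signatures.values())
    p_trans = sum(
        translations[t].get(word, 0.0) * c / sig_total
        for t, c in doc_signatures.items()
    )
    return (1 - lam) * p_ml + lam * p_trans


# Toy data (invented): terms co-occurring with the phrase "space program".
counts = {"space": 10, "shuttle": 8, "the": 30}
background = {"space": 0.01, "shuttle": 0.005, "the": 0.5}
translations = {"space program": estimate_translation(counts, background)}

# A document that never mentions "shuttle" but contains the phrase.
doc_words = {"space": 2, "program": 1}
doc_signatures = {"space program": 1}
p_shuttle = smoothed_doc_model("shuttle", doc_words, doc_signatures, translations)
```

Under these assumptions, the EM step shifts probability mass away from terms that the background model already explains well (here "the") and toward topical terms, and the interpolated document model gives a nonzero probability to a query term such as "shuttle" even though the document never contains it.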


Index Terms:
Information Retrieval, Language Models, Semantic Smoothing, Topic Signature, Concept, Multiword Phrase
Citation:
Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, "Topic Signature Language Models for Ad hoc Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 9, pp. 1276-1287, Sept. 2007, doi:10.1109/TKDE.2007.1058