Issue No. 09 - September (2007 vol. 19)
Semantic smoothing, which incorporates synonym and sense information into language models, is an effective and potentially significant way to improve retrieval performance. Previously implemented semantic smoothing models, such as the translation model, have shown good experimental results. However, these models are unable to incorporate contextual information. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document into a set of weighted context-sensitive topic signatures and then translates those topic signatures into query terms. The language model with such context-sensitive semantic smoothing is referred to as the topic signature language model. In detail, we implement two types of topic signatures, depending on whether an ontology exists in the application domain: one is the ontology-based concept and the other is the multiword phrase. The translation probabilities from each topic signature to individual terms are estimated through the EM algorithm. Document models based on topic signature translation are then derived. The new smoothing method is evaluated on the TREC 2004/2005 Genomics Track with ontology-based concepts and on the TREC Ad hoc Track (Disks 1, 2, and 3) with multiword phrases. Both experiments show significant improvements over the two-stage language model as well as the language model with context-insensitive semantic smoothing.
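The decomposition-and-translation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the maximum-likelihood signature weights, the linear interpolation with a coefficient `lam`, and the toy translation table are all assumptions made for illustration. In the paper the translation probabilities are estimated with EM; here they are simply supplied as a hypothetical dictionary.

```python
from collections import Counter


def signature_weights(signatures):
    """Assumed maximum-likelihood weight p(t|d) of each topic signature in one document."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def translation_model(signatures, translation):
    """Translate topic signatures into terms: p_t(w|d) = sum over t of p(w|t) * p(t|d)."""
    probs = {}
    for t, p_t in signature_weights(signatures).items():
        for w, p_w in translation.get(t, {}).items():
            probs[w] = probs.get(w, 0.0) + p_w * p_t
    return probs


def smoothed_model(term_model, signatures, translation, lam=0.3):
    """Interpolate a baseline term model with the translation model (lam is an assumed coefficient)."""
    trans = translation_model(signatures, translation)
    vocab = set(term_model) | set(trans)
    return {
        w: (1 - lam) * term_model.get(w, 0.0) + lam * trans.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy data: multiword phrases as topic signatures.
translation = {
    "space program": {"space": 0.5, "nasa": 0.3, "shuttle": 0.2},
    "hubble telescope": {"telescope": 0.6, "space": 0.4},
}
signatures = ["space program", "space program", "hubble telescope"]
term_model = {"space": 0.5, "telescope": 0.5}

model = smoothed_model(term_model, signatures, translation)
```

In this toy setup the term "nasa" never occurs in the document, yet it receives nonzero probability because the phrase "space program" translates to it. That is the effect semantic smoothing is meant to produce, with the phrase as a whole supplying the context.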
Information Retrieval, Language Models, Semantic Smoothing, Topic Signature, Concept, Multiword Phrase
Xiaodan Zhang, Xiaohua Hu, Xiaohua Zhou, "Topic Signature Language Models for Ad hoc Retrieval", IEEE Transactions on Knowledge & Data Engineering, vol. 19, no. 9, pp. 1276-1287, September 2007, doi:10.1109/TKDE.2007.1058