Issue No. 09 - September (2010 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.134
Di Cai , University of Wolverhampton, Wolverhampton
Hitherto, it has not been easy to interpret the meaning of the amount of discrimination information conveyed in a term rationally and explicitly within practical application contexts; it has not been simple to introduce the concept of the extent of semantic relatedness between two terms meaningfully and successfully into scientific discussions. This study is part of an attempt to do this. We attempt to answer two important questions: 1) What is the discrimination information conveyed by a term and how to measure it? 2) What is the relatedness between two terms and how to estimate it? We focus on the first question and present an in-depth investigation into the discrimination measures based on several information measures, which are widely used in a variety of applications. The relatedness measures are then naturally defined according to the individual discrimination measures. Some key points are made for clarifying potential problems arising from using the relatedness measures, and solutions are suggested. Two example applications in the contexts of text mining and information retrieval are provided. The aim of this study, of which this paper forms part, is to establish a unified theoretical framework, with measurement of discrimination information (MDI) at the core, for achieving effective measurement of semantic relatedness (MSR). Due to its generality, our method can be expected to be a useful tool with a wide range of application areas.
Statistical semantic analysis, measurement of discrimination information, measurement of semantic relatedness, informative term identification, key term extraction, text mining, information retrieval.
D. Cai, "An Information-Theoretic Foundation for the Measurement of Discrimination Information," in IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. , pp. 1262-1273, 2009.