2009 WRI World Congress on Computer Science and Information Engineering
Ensemble Similarity Measures for Clustering Terms
Los Angeles, California USA
March 31-April 02
ISBN: 978-0-7695-3507-4
Clustering semantically related terms is crucial for many applications, such as document categorization and word sense disambiguation. However, automatically identifying semantically similar terms is challenging. We present a novel approach for automatically determining the degree of relatedness between terms to facilitate their subsequent clustering. By analogy with ensemble classifiers in machine learning, we combine multiple techniques, such as contextual similarity and semantic relatedness, to improve the accuracy of our computations. We present a new method, based on Yarowsky's word sense disambiguation approach, for generating high-quality topic signatures for contextual similarity computations. We also propose a technique, based on the work of Hirst and St. Onge, for measuring semantic relatedness between multi-word terms. Experimental evaluation shows that our method outperforms comparable related work. We also investigate the effect of assigning different importance levels to the different similarity measures based on corpus characteristics.
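The abstract does not say how the individual measures are combined or how topic signatures are represented. The following minimal Python sketch illustrates the ensemble idea under stated assumptions: contextual similarity is taken as cosine similarity between sparse topic-signature vectors, and the ensemble score is a weighted average of per-measure scores, each assumed to lie in [0, 1]. The function names, the weights, and the example signatures are hypothetical illustrations, not the authors' method.

```python
from math import sqrt


def cosine(sig_a: dict[str, float], sig_b: dict[str, float]) -> float:
    """Cosine similarity between two sparse topic-signature vectors.

    Assumption: a topic signature is a mapping from context word to weight.
    """
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[w] * sig_b[w] for w in shared)
    norm_a = sqrt(sum(v * v for v in sig_a.values()))
    norm_b = sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty signature shares no context with anything
    return dot / (norm_a * norm_b)


def ensemble_similarity(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical ensemble: weighted average of per-measure scores.

    The weights play the role of the "importance levels" mentioned in the
    abstract; how the paper actually sets them is not stated there.
    """
    total = sum(weights[name] for name in scores)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return sum(weights[name] * scores[name] for name in scores) / total


if __name__ == "__main__":
    # Hypothetical topic signatures for two terms.
    engine = {"engine": 2.0, "fuel": 1.0}
    motor = {"engine": 1.0, "wheel": 1.0}
    contextual = cosine(engine, motor)
    # Placeholder semantic-relatedness score (e.g. from a WordNet-based measure).
    semantic = 0.4
    score = ensemble_similarity(
        {"contextual": contextual, "semantic": semantic},
        {"contextual": 0.7, "semantic": 0.3},
    )
    print(round(contextual, 3), round(score, 3))
```

A weighted average keeps the combined score on the same scale as its inputs, so shifting weight between measures (for example, relying more on contextual similarity in a large corpus) changes the ranking of term pairs without requiring any rescaling.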
Index Terms:
text clustering, semantic relatedness, natural language processing
Ashwin Ittoo, Laura Maruster, "Ensemble Similarity Measures for Clustering Terms," csie, vol. 4, pp.315-319, 2009 WRI World Congress on Computer Science and Information Engineering, 2009