Issue No. 10 - Oct. (2015 vol. 27)
Peipei Li, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Haixun Wang, Google Research, Mountain View, CA 94043 USA
Kenny Q. Zhu, Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Zhongyuan Wang, Renmin University of China and Microsoft Research Asia, Beijing, China
Xuegang Hu, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui Province, China
Xindong Wu, School of Computer Science and Information Engineering, Hefei University of Technology, Burlington, VT 05405, China
Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches to this task: knowledge-based and corpus-based. However, existing approaches are better suited to measuring similarity between single words than between the more general multi-word expressions (MWEs), and they do not scale well. In contrast to these techniques, we propose an efficient and effective approach to semantic similarity that uses a large-scale semantic network. This network is automatically acquired from billions of web documents and consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space and compare their similarity there. We then introduce a clustering approach that orthogonalizes the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms, including MWEs and ambiguous terms, and that it significantly outperforms 12 competing methods as measured by the Pearson correlation coefficient. Our approach is also much more efficient than all competing algorithms and can be used to compute semantic similarity at large scale.
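As a rough illustration of the first step in the abstract (mapping two terms into a concept space and comparing them there), the sketch below represents each term as a sparse vector of concept weights and compares two terms with cosine similarity. It is a minimal sketch under stated assumptions, not the authors' method: the `CONCEPTS` table, its weights, and the function names are hypothetical, and cosine similarity is chosen here only because it is a common way to compare such vectors. The clustering step is not shown.

```python
import math

# Hypothetical toy data: for each term, a sparse vector of concept weights
# (for example, an estimate of P(concept | term)). In the paper these would
# come from a large semantic network; the values here are invented.
CONCEPTS = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "microsoft": {"company": 0.9, "software vendor": 0.1},
    "pear": {"fruit": 0.9, "tree": 0.1},
}


def concept_vector(term):
    """Map a term to its concept vector (empty if the term is unknown)."""
    return CONCEPTS.get(term.lower(), {})


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no concept evidence for at least one term
    return dot / (norm_u * norm_v)


def term_similarity(term_a, term_b):
    """Compare two terms in the (toy) concept space."""
    return cosine(concept_vector(term_a), concept_vector(term_b))


if __name__ == "__main__":
    for pair in [("apple", "pear"), ("apple", "microsoft"), ("microsoft", "pear")]:
        print(pair, round(term_similarity(*pair), 3))
```

In this toy table, "apple" shares concepts with both "pear" and "microsoft", while "microsoft" and "pear" share none, so their similarity is zero. Overlapping or near-duplicate concepts would distort such a comparison, which is the kind of problem the concept clustering described in the abstract is meant to address.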
Semantics, Context, Companies, Clustering algorithms, Google, Taxonomy, Knowledge based systems
P. Li, H. Wang, K. Q. Zhu, Z. Wang, X. Hu and X. Wu, "A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity," in IEEE Transactions on Knowledge & Data Engineering, vol. 27, no. 10, pp. 2604-2617, 2015.