An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources
July/August 2003 (vol. 15, no. 4)
pp. 871-882

Abstract—Measuring semantic similarity between words has become a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores how semantic similarity can be determined from a number of information sources: structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how these sources can be used effectively, a variety of strategies for using the possible information sources are implemented. A new measure is then proposed that combines the information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings shows that the proposed measure significantly outperforms traditional similarity measures.
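The abstract does not state the measure itself. As a rough illustration of the ingredients it names (path length and depth in a lexical taxonomy, information content from corpus counts, and a nonlinear combination), the following Python sketch works over a small hypothetical taxonomy and made-up word counts. The combination `exp(-alpha * l) * tanh(beta * h)`, where `l` is the shortest path between two concepts and `h` is the depth of their lowest common subsumer, is an assumed form, and `alpha` and `beta` are illustrative defaults; the paper's exact formula and fitted parameters may differ. `information_content` follows the standard definition IC(c) = -log p(c), and `resnik_similarity` is Resnik's information-content measure, included only for comparison.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent), a tree rooted at "entity".
PARENT = {
    "object": "entity",
    "animal": "object",
    "vehicle": "object",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bicycle": "vehicle",
}

# Made-up corpus frequencies for the leaf words (illustration only).
COUNTS = {"dog": 30, "cat": 20, "car": 40, "bicycle": 10}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Number of edges from the root to the concept (the root has depth 0)."""
    return len(ancestors(concept)) - 1


def lowest_common_subsumer(c1, c2):
    """Deepest concept that is an ancestor of (or equal to) both inputs."""
    shared = set(ancestors(c2))
    for node in ancestors(c1):  # walks upward, so the first hit is the deepest
        if node in shared:
            return node
    raise ValueError("concepts share no ancestor")


def path_length(c1, c2):
    """Number of edges on the shortest path between two concepts in the tree."""
    lcs = lowest_common_subsumer(c1, c2)
    return depth(c1) + depth(c2) - 2 * depth(lcs)


def structural_similarity(c1, c2, alpha=0.2, beta=0.6):
    """Assumed nonlinear combination: decays with path length, saturates with depth."""
    l = path_length(c1, c2)
    h = depth(lowest_common_subsumer(c1, c2))
    return math.exp(-alpha * l) * math.tanh(beta * h)


def information_content(concept):
    """IC(c) = -log p(c) = log(total / freq); freq counts every word the concept subsumes.

    Assumes the concept subsumes at least one counted word (freq > 0).
    """
    total = sum(COUNTS.values())
    freq = sum(n for word, n in COUNTS.items() if concept in ancestors(word))
    return math.log(total / freq)


def resnik_similarity(c1, c2):
    """Resnik-style measure: information content of the lowest common subsumer."""
    return information_content(lowest_common_subsumer(c1, c2))


if __name__ == "__main__":
    for a, b in [("dog", "cat"), ("dog", "car")]:
        print(a, b, round(structural_similarity(a, b), 4),
              round(resnik_similarity(a, b), 4))
```

With these toy data, `dog` and `cat` (path length 2, subsumer `animal` at depth 2) score higher than `dog` and `car` (path length 4, subsumer `object` at depth 1). Because `tanh(0)` is 0, any pair whose only shared ancestor is the root gets zero structural similarity, so concepts sharing a more specific subsumer are treated as more similar.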

Index Terms:
Semantic similarity, lexical database, information content, corpus statistics.
Yuhua Li, Zuhair A. Bandar, David McLean, "An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 871-882, July-Aug. 2003, doi:10.1109/TKDE.2003.1209005