Determining Semantic Similarity among Entity Classes from Different Ontologies
March/April 2003 (vol. 15 no. 2)
pp. 442-456
M. Andrea Rodríguez
Max J. Egenhofer, IEEE Computer Society

Abstract—Semantic similarity measures play an important role in information retrieval and information integration. Traditional approaches to modeling semantic similarity compute the semantic distance between definitions within a single ontology. This single ontology is either a domain-independent ontology or the result of the integration of existing ontologies. We present an approach to computing semantic similarity that relaxes the requirement of a single ontology and accounts for differences in the levels of explicitness and formalization of the different ontology specifications. A similarity function determines similar entity classes by using a matching process over synonym sets, semantic neighborhoods, and distinguishing features that are classified into parts, functions, and attributes. Experimental results with different ontologies indicate that the model gives good results when ontologies have complete and detailed representations of entity classes. While the combination of word matching and semantic neighborhood matching is adequate for detecting equivalent entity classes, feature matching allows us to discriminate among similar, but not necessarily equivalent entity classes.
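The matching process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Tversky-style ratio for comparing sets, equal weights for the three components, and a simplified semantic neighborhood represented as a set of related class names. The names `EntityClass`, `set_match`, `similarity`, `weights`, and `alpha` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EntityClass:
    """Hypothetical container for one entity class definition."""
    synonyms: set = field(default_factory=set)
    neighborhood: set = field(default_factory=set)  # names of related classes (simplified)
    parts: set = field(default_factory=set)
    functions: set = field(default_factory=set)
    attributes: set = field(default_factory=set)


def set_match(a, b, alpha=0.5):
    """Assumed Tversky-style ratio: shared items over shared plus weighted distinct items."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + (1 - alpha) * len(b - a)
    return common / denom if denom else 0.0


def similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3), alpha=0.5):
    """Weighted sum of word, feature, and neighborhood matching (weights are assumptions)."""
    w_word, w_feat, w_nbr = weights
    s_word = set_match(a.synonyms, b.synonyms, alpha)
    s_nbr = set_match(a.neighborhood, b.neighborhood, alpha)
    # Distinguishing features are matched per type, then averaged (an assumption).
    kinds = ("parts", "functions", "attributes")
    s_feat = sum(set_match(getattr(a, k), getattr(b, k), alpha) for k in kinds) / len(kinds)
    return w_word * s_word + w_feat * s_feat + w_nbr * s_nbr


# Two made-up entity classes from two made-up ontologies.
stadium = EntityClass(
    synonyms={"stadium", "arena", "sports stadium"},
    neighborhood={"building", "structure"},
    parts={"field", "stand", "seat"},
    functions={"play", "watch"},
    attributes={"capacity", "name"},
)
sports_arena = EntityClass(
    synonyms={"arena", "sports arena"},
    neighborhood={"building", "facility"},
    parts={"field", "seat"},
    functions={"play"},
    attributes={"capacity", "owner"},
)

score = similarity(stadium, sports_arena)
print(f"similarity(stadium, sports_arena) = {score:.3f}")
```

With `alpha` different from 0.5 the ratio becomes asymmetric, so `similarity(a, b)` and `similarity(b, a)` can differ; the sketch keeps the symmetric default for simplicity.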

Index Terms:
Semantic similarity, ontology integration, information integration, semantic interoperability, semantic matching.
M. Andrea Rodríguez, Max J. Egenhofer, "Determining Semantic Similarity among Entity Classes from Different Ontologies," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 442-456, March-April 2003, doi:10.1109/TKDE.2003.1185844