Merging Thesauri: Principles and Evaluation
March 1988 (vol. 10 no. 2)
pp. 204-220

This paper reports an investigation of ways to exploit the semantics encoded in thesauri to improve (1) indexing, by describing each document as a set of terms from the thesaurus, and (2) retrieval, by assessing the relevance of documents to a query. Because thesauri must be updated as the fields they cover evolve, several augmentation algorithms are studied, together with methods for assessing the usefulness of the resulting augmentations. Each augmentation consists of merging two existing thesauri. By keeping a consistent level of complexity among the structures manipulated by the merging algorithm, the reasoning method, and the evaluation procedure, the study demonstrates that the merged thesaurus improves performance on both document indexing and retrieval.
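The abstract does not spell out the merging algorithm or the relevance measure. As a hedged illustration only, the Python sketch below makes three assumptions that are not taken from the paper: each thesaurus is a mapping from a term to its broader terms, identical labels in the two thesauri denote the same concept, and a document's relevance to a query is approximated by the shortest-path distance over broader/narrower links (a common choice for hierarchical vocabularies, not necessarily the one evaluated here). All function names and the toy terms are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set

# Hypothetical representation: each term maps to the set of its broader terms.
Thesaurus = Dict[str, Set[str]]

# Toy data for illustration only; not taken from the paper.
THESAURUS_A: Thesaurus = {
    "Hepatitis": {"Liver Diseases"},
    "Liver Diseases": {"Digestive System Diseases"},
}
THESAURUS_B: Thesaurus = {
    "Hepatitis B": {"Hepatitis"},
    "Cirrhosis": {"Liver Diseases"},
}


def merge_thesauri(a: Thesaurus, b: Thesaurus) -> Thesaurus:
    """Union of terms and broader-term links; assumes equal labels mean equal concepts."""
    merged: Thesaurus = {}
    for source in (a, b):
        for term, broader in source.items():
            merged.setdefault(term, set()).update(broader)
            for bt in broader:
                merged.setdefault(bt, set())  # top terms get an empty entry
    return merged


def adjacency(t: Thesaurus) -> Dict[str, Set[str]]:
    """Undirected view: a broader or a narrower link each counts as one step."""
    adj: Dict[str, Set[str]] = {}
    for term, broader in t.items():
        adj.setdefault(term, set())
        for bt in broader:
            adj[term].add(bt)
            adj.setdefault(bt, set()).add(term)
    return adj


def distance(adj: Dict[str, Set[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first shortest path length in links; None if a term is missing or unreachable."""
    if start not in adj or goal not in adj:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def mean_min_distance(adj: Dict[str, Set[str]], query_terms: Iterable[str],
                      doc_terms: Iterable[str]) -> float:
    """Average over query terms of the distance to the nearest document term.
    Lower is taken to mean more relevant; query terms with no path are skipped."""
    doc_terms = list(doc_terms)
    best = []
    for q in query_terms:
        ds = [d for d in (distance(adj, q, t) for t in doc_terms) if d is not None]
        if ds:
            best.append(min(ds))
    return sum(best) / len(best) if best else float("inf")


if __name__ == "__main__":
    merged = merge_thesauri(THESAURUS_A, THESAURUS_B)
    adj = adjacency(merged)
    print(distance(adj, "Hepatitis B", "Cirrhosis"))                      # 3
    print(distance(adjacency(THESAURUS_A), "Hepatitis B", "Cirrhosis"))   # None
    print(mean_min_distance(adj, ["Hepatitis B"], ["Cirrhosis", "Hepatitis"]))  # 1.0
```

On the toy data, "Hepatitis B" and "Cirrhosis" are three links apart in the merged structure but cannot be related at all using the first thesaurus alone, which is the kind of gain a merge is meant to provide. A real merge would also have to reconcile synonyms and conflicting hierarchies, which this sketch ignores.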

Index Terms:
information retrieval; semantics; augmentation algorithms; reasoning method; merged thesaurus; document indexing; artificial intelligence; thesauri
H. Mili, R. Rada, "Merging Thesauri: Principles and Evaluation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 2, pp. 204-220, March 1988, doi:10.1109/34.3883