Building Hypertext Links By Computing Semantic Similarity
September/October 1999 (vol. 11 no. 5)
pp. 713-730

Abstract: Most current automatic hypertext generation systems rely on term repetition to calculate the relatedness of two documents. There are well-recognized problems with such approaches, most notably a vulnerability to the effects of synonymy (many words for the same concept) and polysemy (many concepts for the same word). We propose a novel method for automatic hypertext generation that is based on lexical chaining, a technique for discovering sequences of related words in a text. This method uses a more general notion of document relatedness and attempts to take into account the effects of synonymy and polysemy. We also present the results of an empirical study designed to test this method in the context of a question-answering task over a database of newspaper articles.
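
The synonymy problem mentioned in the abstract can be illustrated with a small sketch. This is not the paper's lexical chaining algorithm. It is only a toy comparison, assuming a hypothetical hand-written word-to-concept table (`TOY_CONCEPTS`) in place of a real thesaurus, and it does not address polysemy at all. It contrasts a similarity score based purely on repeated terms with one computed after mapping words to shared concepts.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration only):
# each word maps to a single concept label. A real system would need
# a full lexical resource and would have to disambiguate word senses.
TOY_CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "doctor": "physician",
    "physician": "physician",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-items count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def term_vector(text: str) -> Counter:
    """Count surface terms: relatedness by term repetition only."""
    return Counter(text.lower().split())


def concept_vector(text: str) -> Counter:
    """Count concepts: words not in the toy table stand for themselves."""
    return Counter(TOY_CONCEPTS.get(w, w) for w in text.lower().split())


doc_a = "doctor bought car"
doc_b = "physician purchased automobile"

term_sim = cosine(term_vector(doc_a), term_vector(doc_b))
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(f"term repetition similarity: {term_sim:.3f}")
print(f"concept-based similarity:   {concept_sim:.3f}")
```

The two toy documents share no surface terms, so the term-repetition score is zero even though they describe nearly the same event. Mapping words to concepts first recovers the overlap.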

Index Terms:
Automatic hypertext generation, information retrieval, semantic relatedness, lexical semantics, lexical chaining.
Stephen J. Green, "Building Hypertext Links By Computing Semantic Similarity," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 5, pp. 713-730, Sept.-Oct. 1999, doi:10.1109/69.806932