Mining the Web to Create Specialized Glossaries
September/October 2008 (vol. 23 no. 5)
pp. 18-25
Paola Velardi, University of Rome "La Sapienza"
Roberto Navigli, University of Rome "La Sapienza"
Pierluigi D'Amadio, IT consultant
Glossaries are helpful for integrating information, reducing semantic heterogeneity, and facilitating communication between information systems. Commercial publishers assign glossary building to lexicographers, but this approach isn't appropriate when a domain's semantics are continuously evolving rather than precisely characterized, as in emerging Web communities and interest groups. In emerging domains, glossary building is the cooperative effort of a team of domain experts. It involves several steps, including identifying the domain-relevant terminology, defining each term, and harmonizing the results. This is a time-consuming, costly process that often requires support from a collaborative platform to facilitate shared decisions and validation. TermExtractor and GlossExtractor, two Web applications based on Web mining techniques, support this complete glossary-building procedure. The tools exploit the Web's evolving nature, so the emerging community's vocabulary can be continually updated. TermExtractor and GlossExtractor, which were used in the European project Interop, are freely available and are being used in experiments in different domains across the world. This article is part of a special issue on Natural Language Processing and the Web.

Web text analysis, natural language processing, artificial intelligence, knowledge acquisition, knowledge management
Paola Velardi, Roberto Navigli, Pierluigi D'Amadio, "Mining the Web to Create Specialized Glossaries," IEEE Intelligent Systems, vol. 23, no. 5, pp. 18-25, Sept.-Oct. 2008, doi:10.1109/MIS.2008.88
