Issue No.05 - September/October (2008 vol.23)
pp: 18-25
Paola Velardi, University of Rome "La Sapienza"
Roberto Navigli, University of Rome "La Sapienza"
Pierluigi D'Amadio, IT consultant
ABSTRACT
Glossaries are helpful for integrating information, reducing semantic heterogeneity, and facilitating communication between information systems. Commercial publishers charge lexicographers with building glossaries, but this isn't appropriate when a domain's semantics are continuously evolving rather than precisely characterized, as in emerging Web communities and interest groups. In emerging domains, glossary building is the cooperative effort of a team of domain experts. It involves several steps, including identifying the domain-relevant terminology, defining each term, and harmonizing the results. This is a time-consuming, costly process that often requires support from a collaborative platform to facilitate shared decisions and validation. TermExtractor and GlossExtractor, two Web applications based on Web mining techniques, support this complete glossary-building procedure. The tools exploit the Web's evolving nature, allowing one to continually update the emerging community's vocabulary. TermExtractor and GlossExtractor, which were used in the European project Interop, are freely available and are being used in experiments in different domains across the world. This article is part of a special issue on Natural Language Processing and the Web.
INDEX TERMS
Web text analysis, natural language processing, artificial intelligence, knowledge acquisition, knowledge management
CITATION
Paola Velardi, Roberto Navigli, Pierluigi D'Amadio, "Mining the Web to Create Specialized Glossaries", IEEE Intelligent Systems, vol.23, no. 5, pp. 18-25, September/October 2008, doi:10.1109/MIS.2008.88