Mining the Web to Create Specialized Glossaries
September/October 2008 (vol. 23 no. 5)
pp. 18-25
Paola Velardi, University of Rome "La Sapienza."
Roberto Navigli, University of Rome "La Sapienza."
Pierluigi D'Amadio, IT consultant
Glossaries are helpful for integrating information, reducing semantic heterogeneity, and facilitating communication between information systems. Commercial publishers charge lexicographers with building glossaries, but this isn't appropriate when a domain's semantics are continuously evolving rather than precisely characterized, as in emerging Web communities and interest groups. In emerging domains, glossary building is the cooperative effort of a team of domain experts. It involves several steps, including identifying the domain-relevant terminology, defining each term, and harmonizing the results. This is a time-consuming, costly process that often requires support from a collaborative platform to facilitate shared decisions and validation. TermExtractor and GlossExtractor, two Web applications based on Web mining techniques, support this complete glossary-building procedure. The tools exploit the Web's evolving nature, allowing one to continually update the emerging community's vocabulary. TermExtractor and GlossExtractor, which were used in the European project Interop, are freely available and are being used in experiments in different domains across the world. This article is part of a special issue on Natural Language Processing and the Web.

Index Terms:
Web text analysis, natural language processing, artificial intelligence, knowledge acquisition, knowledge management
Paola Velardi, Roberto Navigli, Pierluigi D'Amadio, "Mining the Web to Create Specialized Glossaries," IEEE Intelligent Systems, vol. 23, no. 5, pp. 18-25, Sept.-Oct. 2008, doi:10.1109/MIS.2008.88