Issue No.05 - September/October (2008 vol.23)
pp: 26-33
Peter Mika, Yahoo! Research
Massimiliano Ciaramita, Yahoo! Research
Hugo Zaragoza, Yahoo! Research
Jordi Atserias, Yahoo! Research
ABSTRACT
Natural language technologies have long been envisioned to play a crucial role in developing a Semantic Web. The significance of textual content on the Web has increased with the rise of Web 2.0 and mass participation in content generation. Yet natural language technologies face great challenges in dealing with the heterogeneity of Web content, key among them domain and task adaptation. To address this challenge, the authors consider the problem of semantically annotating Wikipedia. Specifically, they investigate a method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available. By creating a semantic mapping between the vocabularies of two sources, Wikipedia and the original annotated corpus, they improve their tagger's performance on Wikipedia. Moreover, by applying their tagger and the mapping between sources, they significantly extend the metadata currently available in the DBpedia collection. This article is part of a special issue on Natural Language Processing and the Web.
INDEX TERMS
natural language processing, named entity recognition, Wikipedia
CITATION
Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza, Jordi Atserias, "Learning to Tag and Tagging to Learn: A Case Study on Wikipedia," IEEE Intelligent Systems, vol. 23, no. 5, pp. 26-33, September/October 2008, doi:10.1109/MIS.2008.85