Issue No. 05 - September/October (2008 vol. 23)
ISSN: 1541-1672
pp: 26-33
Peter Mika, Yahoo! Research
Hugo Zaragoza, Yahoo! Research
Massimiliano Ciaramita, Yahoo! Research
Jordi Atserias, Yahoo! Research
ABSTRACT
Natural language technologies have long been envisioned to play a crucial role in developing a Semantic Web. The significance of textual content on the Web has increased with the rise of Web 2.0 and mass participation in content generation. Yet natural language technologies face great challenges in dealing with the heterogeneity of Web content, key among them domain and task adaptation. To address this challenge, the authors consider the problem of semantically annotating Wikipedia. Specifically, they investigate a method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available. By creating a semantic mapping between the vocabularies of two sources, Wikipedia and the original annotated corpus, they improve their tagger's performance on Wikipedia. Moreover, by applying their tagger and the mapping between sources, they significantly extend the metadata currently available in the DBpedia collection. This article is part of a special issue on Natural Language Processing and the Web.
INDEX TERMS
natural language processing, named entity recognition, Wikipedia
CITATION
Peter Mika, Hugo Zaragoza, Massimiliano Ciaramita, Jordi Atserias, "Learning to Tag and Tagging to Learn: A Case Study on Wikipedia", IEEE Intelligent Systems, vol. 23, no. 5, pp. 26-33, September/October 2008, doi:10.1109/MIS.2008.85