2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)
Identifying Document Topics Using the Wikipedia Category Network
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2747-7
Peter Schonhofen, Hungarian Academy of Sciences, Hungary
In the last few years, the size and coverage of Wikipedia, a freely available online encyclopedia, have reached the point where it can be used much like an ontology or taxonomy to identify the topics discussed in a document. In this paper we show that even a simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting the categories of Wikipedia articles themselves based on their bodies, and by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.
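The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: find Wikipedia article titles in a document and let each match vote for that article's categories. The title-to-category table, the function name rank_categories, and the plain occurrence-count scoring are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical toy data. A real system would build this mapping from a
# Wikipedia dump (article title -> categories assigned to that article).
TITLE_CATEGORIES = {
    "neural network": ["Machine learning", "Artificial intelligence"],
    "support vector machine": ["Machine learning", "Classification algorithms"],
    "hong kong": ["Cities in China"],
}


def rank_categories(text, title_categories, top_k=3):
    """Rank categories by how often titles belonging to them occur in text.

    Assumed scoring: each occurrence of a title adds one vote to every
    category of the matching article. This is an illustrative choice only.
    """
    # Lowercase and strip punctuation so matching is on whole words.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    scores = Counter()
    for title, categories in title_categories.items():
        pattern = r"\b" + re.escape(title) + r"\b"
        occurrences = len(re.findall(pattern, normalized))
        for category in categories:
            if occurrences:
                scores[category] += occurrences
    return scores.most_common(top_k)


if __name__ == "__main__":
    sample = (
        "A neural network and a support vector machine were compared; "
        "the neural network won."
    )
    for category, votes in rank_categories(sample, TITLE_CATEGORIES):
        print(category, votes)
```

In a representation like this, the ranked categories could stand in for a document's text as features for classification or clustering, which is the kind of experiment the abstract reports on 20 Newsgroups and RCV1.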
Citation:
Peter Schonhofen, "Identifying Document Topics Using the Wikipedia Category Network," 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp. 456-462, 2006