2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
An Editor Labeling Model for Training Set Expansion in Web Categorization
Compiègne University of Technology, France
September 19-September 22
ISBN: 0-7695-2415-X
Tie-Yan Liu, Microsoft Research Asia
Hao Wan, Tsinghua University
Wei-Ying Ma, Microsoft Research Asia
Automatically classifying web pages is an effective way to manage the massive amount of information on the Web. However, our experiments show that state-of-the-art text categorization technologies cannot achieve satisfactory classification performance on this task. The major reason is the large proportion of rare categories in Web taxonomies: for such categories, there is simply not enough information to train reliable classifiers. To tackle this problem, we propose to expand the training set of the rare categories by simulating the labeling behavior of the human editors of Web directories. Experimental results show that this approach achieves a significant improvement in classification accuracy (93% relative), which is highly encouraging for high-performance Web classification.
Citation:
Tie-Yan Liu, Hao Wan, Wei-Ying Ma, "An Editor Labeling Model for Training Set Expansion in Web Categorization," 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), pp. 165-171, 2005