2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
Automatic Training Corpora Acquisition through Web Mining
Compiègne University of Technology, France
September 19-22, 2005
ISBN: 0-7695-2415-X
Chien-Chung Huang, Dartmouth College
Kuan-Ming Lin, Duke University
Lee-Feng Chien, Academia Sinica

Text classification has been studied extensively for decades. However, most previous work assumes the existence of explicitly labeled corpora. In this study, we focus on the problem of automatic corpora acquisition. We propose a Web-based mining approach to collect the necessary corpora, which can be highly useful to both end users and system designers. Moreover, the proposed technique can be combined with existing classification techniques to further boost classifier performance.

It has been shown that the concept of a class can be captured by the class name and its associated terms [10]. In this work, we analyze Web-retrieved documents to discover these associated terms, which are then used to collect further training corpora. By working iteratively, the proposed approach can acquire high-quality training corpora. We provide empirical evidence that the resulting classifiers achieve promising accuracy. In sum, the primary contributions of this work are the convenience and efficiency of the proposed approach, together with a new perspective on the problem of corpora acquisition.
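The abstract describes the method only at a high level. As a rough illustration, the loop below sketches one way an iterative acquisition scheme of this kind could work, under several assumptions that are not taken from the paper: associated terms are scored with a simple document-frequency times smoothed-IDF heuristic, each discovered term is issued as its own query in the next round, and a toy in-memory function (`make_toy_search`, hypothetical) stands in for a real Web search engine. All function names, the stopword list, and the sample documents are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Set, Tuple

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def associated_terms(class_docs: List[str], background: List[str],
                     exclude: Set[str], k: int) -> List[str]:
    """Rank candidate terms by (doc frequency in class_docs) * smoothed IDF
    over the background collection (an assumed heuristic)."""
    n = len(background)
    df_bg = Counter(t for d in background for t in set(tokenize(d)))
    df_cls = Counter(t for d in class_docs for t in set(tokenize(d)))

    def score(t: str) -> float:
        idf = math.log((1 + n) / (1 + df_bg[t])) + 1.0
        return df_cls[t] * idf

    candidates = [t for t in df_cls if t not in exclude and t not in STOPWORDS]
    # Highest score first; alphabetical order breaks ties deterministically.
    candidates.sort(key=lambda t: (-score(t), t))
    return candidates[:k]


def acquire_corpus(class_name: str, search: Callable[[str], List[str]],
                   background: List[str], rounds: int = 2,
                   k: int = 1) -> Tuple[List[str], List[str]]:
    """Iteratively retrieve documents and expand the query set with
    newly discovered associated terms."""
    seed = set(tokenize(class_name))
    queries = [class_name]          # round 1 queries with the class name only
    terms: List[str] = []
    corpus: List[str] = []
    for _ in range(rounds):
        for q in queries:
            for doc in search(q):
                if doc not in corpus:   # keep each document once
                    corpus.append(doc)
        new_terms = associated_terms(corpus, background, seed | set(terms), k)
        if not new_terms:
            break
        terms.extend(new_terms)
        queries = new_terms         # next round queries with the new terms
    return corpus, terms


def make_toy_search(collection: List[str]) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for a Web search engine: returns every
    document that contains all query tokens."""
    def search(query: str) -> List[str]:
        q = set(tokenize(query))
        return [d for d in collection if q <= set(tokenize(d))]
    return search


DOCS = [
    "basketball players dribble the ball",
    "basketball coaches teach dribble drills",
    "guards dribble past defenders",
    "the stock market fell sharply",
    "investors sold stock",
]

if __name__ == "__main__":
    corpus, terms = acquire_corpus("basketball", make_toy_search(DOCS), DOCS)
    print(terms)   # ['dribble', 'ball']
    print(corpus)  # the three basketball-related documents
```

Starting from the class name "basketball", the first round retrieves the two documents containing that word and finds "dribble" as the strongest associated term. Querying with it in the second round pulls in "guards dribble past defenders", which never mentions the class name, while the finance documents stay out. A practical system would need safeguards this sketch omits, such as filtering noisy or off-topic retrieved pages.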

Citation:
Chien-Chung Huang, Kuan-Ming Lin, Lee-Feng Chien, "Automatic Training Corpora Acquisition through Web Mining," 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), pp. 193-199, 2005.