International Conference on Information Technology: Coding and Computing (ITCC'04) Volume 2
A New Framework for Uncertainty Sampling: Exploiting Uncertain and Positive-Certain Examples in Similarity-Based Text Classification
Las Vegas, Nevada
April 05-April 07
ISBN: 0-7695-2108-8
Kang H. Lee, University of Sydney, NSW, Australia
Byeong H. Kang, University of Tasmania, Hobart, Tasmania, Australia
One of the major concerns with supervised learning approaches to text classification is that they require a large number of labeled examples to achieve a high level of effectiveness. Labeling so many examples places a considerable burden on human experts. Two common approaches to reducing the number of labeled examples required are: (1) selecting informative uncertain examples for human labeling and (2) using large amounts of inexpensive unlabeled data together with a small number of labeled examples. While previous work in text classification has focused on only one of these approaches, we investigate a framework that combines both in similarity-based text classification. By applying our new thresholding strategy (RinSCut) to uncertainty sampling, we propose a framework that automatically selects informative uncertain data to be presented to a human expert for labeling, and positive-certain data that are used directly for learning without human labeling. Using our similarity-based learning algorithm (KAN), we conducted experiments on the Reuters-21578 data set. The proposed scheme was compared with random sampling and conventional uncertainty sampling on micro- and macro-averaged F1. The results suggest that our framework may be the best choice when both micro- and macro-averaged measures are of concern.
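The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the RinSCut thresholding rule or the KAN scoring function, so the code below is not the authors' method: it assumes a single per-category similarity threshold and two hypothetical margins (`uncertain_margin`, `certain_margin`) purely to show how unlabeled documents might be split into an "ask the expert" set and an "use as positive without labeling" set.

```python
from typing import Dict, List, Tuple


def partition_unlabeled(
    scores: Dict[str, float],
    threshold: float,
    uncertain_margin: float,
    certain_margin: float,
) -> Tuple[List[str], List[str]]:
    """Split unlabeled documents for one category by similarity score.

    Hypothetical rule (an assumption, not RinSCut itself):
      - uncertain: score within `uncertain_margin` of the threshold,
        so the document is sent to a human expert for labeling;
      - positive-certain: score at least `certain_margin` above the
        threshold, so the document is used as a positive example
        without human labeling.
    Documents matching neither condition are left unused.
    """
    uncertain: List[str] = []
    positive_certain: List[str] = []
    for doc_id, score in scores.items():
        if abs(score - threshold) <= uncertain_margin:
            uncertain.append(doc_id)
        elif score >= threshold + certain_margin:
            positive_certain.append(doc_id)
    return sorted(uncertain), sorted(positive_certain)


# Made-up similarity scores of five unlabeled documents to one category.
scores = {"d1": 0.91, "d2": 0.52, "d3": 0.47, "d4": 0.10, "d5": 0.78}
to_label, auto_positive = partition_unlabeled(
    scores, threshold=0.5, uncertain_margin=0.05, certain_margin=0.25
)
print("send to expert:", to_label)
print("use as positive:", auto_positive)
```

In this sketch only the documents near the decision boundary cost expert time, while high-scoring documents enlarge the training set for free. Low-scoring documents are deliberately not auto-labeled as negatives, mirroring the abstract's mention of positive-certain examples only.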
Citation:
Kang H. Lee, Byeong H. Kang, "A New Framework for Uncertainty Sampling: Exploiting Uncertain and Positive-Certain Examples in Similarity-Based Text Classification," International Conference on Information Technology: Coding and Computing (ITCC'04), vol. 2, pp. 474, 2004