Third IEEE International Conference on Data Mining (ICDM'03)
Exploiting Unlabeled Data for Improving Accuracy of Predictive Data Mining
Melbourne, Florida
November 19-22, 2003
ISBN: 0-7695-1978-4
Kang Peng, Temple University, Philadelphia, PA
Slobodan Vucetic, Temple University, Philadelphia, PA
Bo Han, Temple University, Philadelphia, PA
Hongbo Xie, Temple University, Philadelphia, PA
Zoran Obradovic, Temple University, Philadelphia, PA
Predictive data mining typically relies on labeled data without exploiting a much larger amount of available unlabeled data. The goal of this paper is to show that using unlabeled data can be beneficial in a range of important prediction problems and therefore should be an integral part of the learning process. Given an unlabeled dataset representative of the underlying distribution and a K-class labeled sample that might be biased, our approach is to learn K contrast classifiers, each trained to discriminate a certain class of labeled data from the unlabeled population. We illustrate that contrast classifiers can be useful in one-class classification, outlier detection, density estimation, and learning from biased data. The advantages of the proposed approach are demonstrated by an extensive evaluation on synthetic data, followed by real-life bioinformatics applications for (1) ranking PubMed articles by their relevance to protein disorder and (2) cost-effective enlargement of a disordered protein database.
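The contrast-classifier idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 1-D synthetic data, the plain logistic-regression learner, and the names `fit_logistic`, `train_contrast_classifiers`, and `contrast_score` are all assumptions made for illustration. For each of the K classes, one binary model is trained to separate that class's labeled examples (target 1) from the unlabeled pool (target 0).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=300):
    """Fit p(y=1 | x) = sigmoid(w*x + b) on 1-D inputs by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def train_contrast_classifiers(labeled, unlabeled):
    """Train one 'class k vs. unlabeled' model per class (hypothetical helper).

    labeled:   dict mapping class id -> list of 1-D examples
    unlabeled: list of 1-D examples drawn from the overall population
    """
    models = {}
    for k, xs_k in labeled.items():
        xs = xs_k + unlabeled
        ys = [1.0] * len(xs_k) + [0.0] * len(unlabeled)
        models[k] = fit_logistic(xs, ys)
    return models


def contrast_score(model, x):
    # Estimated probability that x comes from the labeled class
    # rather than from the unlabeled population.
    w, b = model
    return sigmoid(w * x + b)


# Synthetic data (an assumption for this sketch): two labeled classes
# centred at -2 and +2, and an unlabeled pool that mixes both.
rng = random.Random(0)
labeled = {
    0: [rng.gauss(-2, 1) for _ in range(200)],
    1: [rng.gauss(2, 1) for _ in range(200)],
}
unlabeled = [rng.gauss(rng.choice([-2, 2]), 1) for _ in range(200)]

models = train_contrast_classifiers(labeled, unlabeled)
for k in sorted(models):
    print(k, contrast_score(models[k], -2.0), contrast_score(models[k], 2.0))
```

With equally sized labeled and unlabeled training sets, as here, an ideal model's score above 0.5 at a point means the class density exceeds the unlabeled density there. How such scores are turned into outlier, density, or bias-corrected estimates is described in the paper itself and is not reproduced in this sketch.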
Citation:
Kang Peng, Slobodan Vucetic, Bo Han, Hongbo Xie, Zoran Obradovic, "Exploiting Unlabeled Data for Improving Accuracy of Predictive Data Mining," Third IEEE International Conference on Data Mining (ICDM'03), pp. 267, 2003