Third IEEE International Conference on Data Mining (ICDM'03)
Mining Relevant Text from Unlabelled Documents
Melbourne, Florida
November 19-November 22
ISBN: 0-7695-1978-4
Daniel Barbará, George Mason University, Fairfax, VA
Carlotta Domeniconi, George Mason University, Fairfax, VA
Ning Kang, George Mason University, Fairfax, VA
Automatic classification of documents is an important area of research with many applications in document searching, forensics, and other fields. Methods for classifying text rely on the existence of a sample of documents whose class labels are known. In many situations, however, obtaining this sample may be difficult or even impossible. In this paper we focus on the classification of unlabelled documents into two classes, relevant and irrelevant, given a topic of interest. By dividing the set of documents into buckets (for instance, the answers returned by different search engines) and using association rule mining to find sets of words common to the buckets, we can efficiently obtain a sample of documents with a large percentage of relevant ones. This sample can then be used to train models that classify the entire set of documents. We show experimentally that our method is capable of filtering relevant documents even in adverse conditions, where the percentage of irrelevant documents in the buckets is relatively high.
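The bucket idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a simplified stand-in under stated assumptions: each bucket is a list of document strings, a "word set" is a fixed-size combination of words that co-occur in a single document, and a word set counts as common when it appears in at least `min_buckets` buckets. The function names `common_word_sets` and `select_sample`, the whitespace tokenizer, and the brute-force enumeration (used here in place of a real association rule miner) are all hypothetical choices made for illustration.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper may differ)."""
    return set(text.lower().split())


def common_word_sets(buckets, min_buckets, size=2):
    """Return word sets of the given size that occur, within a single
    document, in at least `min_buckets` of the buckets.

    Brute-force enumeration stands in for association rule mining here.
    """
    counts = {}
    for bucket in buckets:
        seen_in_bucket = set()
        for doc in bucket:
            words = sorted(tokenize(doc))
            seen_in_bucket.update(combinations(words, size))
        # Count each word set at most once per bucket.
        for word_set in seen_in_bucket:
            counts[word_set] = counts.get(word_set, 0) + 1
    return {frozenset(ws) for ws, c in counts.items() if c >= min_buckets}


def select_sample(buckets, word_sets):
    """Collect documents that contain at least one common word set."""
    sample = []
    for bucket in buckets:
        for doc in bucket:
            words = tokenize(doc)
            if any(ws <= words for ws in word_sets):
                sample.append(doc)
    return sample


# Toy buckets, e.g. answers from three different search engines.
buckets = [
    ["data mining finds patterns", "cheap flights today"],
    ["mining data with rules", "football scores tonight"],
    ["rules for data mining", "recipe for bread"],
]
word_sets = common_word_sets(buckets, min_buckets=3)
sample = select_sample(buckets, word_sets)
```

In this toy run the documents about an unrelated topic share no word set across buckets, so they are left out of `sample`; the selected documents could then serve as presumed-relevant training data for a classifier, as the abstract describes.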
Citation:
Daniel Barbará, Carlotta Domeniconi, Ning Kang, "Mining Relevant Text from Unlabelled Documents," icdm, pp.489, Third IEEE International Conference on Data Mining (ICDM'03), 2003