Sixth IEEE International Conference on Data Mining (ICDM'06)
Boosting the Feature Space: Text Classification for Unstructured Data on the Web
Hong Kong
December 18-December 22
ISBN: 0-7695-2701-9
Yang Song, The Pennsylvania State University, USA
Ding Zhou, The Pennsylvania State University, USA
Jian Huang, The Pennsylvania State University, USA
Isaac G. Councill, The Pennsylvania State University, USA
Hongyuan Zha, The Pennsylvania State University, USA
C. Lee Giles, The Pennsylvania State University, USA
Efficient and effective methods for classifying unstructured text in large document corpora have received much attention in recent years. Traditional document representations such as bag-of-words encode documents as feature vectors, which usually leads to sparse, high-dimensional feature spaces and makes high classification accuracy hard to achieve. This paper addresses the problem of classifying unstructured documents on the Web. It proposes a classification approach that combines traditional feature reduction techniques with a collaborative filtering method for augmenting document feature spaces. The method produces feature spaces with an order of magnitude fewer features than a baseline bag-of-words feature selection method. Experiments on both real-world data and a benchmark corpus indicate that the approach improves classification accuracy over traditional methods for both Support Vector Machine and AdaBoost classifiers.
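The sparsity problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it only builds a plain bag-of-words representation for three made-up sentences and measures how many vector entries are zero. The function names `bag_of_words` and `sparsity`, the whitespace tokenizer, and the sample documents are all assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, vectors): one term-count vector per document."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocabulary)
        for term, count in Counter(toks).items():
            vec[index[term]] = count
        vectors.append(vec)
    return vocabulary, vectors


def sparsity(vectors):
    """Fraction of zero entries across all document vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total if total else 0.0


# Hypothetical sample documents, not taken from the paper's data.
docs = [
    "support vector machines classify text",
    "boosting combines weak classifiers",
    "web pages contain unstructured text",
]
vocabulary, vectors = bag_of_words(docs)
print(len(vocabulary), "features;", f"{sparsity(vectors):.2f}", "of entries are zero")
```

Even in this toy case most entries are zero, because each document uses only a small part of the shared vocabulary. On a real Web corpus the vocabulary grows far faster than the length of any single document, which is the motivation the abstract gives for reducing and augmenting the feature space.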
Citation:
Yang Song, Ding Zhou, Jian Huang, Isaac G. Councill, Hongyuan Zha, C. Lee Giles, "Boosting the Feature Space: Text Classification for Unstructured Data on the Web," icdm, pp.1064-1069, Sixth IEEE International Conference on Data Mining (ICDM'06), 2006