2009 IEEE International Conference on Data Mining Workshops
SISC: A Text Classification Approach Using Semi Supervised Subspace Clustering
Miami, Florida, USA
December 6, 2009
ISBN: 978-0-7695-3902-7
Text classification poses some specific challenges. One such challenge is high dimensionality, where each document (data point) contains only a small subset of the dimensions. In this paper, we propose Semi-supervised Impurity based Subspace Clustering (SISC), used in conjunction with a k-Nearest Neighbor approach, which accounts for both the high dimensionality and the sparsity of text data. SISC finds clusters in subspaces of the high-dimensional text data, where each text document has fuzzy cluster membership. This fuzzy clustering exploits two factors: the chi-square statistic of the dimensions and the impurity measure within each cluster. Empirical evaluation on real-world data sets shows the effectiveness of our approach: it significantly outperforms other state-of-the-art text classification and subspace clustering algorithms.
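The two factors named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so this is an assumption-laden illustration and not the SISC method itself: it uses the standard chi-square statistic for a 2x2 term/class contingency table, and Gini impurity as a stand-in for the unspecified impurity measure. The function names `chi_square` and `gini_impurity` are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Standard chi-square statistic for a 2x2 term/class table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    # A degenerate table (an empty row or column) carries no association signal.
    return numerator / denominator if denominator else 0.0


def gini_impurity(labels):
    """Gini impurity of the class labels inside one cluster.

    Assumption: Gini is used here only as a common impurity measure;
    the paper's own measure may differ.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # A term that perfectly separates two classes of 10 documents each.
    print(chi_square(10, 0, 0, 10))
    # A cluster with an even mix of two labels.
    print(gini_impurity(["sports", "sports", "politics", "politics"]))
```

A high chi-square value marks a dimension (term) as strongly associated with a class, and a low impurity marks a cluster as dominated by one class. How SISC combines the two into fuzzy memberships is not described in the abstract.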
Citation:
Mohammad Salim Ahmed, Latifur Khan, "SISC: A Text Classification Approach Using Semi Supervised Subspace Clustering," icdmw, pp.1-6, 2009 IEEE International Conference on Data Mining Workshops, 2009