loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Third IEEE International Conference on Data Mining (ICDM'03)
Information Theoretic Clustering of Sparse Co-Occurrence Data
Melbourne, Florida
November 19-November 22
ISBN: 0-7695-1978-4
Inderjit S. Dhillon, University of Texas, Austin
Yuqiang Guan, University of Texas, Austin
A novel approach to clustering co-occurrence data poses it as an optimization problem in information theory which minimizes the resulting loss in mutual information. A divisive clustering algorithm that monotonically reduces this loss function was recently proposed. In this paper we show that sparse high-dimensional data presents special challenges which can result in the algorithm getting stuck at poor local minima. We propose two solutions to this problem: (a) a "prior" to overcome infinite relative entropy values as in the supervised Naive Bayes algorithm, and (b) local search to escape local minima. Finally, we combine these solutions to get a robust algorithm that is computationally efficient. We present experimental results to show that the proposed method is effective in clustering document collections and outperforms previous information-theoretic clustering approaches.
Citation:
Inderjit S. Dhillon, Yuqiang Guan, "Information Theoretic Clustering of Sparse Co-Occurrence Data," icdm, pp.517, Third IEEE International Conference on Data Mining (ICDM'03), 2003
Usage of this product signifies your acceptance of the Terms of Use.