loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Second IEEE International Conference on Data Mining (ICDM'02)
Iterative Clustering of High Dimensional Text Data Augmented by Local Search
Maebashi City, Japan
December 09-December 12
ISBN: 0-7695-1754-4
Inderjit S. Dhillon, University of Texas
Yuqiang Guan, University of Texas
J. Kogan, University of Maryland Baltimore County
The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means can often yield qualitatively poor results, especially when cluster sizes are small, say 25-30 documents per cluster, where it tends to get stuck at a local maximum far away from the optimal solution. In this paper, we present a local search procedure, which we call "first-variation" that refines a given clustering by incrementally moving data points between clusters, thus achieving a higher objective function value. An enhancement of first variation allows a chain of such moves in a Kernighan-Lin fashion and leads to a better local maximum. Combining the enhanced first-variation with spherical k-means yields a powerful "ping-pong" strategy that often qualitatively improves k-means clustering and is computationally efficient. We present several experimental results to high-light the improvement achieved by our proposed algorithm in clustering high-dimensional and sparse text data.
Citation:
Inderjit S. Dhillon, Yuqiang Guan, J. Kogan, "Iterative Clustering of High Dimensional Text Data Augmented by Local Search," icdm, pp.131, Second IEEE International Conference on Data Mining (ICDM'02), 2002
Usage of this product signifies your acceptance of the Terms of Use.