Fourth IEEE International Conference on Data Mining (ICDM'04)
Fast and Exact Out-of-Core K-Means Clustering
Brighton, United Kingdom
November 1-4, 2004
ISBN: 0-7695-2142-8
Anjan Goswami, Ohio State University
Ruoming Jin, Ohio State University
Gagan Agrawal, Ohio State University
Clustering is one of the most widely studied topics in data mining, and k-means is one of the most popular clustering algorithms. K-means requires several passes over the entire dataset, which can make it very expensive for large disk-resident datasets. Consequently, much work has been done on approximate versions of k-means that require only one or a small number of passes over the entire dataset.
In this paper, we present a new algorithm that typically requires only one or a small number of passes over the entire dataset and provably produces the same cluster centers as the original k-means algorithm. The algorithm uses sampling to create initial cluster centers and then takes one or more passes over the entire dataset to adjust these centers. We provide a theoretical analysis showing that the cluster centers thus reported are the same as those computed by the original k-means algorithm. Experimental results on a number of real and synthetic datasets show speedups by a factor of 2 to 4.5 compared with k-means.
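The two-phase idea in the abstract (cluster a sample, then refine with passes over the full dataset) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not say how the adjustment passes preserve exactness, so the sketch simply seeds ordinary k-means with the sample's centers and counts full-dataset passes. All names (`one_pass`, `kmeans`, `read_full`, `read_sample`) and the synthetic two-blob data are assumptions made for illustration, and the in-memory list stands in for a disk-resident file read in chunks.

```python
import random

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))

def one_pass(read_chunks, centers):
    """One sequential pass: assign every point, return the new cluster means."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for chunk in read_chunks():          # each call re-reads the data from the start
        for point in chunk:
            j = nearest(point, centers)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += point[d]
    # A cluster that received no points keeps its previous center.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(len(centers))]

def kmeans(read_chunks, centers, max_passes=100, tol=1e-9):
    """Plain k-means iterations; returns (centers, passes over the data)."""
    passes = 0
    for passes in range(1, max_passes + 1):
        updated = one_pass(read_chunks, centers)
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new))
                    for old, new in zip(centers, updated))
        centers = updated
        if shift <= tol:                 # centers stopped moving
            break
    return centers, passes

# Hypothetical data: two well-separated 2-D blobs, shuffled.
rng = random.Random(0)
data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(500)]
rng.shuffle(data)

def read_full(chunk_size=100):
    """Stand-in for reading a disk-resident dataset chunk by chunk."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

sample = rng.sample(data, 100)           # small enough to cluster in memory

def read_sample():
    yield sample

initial = [list(data[0]), list(data[1])]

# Baseline: ordinary k-means, every iteration is a full pass.
baseline_centers, baseline_passes = kmeans(read_full, initial)

# Two-phase sketch: iterate cheaply on the sample, then refine on the full data.
sample_centers, _ = kmeans(read_sample, initial)
seeded_centers, seeded_passes = kmeans(read_full, sample_centers)

print("full passes, baseline:", baseline_passes)
print("full passes, seeded from sample:", seeded_passes)
```

On this easy synthetic data both runs settle on the same two centers, but the sketch offers no such guarantee in general, since k-means started from different centers can converge to different local optima. The provable equivalence to the original k-means result is what the paper claims as its contribution.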
Citation:
Anjan Goswami, Ruoming Jin, Gagan Agrawal, "Fast and Exact Out-of-Core K-Means Clustering," Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 83-90, 2004