Issue No. 08 - Aug. (2013 vol. 24)
ISSN: 1045-9219
pp: 1602-1612
Russ B. Altman , Stanford University, Stanford
Vijay S. Pande , Stanford University, Stanford
Kai J. Kohlhoff , Stanford University, Stanford
ABSTRACT
We present an implementation of parallel $K$-means clustering, called $K_{ps}$-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element of the performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with popular numerical software packages.
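The role of sorting in the update step can be illustrated with a minimal sequential sketch. This is not the authors' GPU kernel; it only assumes the general idea suggested by the title: once points are sorted by cluster label, each cluster occupies a contiguous segment, so per-cluster sums are differences of prefix sums. The function names (`assign`, `update_sorted`, `kmeans`) and the policy of keeping the old centroid for an empty cluster are illustrative assumptions.

```python
from itertools import accumulate


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (squared Euclidean)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_sorted(points, labels, centroids):
    """Recompute centroids by sorting points by label and differencing prefix sums."""
    k, dim = len(centroids), len(centroids[0])
    # Sort point indices by label so that each cluster is one contiguous segment.
    order = sorted(range(len(points)), key=labels.__getitem__)
    # Per-dimension prefix sums over the sorted points, with a leading 0.
    prefix = [
        [0.0] + list(accumulate(points[i][d] for i in order)) for d in range(dim)
    ]
    # Segment boundaries: cluster c occupies sorted positions starts[c]..starts[c+1]-1.
    counts = [0] * k
    for label in labels:
        counts[label] += 1
    starts = [0] + list(accumulate(counts))
    new_centroids = []
    for c in range(k):
        lo, hi = starts[c], starts[c + 1]
        if hi == lo:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(centroids[c]))
        else:
            new_centroids.append(
                [(prefix[d][hi] - prefix[d][lo]) / (hi - lo) for d in range(dim)]
            )
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds; return centroids and final labels."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update_sorted(points, labels, centroids)
    return centroids, assign(points, centroids)
```

On a parallel device, the sort and the prefix sums would be replaced by parallel primitives. Differencing floating-point prefix sums can also lose precision on large inputs, which this sketch does not address.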
INDEX TERMS
Kernel, Instruction sets, Graphics processing unit, Memory management, Sorting, Arrays, Vectors, parallel algorithms, biology and genetics, Clustering algorithms, graphics processors
CITATION
Russ B. Altman, Vijay S. Pande, Kai J. Kohlhoff, "K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps", IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 8, pp. 1602-1612, Aug. 2013, doi:10.1109/TPDS.2012.234