K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps
Aug. 2013 (vol. 24 no. 8)
pp. 1602-1612
Kai J. Kohlhoff, Stanford University, Stanford
Vijay S. Pande, Stanford University, Stanford
Russ B. Altman, Stanford University, Stanford
We present an implementation of parallel $K$-means clustering, called $K_{ps}$-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element of the performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with popular numerical software packages.
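The abstract does not spell out the sorting step, so the following is only a minimal sequential sketch of the general idea the title suggests: an exclusive all-prefix-sum over per-cluster counts gives each cluster a contiguous segment, points are scattered into those segments, and each centroid is then recomputed from its own segment. It is not the authors' GPU implementation; the function names (`exclusive_prefix_sum`, `sort_by_label`, `update_centroids`) and the choice to keep the old centroid for an empty cluster are assumptions made for illustration.

```python
from itertools import accumulate


def exclusive_prefix_sum(counts):
    """Exclusive all-prefix-sum: out[i] is the sum of counts[0..i-1]."""
    return list(accumulate(counts, initial=0))[:-1]


def sort_by_label(points, labels, k):
    """Counting sort of points by cluster label (hypothetical helper).

    The prefix sum over per-cluster counts yields the start offset of each
    cluster's contiguous segment in the sorted array.
    """
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    offsets = exclusive_prefix_sum(counts)
    cursor = list(offsets)  # next free slot per cluster
    sorted_points = [None] * len(points)
    for p, lab in zip(points, labels):
        sorted_points[cursor[lab]] = p
        cursor[lab] += 1
    return sorted_points, offsets, counts


def update_centroids(sorted_points, offsets, counts, old_centroids):
    """Recompute each centroid as the mean of its contiguous segment."""
    new_centroids = []
    for c, (start, n) in enumerate(zip(offsets, counts)):
        if n == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old_centroids[c])
            continue
        segment = sorted_points[start:start + n]
        dim = len(segment[0])
        new_centroids.append(
            tuple(sum(p[d] for p in segment) / n for d in range(dim))
        )
    return new_centroids


points = [(0.0, 0.0), (10.0, 10.0), (2.0, 0.0), (10.0, 12.0)]
labels = [0, 1, 0, 1]
old = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
sorted_points, offsets, counts = sort_by_label(points, labels, k=3)
centroids = update_centroids(sorted_points, offsets, counts, old)
```

On a GPU, the counting and scatter loops would typically be replaced by parallel histogram and scan primitives; the contiguous layout is what lets each cluster's sum be computed with coalesced reads.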
Index Terms:
Kernel, Instruction sets, Graphics processing unit, Memory management, Sorting, Arrays, Vectors, parallel algorithms, biology and genetics, Clustering algorithms, graphics processors
Kai J. Kohlhoff, Vijay S. Pande, Russ B. Altman, "K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 8, pp. 1602-1612, Aug. 2013, doi:10.1109/TPDS.2012.234