2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (2016)
Shanghai, China
May 30, 2016 to June 1, 2016
ISBN: 978-1-5090-0804-9
pp: 313-318
Mi Li, School of Software Engineering, Tongji University, Shanghai, China
Jie Huang, School of Software Engineering, Tongji University, Shanghai, China
Jingpeng Wang, School of Software Engineering, Tongji University, Shanghai, China
ABSTRACT
Fast Search and Find of Density Peaks (FSFDP) is a recently proposed clustering algorithm that has already been applied successfully in many applications. However, it performs poorly on large datasets because computing the distance matrix and the potentials is time-consuming. In this paper, we propose a GPU-accelerated FSFDP implemented with CUDA to improve its performance. The thread/block model and the use of shared memory are designed specifically to maximize utilization of the GPU's hardware resources, and a merge accumulation algorithm based on the odd and even positions of an array is also introduced. Experimental results show that our parallel implementation of FSFDP reaches speedups of 4.39X for the distance matrix calculation and 15.75X for the potentials calculation, compared to a serial program on a single CPU core. Higher speedups can be expected for larger datasets until the device limits are reached. In addition, the CUDA stream mechanism is employed, and further time savings are obtained by hiding the memory latency of multiple kernels with a two-way stream schedule. Finally, we evaluate our GPU-based implementation on a GPU cluster of 9 nodes, where the program achieves a further 7.55X speedup over a single GPU node.
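To make the cost of the two accelerated steps concrete, the following is a minimal serial sketch in Python of the kind of O(n^2) work involved. It is not the authors' code. The abstract does not define how "potentials" are computed, so this sketch assumes the cutoff-kernel local density and the nearest-higher-density distance from the original density peaks formulation. The function names and the cutoff parameter `d_c` are hypothetical.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances; O(n^2) work over all point pairs."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def local_density(dist, d_c):
    """Assumed density measure: count of other points closer than d_c."""
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]


def delta(dist, rho):
    """Distance from each point to its nearest neighbor of higher density.

    A point with no denser neighbor gets its largest distance to any
    point, following the convention of the original algorithm.
    """
    n = len(dist)
    out = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        out.append(min(higher) if higher else max(dist[i]))
    return out


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
    dist = distance_matrix(pts)
    rho = local_density(dist, d_c=1.2)
    print(rho, delta(dist, rho))
```

Each row of the distance matrix and each entry of the density list depends only on read-only inputs, which is why loops of this shape are natural candidates for one-thread-per-element GPU kernels.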
INDEX TERMS
Graphics processing units, Instruction sets, Kernel, Clustering algorithms, Entropy, Software algorithms, Acceleration
CITATION

M. Li, J. Huang and J. Wang, "Paralleled Fast Search and Find of Density Peaks clustering algorithm on GPUs with CUDA," 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Shanghai, China, 2016, pp. 313-318.
doi:10.1109/SNPD.2016.7515918