2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (2016)
Shanghai, China
May 30, 2016 to June 1, 2016
ISBN: 978-1-5090-0804-9
pp: 307-312
Mi Li, School of Software Engineering, Tongji University, Shanghai, China
Jie Huang, School of Software Engineering, Tongji University, Shanghai, China
Jingpeng Wang, School of Software Engineering, Tongji University, Shanghai, China
ABSTRACT
Exorbitant computation cost hinders the practical application of recommendation algorithms, especially in time-critical scenarios. Although experiments show that a recommendation algorithm based on integrated diffusion on user-item-tag tripartite graphs can significantly improve the accuracy, diversification and novelty of recommendations, it is also very time-consuming. A parallel solution is therefore frequently needed to improve the performance of the algorithm. This paper presents the parallel implementation of the diffusion-based recommendation algorithm on weighted tripartite graphs using Compute Unified Device Architecture (CUDA), together with related optimizations including shared memory, stream scheduling and GPU cluster optimization. Compared to the algorithm running on a single CPU core, the unoptimized GPU kernel achieves a 153.9x speedup on average on a GTX 980 with an input dataset of 30000 records. With shared memory applied, the time spent on memory access drops by about 50% on a dataset of 90000 records, and with 2-way stream scheduling the kernel's performance improves by about 7% to 13%. Based on the optimized kernel, we evaluate the performance of the algorithm with a customized socket communication mechanism on GPU clusters. Compared to a single GPU node, we achieve a 7.55x speedup on a cluster of 9 GPUs when recommending for 8000 users. In addition, the speedup of the GPU cluster is 26.1 times that of our 9-node CPU cluster, and the GPU cluster is 1586.28 times faster than the serial algorithm on one CPU core. These results show that GPU technology can dramatically improve the algorithm's performance.
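The cluster figures above can be related with some simple arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and it assumes that the 7.55x and 1586.28x figures were measured on the same 8000-user workload, which the abstract does not state explicitly.

```python
# Illustrative arithmetic on the speedup figures quoted in the abstract.
# Function names are hypothetical; the "same workload" assumption is ours.


def parallel_efficiency(speedup, workers):
    """Return speedup divided by the number of workers (1.0 is ideal scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


def implied_single_node_speedup(cluster_vs_serial, cluster_vs_single_node):
    """Estimate single-node speedup over serial from two cluster ratios.

    Only meaningful if both ratios refer to the same workload (assumed here).
    """
    if cluster_vs_single_node <= 0:
        raise ValueError("cluster_vs_single_node must be positive")
    return cluster_vs_serial / cluster_vs_single_node


if __name__ == "__main__":
    # 9-GPU cluster vs. one GPU node, as quoted in the abstract.
    eff = parallel_efficiency(7.55, 9)
    # 9-GPU cluster vs. serial CPU, divided by cluster vs. one GPU node.
    single = implied_single_node_speedup(1586.28, 7.55)
    print(f"parallel efficiency on 9 GPUs: {eff:.3f}")
    print(f"implied single-GPU speedup over serial: {single:.1f}x")
```

Under these assumptions the 9-GPU cluster runs at roughly 84% parallel efficiency. The implied single-GPU factor of about 210x refers to the 8000-user cluster experiment, so it is not directly comparable with the 153.9x figure reported for the unoptimized kernel on 30000 records.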
INDEX TERMS
Graphics processing units, Kernel, Clustering algorithms, Bipartite graph, Algorithm design and analysis, Instruction sets, Arrays
CITATION

M. Li, J. Huang and J. Wang, "Accelerated diffusion-based recommendation algorithm on tripartite graphs with GPU clusters," 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Shanghai, China, 2016, pp. 307-312.
doi:10.1109/SNPD.2016.7515917