2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
Beijing, China
July 13, 2014 to July 15, 2014
ISSN: 2168-3034
ISBN: 978-1-4799-3844-5
pp: 93-98
Recently, hybrid CPU/GPU clusters have been widely used to deal with compute-intensive problems, such as the subset-sum problem. The two-list algorithm is a well-known approach to solving this problem. However, a hybrid MPI-CUDA dual-level parallelization of the algorithm on such a cluster is not straightforward. The key challenge is how to allocate the most suitable workload to each node so as to achieve good load balancing between nodes and minimize communication overhead. This paper therefore proposes an effective workload distribution scheme that aims to assign a reasonable workload to each node. Based on this scheme, an efficient MPI-CUDA parallel implementation of a two-list algorithm is presented. A series of experiments is conducted to compare the performance of the hybrid MPI-CUDA implementation with that of the best sequential CPU implementation, the single-node CPU-only implementation, the single-node GPU-only implementation, and the hybrid MPI-OpenMP implementation with the same cluster configuration. The results show that the proposed hybrid MPI-CUDA implementation not only offers significant performance benefits but also has excellent scalability.
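For readers unfamiliar with the two-list algorithm mentioned in the abstract, the following is a minimal sequential sketch in Python. It is not the authors' MPI-CUDA implementation; it only illustrates the classic idea of splitting the input in half, enumerating the subset sums of each half, and scanning the two sorted lists from opposite ends. The function names `subset_sums` and `two_list_subset_sum` are hypothetical and chosen for illustration.

```python
def subset_sums(items):
    """Return (sum, chosen_items) for every subset of items (2**len(items) entries)."""
    sums = [(0, ())]
    for x in items:
        # Each existing subset appears once without x and once with x.
        sums += [(s + x, chosen + (x,)) for s, chosen in sums]
    return sums


def two_list_subset_sum(weights, target):
    """Return a tuple of weights summing to target, or None if no subset does.

    Illustrative sequential version: O(2**(n/2)) time and space instead of O(2**n).
    """
    half = len(weights) // 2
    # List A in ascending order of sum, list B in descending order of sum.
    a = sorted(subset_sums(weights[:half]), key=lambda p: p[0])
    b = sorted(subset_sums(weights[half:]), key=lambda p: p[0], reverse=True)

    i, j = 0, 0
    while i < len(a) and j < len(b):
        total = a[i][0] + b[j][0]
        if total == target:
            return a[i][1] + b[j][1]
        if total < target:
            i += 1  # need a larger sum: move forward in the ascending list
        else:
            j += 1  # need a smaller sum: move forward in the descending list
    return None


if __name__ == "__main__":
    print(two_list_subset_sum([3, 34, 4, 12, 5, 2], 9))
```

In this form, the two generation steps and the final scan are the parts that a parallel version would need to divide among nodes and threads; how that division is done in the paper is not shown here.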
Graphics processing units, Clustering algorithms, Performance evaluation, Instruction sets, Parallel processing, Educational institutions, Computational modeling
Letian Kang, Lanjun Wan, Kenli Li, "Efficient Parallelization of a Two-List Algorithm for the Subset-Sum Problem on a Hybrid CPU/GPU Cluster", 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 93-98, 2014, doi:10.1109/PAAP.2014.44