2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 505-506
Tejaswi Agarwal, University of Missouri - Columbia
Michela Becchi, University of Missouri - Columbia
ABSTRACT
In the last few years, GPUs have become an integral part of HPC clusters. To test these heterogeneous CPU-GPU systems, we designed a hybrid CUDA-MPI benchmark suite that consists of three communication- and compute-intensive applications: Matrix Multiplication (MM), Needleman-Wunsch (NW), and the ADFA compression algorithm [1]. The main goal of this work is to characterize these workloads on CPU-GPU clusters. Our benchmark applications are designed to allow cluster administrators to identify bottlenecks in the cluster, to decide whether scaling applications to multiple nodes would improve or degrade overall throughput, and to design effective scheduling policies. Our experiments show that inter-node communication can significantly degrade the throughput of communication-intensive applications. We conclude that the scalability of the applications depends primarily on two factors: the cluster configuration and the applications' characteristics.
INDEX TERMS
Graphics processing units, Benchmark testing, Memory management, Data transfer, Throughput, Compression algorithms, Scalability, GPU, Benchmark, CUDA-MPI, clusters
CITATION
Tejaswi Agarwal, Michela Becchi, "Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 505-506, 2014, doi:10.1145/2628071.2671423