IEEE Transactions on Parallel and Distributed Systems (TPDS) has moved to the OnlinePlus publication model.

From the October 2014 Issue

GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation

By Hao Wang, Sreeram Potluri, Devendar Bureddy, Carlos Rosales, and Dhabaleswar K. (DK) Panda

Free Featured ArticleDesigning high-performance and scalable applications on GPU clusters requires tackling several challenges. The key challenge is the separate host memory and device memory, which requires programmers to use multiple programming models, such as CUDA and MPI, to operate on data in different memory spaces. This challenge becomes more difficult to tackle when non-contiguous data in multidimensional structures is used by real-world applications. These challenges limit the programming productivity and the application performance. We propose the GPU-Aware MPI to support data communication from GPU to GPU using standard MPI. It unifies the separate memory spaces, and avoids explicit CPU-GPU data movement and CPU/GPU buffer management. It supports all MPI datatypes on device memory with two algorithms: a GPU datatype vectorization algorithm and a vector based GPU kernel data pack and unpack algorithm. A pipeline is designed to overlap the non-contiguous data packing and unpacking on GPUs, the data movement on the PCIe, and the RDMA data transfer on the network. We incorporate our design with the open-source MPI library MVAPICH2 and optimize a production application: the multiphase 3D LBM. Besides the increase of programming productivity, we observe up to 19.9 percent improvement in application-level performance on 64 GPUs of the Oakley supercomputer.

download PDF View the PDF of this article      csdl View this issue in the digital library

Editorials and Announcements



Guest Editorials

Reviewers List

Annual Index

Access recently published TPDS articles

RSS Subscribe to the RSS feed of latest TPDS content added to the digital library.

Mail Sign up for the Transactions Connection newsletter.

Listen to the OnlinePlus Podcast: Computer Society Publishing—two more titles migrate to OnlinePlus™ in 2012.

In this podcast, VP of Publications, David Alan Grier talks about Transactions on Mobile Computing and Transactions on Parallel and Distributed Systems migrating to OnlinePlus™.

TPDS is indexed in ISI

IEEE Transactions on Parallel and Distributed Systems (TPDS) is a scholarly archival journal published monthly. Parallelism and distributed computing are foundational research and technology to rapidly advance computer systems and their applications.
Read the full scope of TPDS