2014 21st International Conference on High Performance Computing (HiPC)
Goa, India
Dec. 17, 2014 to Dec. 20, 2014
ISBN: 978-1-4799-5975-4
pp. 1-10
Jeff Daily , Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
Abhinav Vishnu , Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
Bruce Palmer , Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
Hubertus van Dam , Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
Darren Kerbyson , Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
ABSTRACT
Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically matching interface such as MPI-RMA, as well as less intuitive interfaces such as MPI two-sided combined with multi-threading and dynamic process management. Building on this exploration and the shortcomings it reveals, we propose a novel design that is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement this progress ranks (PR) approach and the other approaches within the Communication Runtime for Exascale, a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speedup on 1008 processors over the other MPI-based designs.
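The user-transparent communicator split at the heart of the PR approach can be illustrated with a small sketch. The function below partitions a world of ranks so that one rank per node is reserved to service asynchronous progress, while the remaining ranks form the contiguous compute group the application sees. The function name, the one-progress-rank-per-node policy, and the choice of reserving the last rank on each node are illustrative assumptions for this sketch, not the actual ComEx implementation, which would perform the equivalent split with MPI communicator operations such as MPI_Comm_split.

```python
def split_progress_ranks(world_size, ranks_per_node):
    """Hypothetical sketch of a progress-rank (PR) partition.

    Returns (progress_ranks, compute_rank_map), where progress_ranks are the
    world ranks reserved for asynchronous progress and compute_rank_map maps
    each remaining world rank to its contiguous rank in the compute group.
    """
    # Assumption for illustration: reserve the last rank on each node.
    progress = {node * ranks_per_node + (ranks_per_node - 1)
                for node in range(world_size // ranks_per_node)}
    # All other ranks form the compute group, renumbered contiguously so the
    # application never observes the missing (reserved) ranks.
    compute = [r for r in range(world_size) if r not in progress]
    return sorted(progress), {r: i for i, r in enumerate(compute)}

# Example: 8 ranks across 2 nodes, 4 ranks per node.
progress, compute_map = split_progress_ranks(world_size=8, ranks_per_node=4)
print(progress)      # → [3, 7]
print(compute_map)   # → {0: 0, 1: 1, 2: 2, 4: 3, 5: 4, 6: 5}
```

A reserved rank would then loop servicing get/put/atomic requests on behalf of its node, giving the two-sided implementation asynchronous progress without user involvement.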
INDEX TERMS
Electronics packaging, Semantics, Protocols, Message systems, Runtime, Synchronization, Computational modeling
CITATION

J. Daily, A. Vishnu, B. Palmer, H. van Dam and D. Kerbyson, "On the suitability of MPI as a PGAS runtime," 2014 21st International Conference on High Performance Computing (HiPC), Goa, India, 2014, pp. 1-10.
doi:10.1109/HiPC.2014.7116712