2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (2013)
Cambridge, MA, USA
May 20, 2013 to May 24, 2013
This paper presents a design of scalable Partitioned Global Address Space (PGAS) communication subsystems on the recently proposed Blue Gene/Q architecture. The proposed design provides an in-depth model of the communication infrastructure using the Parallel Active Messaging Interface (PAMI). The communication infrastructure is used to design time- and space-efficient communication protocols for frequently used data types (contiguous, uniformly non-contiguous) with Remote Direct Memory Access (RDMA) get/put primitives. The proposed design accelerates load balance counters by using asynchronous threads, which are required because the network hardware lacks support for generic Atomic Memory Operations (AMOs). Under the proposed design, synchronization traffic is reduced by tracking conflicting memory accesses in distributed memory, at the cost of a slight increase in space complexity. An evaluation with simple communication benchmarks shows an adjacent-node get latency of 2.89 µs and a peak bandwidth of 1775 MB/s, resulting in 99% communication efficiency. The evaluation also shows a reduction in execution time of up to 30% for the NWChem self-consistent field calculation on 4096 processes using the proposed asynchronous-thread-based design.
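The idea of servicing load balance counters with an asynchronous thread can be illustrated with a minimal sketch. It is not the paper's implementation and does not use PAMI: the class `SoftwareCounter`, the function `run_workers`, and the use of Python threads and queues in place of network messages are all assumptions made for illustration. A helper thread owns the counter and applies fetch-and-add requests on behalf of workers, which is roughly the role a software agent must play when the network cannot perform the atomic update itself.

```python
import queue
import threading


class SoftwareCounter:
    """Hypothetical shared counter; a helper thread applies every update."""

    def __init__(self, start=0):
        self._value = start
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        # Only this thread reads or writes _value, so no lock is needed.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            increment, reply = item
            old = self._value
            self._value = old + increment
            reply.put(old)  # send the pre-increment value back

    def fetch_and_add(self, increment=1):
        """Ask the helper thread to add `increment`; return the old value."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((increment, reply))
        return reply.get()

    def close(self):
        self._requests.put(None)
        self._worker.join()


def run_workers(num_workers=4, num_tasks=100):
    """Workers claim task indices dynamically from the shared counter."""
    counter = SoftwareCounter()
    claimed = [[] for _ in range(num_workers)]

    def work(rank):
        while True:
            task = counter.fetch_and_add(1)
            if task >= num_tasks:
                return
            claimed[rank].append(task)

    threads = [threading.Thread(target=work, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    counter.close()
    return claimed


if __name__ == "__main__":
    result = run_workers()
    print(sum(len(tasks) for tasks in result))
```

In this sketch each task index is handed out exactly once, because all updates are serialized through the single helper thread. How the tasks are split among workers depends on thread scheduling and will vary between runs.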
RDMA, PGAS, Blue Gene/Q, Communication
A. Vishnu, D. J. Kerbyson, K. Barker and H. van Dam, "Building Scalable PGAS Communication Subsystem on Blue Gene/Q," 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), Cambridge, MA, USA, 2013, pp. 825-833.