ABSTRACT
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each BG/Q node having 68 hardware threads, hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, are ideal and will enable applications to achieve high throughput on BG/Q. Given such unprecedented parallelism and scale, this paper is a groundbreaking effort to explore the challenges of designing a communication library that can match and exploit such massive parallelism. In particular, we present the Parallel Active Messaging Interface (PAMI) library as our BG/Q library solution to the many challenges that come with a machine at such scale. PAMI provides (1) novel techniques to partition the application communication overhead into many contexts that can be accelerated by communication threads, (2) client and context objects to support multiple and different programming paradigms, (3) lockless algorithms to speed up MPI message rate, and (4) novel techniques leveraging the new BG/Q architectural features such as the scalable atomic primitives implemented in the L2 cache, the highly parallel hardware messaging unit that supports both point-to-point and collective operations, and the collective hardware acceleration for operations such as broadcast, reduce, and allreduce. We experimented with PAMI on 2048 BG/Q nodes, and the results show high messaging rates as well as low latencies and high throughputs for collective communication operations.
INDEX TERMS
Message systems, Context, Hardware, Programming, Libraries, Parallel processing, Acceleration, Message Rate, MPI, Message Passing, Blue Gene, Collective Communication
CITATION

J. Ratterman et al., "PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer," Parallel and Distributed Processing Symposium, International (IPDPS), Shanghai, China, 2012, pp. 763-773.
doi:10.1109/IPDPS.2012.73