Parallel and Distributed Processing Symposium, International (2010)
Atlanta, GA, USA
Apr. 19, 2010 to Apr. 23, 2010
ISBN: 978-1-4244-6442-5
pp: 1-11
Sameer Kumar , IBM T.J. Watson Research Center Yorktown Heights, NY, 10598
Philip Heidelberger , IBM T.J. Watson Research Center Yorktown Heights, NY, 10598
Dong Chen , IBM T.J. Watson Research Center Yorktown Heights, NY, 10598
Michael Hines , Department of Computer Science, Yale University, New Haven, CT, USA
We explore the multisend interface as a data mover interface to optimize applications with neighborhood collective communication operations. One limitation of the current MPI 2.1 standard is that the vector collective calls require counts and displacements to be specified for all the processors in the communicator, for zero-byte and non-zero-byte messages alike. Further, all the collective calls in MPI 2.1 are blocking and do not permit overlap of communication with computation in the same thread of execution. In contrast, multisends are non-blocking calls that permit overlap of computation and communication. We present the record-replay persistent optimization to the multisend interface, which minimizes the processor overhead of initiating the collective. We present four case studies with the multisend API on Blue Gene/P: (i) 3D-FFT, (ii) 4D nearest neighbor exchange as used in Quantum Chromodynamics, (iii) NAMD and (iv) the neural network simulator NEURON. Performance results show 1.9x speedup with 32
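The argument-size contrast the abstract draws between MPI 2.1 vector collectives and multisends can be illustrated with a small Python sketch for the 4D nearest-neighbor case. This is not the Blue Gene/P multisend API: `SendDesc`, `multisend_args`, `PersistentMultisend` and the `post` callback are hypothetical names, the 8x8x8x8 torus and 1024-byte message size are arbitrary assumptions, and no messages are actually sent. The sketch only builds arguments. An `MPI_Alltoallv`-style call carries a count and a displacement for every rank in the communicator. A multisend-style list carries one descriptor per neighbor. The persistent object records that list once and replays it on each iteration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class SendDesc:
    """Hypothetical descriptor for one message in a multisend."""
    dest: int    # destination rank
    offset: int  # byte offset into the packed send buffer
    nbytes: int  # message length in bytes


def torus4d_neighbors(rank, dims):
    """Sorted ranks of the +/-1 neighbors of `rank` on a periodic torus (row-major ranks)."""
    coords, r = [], rank
    for d in reversed(dims):  # last dimension varies fastest
        coords.append(r % d)
        r //= d
    coords.reverse()
    neighbors = set()
    for axis in range(len(dims)):
        for step in (-1, 1):
            c = list(coords)
            c[axis] = (c[axis] + step) % dims[axis]
            n = 0
            for ci, d in zip(c, dims):  # convert coordinates back to a rank
                n = n * d + ci
            neighbors.add(n)
    neighbors.discard(rank)  # a dimension of size 1 wraps back onto itself
    return sorted(neighbors)


def vector_collective_args(nprocs, neighbors, nbytes):
    """Alltoallv-style arguments: one count and one displacement per rank,
    zero for every rank that is not a neighbor."""
    counts = [0] * nprocs
    displs = [0] * nprocs
    for i, dest in enumerate(neighbors):
        counts[dest] = nbytes
        displs[dest] = i * nbytes
    return counts, displs


def multisend_args(neighbors, nbytes):
    """Multisend-style arguments (hypothetical): one descriptor per neighbor only."""
    return [SendDesc(dest, i * nbytes, nbytes) for i, dest in enumerate(neighbors)]


class PersistentMultisend:
    """Record/replay sketch: the schedule is captured once and reused."""

    def __init__(self, descs):
        self.schedule = tuple(descs)  # "record": freeze the descriptor list once
        self.starts = 0

    def start(self, post):
        """'Replay': hand each recorded descriptor to `post`, a stand-in for a
        non-blocking send; nothing is recomputed per iteration."""
        for desc in self.schedule:
            post(desc)
        self.starts += 1


if __name__ == "__main__":
    dims = (8, 8, 8, 8)  # assumed 4096-rank 4D torus
    nprocs = prod(dims)
    nbrs = torus4d_neighbors(0, dims)
    counts, displs = vector_collective_args(nprocs, nbrs, 1024)
    descs = multisend_args(nbrs, 1024)
    print(len(counts) + len(displs), len(descs))

    posted = []
    ms = PersistentMultisend(descs)
    for _ in range(3):  # e.g. three solver iterations
        ms.start(posted.append)
    print(len(posted), ms.starts)
```

Under these assumptions, rank 0 has 8 neighbors. The vector-collective form carries 8,192 count and displacement entries, nearly all zero, while the multisend-style list carries 8 descriptors. Replaying the recorded schedule three times posts 24 sends without rebuilding it, which is the kind of per-call initiation overhead the record-replay optimization targets.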

S. Kumar, P. Heidelberger, D. Chen and M. Hines, "Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 2010, pp. 1-11.