2008 37th International Conference on Parallel Processing (2008)
Sept. 9, 2008 to Sept. 11, 2008
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPP.2008.83
All-to-all communication is a well-known performance bottleneck for many applications, such as those that use the Fast Fourier Transform (FFT) algorithm. We analyze the performance of all-to-all communication on the Blue Gene/L torus interconnect, which experiences link contention even for all-to-all operations with short messages. We observed that the performance of all-to-all depends on the shape of the processor partition. We present a performance analysis of all-to-all on partitions of various shapes. We then present optimization schemes that substantially improve the performance of all-to-all with both short and large messages. In particular, throughput improved from 64% to over 99% of peak on the 65,536-node (64x32x32) Blue Gene/L machine at the Lawrence Livermore National Lab. We show the impact of the all-to-all performance optimizations in 1-D and 3-D FFT benchmarks. We achieved a performance of over 2.8 TF for the HPC Challenge 1-D FFT benchmark with our optimized all-to-all.
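The dependence on partition shape can be illustrated with a rough link-load estimate. The sketch below is not the paper's analysis; it is a standard back-of-the-envelope model that assumes uniform all-to-all traffic (every node sends one unit to every node), shortest-path routing on each ring of the torus, and traffic spread evenly over the two directions. The function names `ring_avg_hops` and `per_link_load` are hypothetical.

```python
def ring_avg_hops(k):
    """Average shortest-path hop count from one node to all k nodes
    of a ring (the node itself included, at distance 0)."""
    return sum(min(d, k - d) for d in range(k)) / k


def per_link_load(shape):
    """Relative load on each directed link, per torus dimension,
    for uniform all-to-all traffic.

    Assumption: n*n source/destination pairs each cross ring_avg_hops(k)
    links of a dimension on average, and that traffic is shared by the
    2*n directed links of the dimension (one per direction per node).
    """
    n = 1
    for k in shape:
        n *= k
    return [n * ring_avg_hops(k) / 2 for k in shape]


if __name__ == "__main__":
    loads = per_link_load((64, 32, 32))
    print(loads)
    # Ratio of the busiest dimension to each dimension's load.
    print([max(loads) / load for load in loads])
```

Under these assumptions, the links of the longest dimension carry the most traffic, so an asymmetric partition such as 64x32x32 leaves the links of the shorter dimensions partly idle while the long dimension limits the exchange.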
Collective communication, all-to-all communication, Blue Gene, torus, network, adaptive routing, deterministic routing, network congestion
R. Garg, S. Kumar, Y. Sabharwal and P. Heidelberger, "Optimization of All-to-All Communication on the Blue Gene/L Supercomputer," 2008 37th International Conference on Parallel Processing (ICPP), pp. 320-329, 2008.