2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)
Wuhan, Hubei, China
Dec. 13, 2016 to Dec. 16, 2016
Tapas is our new C++ programming framework for hierarchical algorithms such as N-body methods on large-scale heterogeneous supercomputers. Although N-body methods and their variants are widely used in scientific applications, implementing them correctly on such modern machines is often difficult, as the algorithms are irregular, complex, and involve explicit task-parallel programming over distributed nodes. Encapsulating these complexities in a library or framework has been challenging due to irregular data access over massively distributed memory. Tapas solves this by converting the user's clean, implicit-style parallel program into inspector-executor-style code for heterogeneous multi-core, multi-node environments, solely through C++ template metaprogramming. A prototype implementation of the Fast Multipole Method (FMM) on Tapas demonstrates performance and scaling comparable to ExaFMM, the fastest hand-tuned FMM implementation, as well as efficient use of hundreds of GPUs. Specifically, serial performance is 95% of ExaFMM's, while a distributed-memory strong-scaling evaluation on up to 1500 CPU cores achieves 64% to 81% of ExaFMM's performance. The multi-GPU version of the Tapas-based FMM achieves a 5.15x speedup when executed on 100 nodes of TSUBAME2.5 with 300 GPUs.
Approximation algorithms, Algorithm design and analysis, C++ language, Programming, Force, Standards, Libraries
K. Fukuda, M. Matsuda, N. Maruyama, R. Yokota, K. Taura and S. Matsuoka, "Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms," 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), Wuhan, Hubei, China, 2016, pp. 1100-1109.