2015 International Conference on Parallel Architecture and Compilation (PACT) (2015)
San Francisco, CA, USA
Oct. 18, 2015 to Oct. 21, 2015
ISSN: 1089-795X
ISBN: 978-1-4673-9524-3
pp: 150-162
ABSTRACT
Data movement is a critical bottleneck for future generations of parallel systems. The class of .5D communication-avoiding algorithms was developed to address this bottleneck. These algorithms reduce communication and provide strong scaling in both time and energy. As a first step towards automating the development of communication-avoiding libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global-view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. With the latter, wraparound communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about the communication and computation present in a .5D algorithm. After partitioning data and computation, it inserts point-to-point and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 supercomputer achieves 59 TFlops/s (76% of the machine peak). Our generated parallel code achieves 91% of the performance of a hand-coded version.
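The abstract notes that wraparound communication can be written either as a global CSHIFT or as an element-wise copy with modulo subscripts. The following is a minimal Python sketch of that equivalence on a one-dimensional list. It is not Maunam's input language or its generated code; the function names `cshift` and `wraparound_copy` are hypothetical, and the shift direction is assumed to follow Fortran's CSHIFT convention (a positive shift moves elements toward lower indices).

```python
def cshift(values, shift):
    """Global-view circular shift: result[i] = values[(i + shift) mod n]."""
    n = len(values)
    if n == 0:
        return []
    k = shift % n  # normalize so negative and large shifts wrap correctly
    return values[k:] + values[:k]


def wraparound_copy(values, shift):
    """Element-wise copy whose source subscript wraps around via modulo."""
    n = len(values)
    result = [None] * n
    for i in range(n):
        # The modulo subscript is what makes the last elements read
        # from the start of the array (the wraparound pattern).
        result[i] = values[(i + shift) % n]
    return result


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(cshift(data, 2))
    print(wraparound_copy(data, 2))
```

Both functions produce the same result for any integer shift. In a distributed setting each index would correspond to a block owned by a different processor, so the modulo subscript describes which neighbor each processor receives from.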
INDEX TERMS
Program processors, Three-dimensional displays, Algorithm design and analysis, Libraries, Arrays, Partitioning algorithms, Optimization, parallel code generation, Compilers, polyhedral methods, reductions, modulo operations, optimization
CITATION
Karthik Murthy, John Mellor-Crummey, "Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems", 2015 International Conference on Parallel Architecture and Compilation (PACT), vol. 00, pp. 150-162, 2015, doi:10.1109/PACT.2015.41