2015 International Conference on Parallel Architecture and Compilation (PACT) (2015)
San Francisco, CA, USA
Oct. 18, 2015 to Oct. 21, 2015
ISSN: 1089-795X
ISBN: 978-1-4673-9524-3
pp: 150-162
Data movement is a critical bottleneck for future generations of parallel systems. The class of .5D communication-avoiding algorithms was developed to address this bottleneck. These algorithms reduce communication and provide strong scaling in both time and energy. As a first step towards automating the development of communication-avoiding libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global-view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. With the latter, wraparound communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about communication and computation present in a .5D algorithm. After partitioning data and computation, it inserts point-to-point and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 supercomputer achieves 59 TFlops/s (76% of the machine peak). Our generated parallel code achieves 91% of the performance of a hand-coded version.
Program processors, Three-dimensional displays, Algorithm design and analysis, Libraries, Arrays, Partitioning algorithms, Optimization

K. Murthy and J. Mellor-Crummey, "Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems," 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco, CA, USA, 2015, pp. 150-162.