2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Sept. 11, 2016 to Sept. 15, 2016
Chandan Reddy , INRIA and École Normale Supérieure, Paris, France
Michael Kruse , INRIA and École Normale Supérieure, Paris, France
Albert Cohen , INRIA and École Normale Supérieure, Paris, France
Reductions are common in scientific and data-crunching codes, and a typical source of bottlenecks on massively parallel architectures such as GPUs. Reductions are memory-bound, and achieving peak performance involves sophisticated optimizations. There exist libraries such as CUB and Thrust providing highly tuned implementations of reductions on GPUs. However, library APIs are not flexible enough to express user-defined reductions on arbitrary data types and array indexing schemes. Languages such as OpenACC provide declarative syntax to express reductions. Such approaches support a limited range of reduction operators and do not facilitate the application of complex program transformations in presence of reductions. We present language constructs that let a programmer express arbitrary reductions on user-defined data types matching the performance of tuned library implementations. We also extend a polyhedral compilation flow to process these user-defined reductions, enabling optimizations such as the fusion of multiple reductions, combining reductions with other loop transformations, and optimizing data transfers and storage in the presence of reductions. We implemented these language constructs and compilation methods in the PPCG framework and conducted experiments on multiple GPU targets. For single reductions the generated code performs on par with highly tuned libraries, and for multiple reductions it significantly outperforms both libraries and OpenACC on all platforms.
Optimization, Libraries, Graphics processing units, Arrays, Syntactics
Chandan Reddy, Michael Kruse, Albert Cohen, "Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs", 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), vol. 00, no. , pp. 87-97, 2016, doi:10.1145/2967938.2967950