2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Haifa, Israel
Sept. 11, 2016 to Sept. 15, 2016
ISBN: 978-1-5090-5308-7
pp: 99-111
Prashant Singh Rawat, Computer Science and Engineering, The Ohio State University, United States of America
Changwan Hong, Computer Science and Engineering, The Ohio State University, United States of America
Mahesh Ravishankar, Nvidia Corporation, Redmond, Washington, United States of America
Vinod Grover, Nvidia Corporation, Redmond, Washington, United States of America
Louis-Noel Pouchet, Computer Science and Engineering, The Ohio State University, United States of America
Atanas Rountev, Computer Science and Engineering, The Ohio State University, United States of America
P. Sadayappan, Computer Science and Engineering, The Ohio State University, United States of America
ABSTRACT
Computations involving successive application of 3D stencil operators are widely used in many application domains, such as image processing, computational electromagnetics, seismic processing, and climate modeling. Enhancement of temporal and spatial locality via tiling is generally required to overcome performance bottlenecks due to limited bandwidth to global memory on GPUs. However, the low shared memory capacity on current GPU architectures makes effective tiling for 3D stencils very challenging: several previous domain-specific compilers for stencils have demonstrated very high performance for 2D stencils, but much lower performance on 3D stencils. In this paper, we develop an effective resource-constraint-driven approach for automated GPU code generation for stencils. We present a fusion technique that judiciously fuses stencil computations to minimize data movement, while controlling computational redundancy and maximizing resource usage. The fusion model subsumes time tiling of iterated stencils, and can be easily adapted to different GPU architectures. We integrate the fusion model into a code generator that makes effective use of scarce shared memory and registers to achieve high performance. The effectiveness of the automated model-driven code generator is demonstrated through experimental results on a number of benchmarks, comparing against various previously developed GPU code generators.
INDEX TERMS
Three-dimensional displays, Graphics processing units, Generators, Memory management, Instruction sets, Two dimensional displays, Registers
CITATION
Prashant Singh Rawat, Changwan Hong, Mahesh Ravishankar, Vinod Grover, Louis-Noel Pouchet, Atanas Rountev, P. Sadayappan, "Resource conscious reuse-driven tiling for GPUs", 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), pp. 99-111, 2016, doi:10.1145/2967938.2967967