A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (2013)
Cambridge, MA, USA
May 20, 2013 to May 24, 2013
The problem size of a stencil computation on a GPU is limited by GPU memory capacity, which is typically smaller than that of host memory. This paper proposes and evaluates a multi-level optimization method for stencil computation that achieves both a problem size larger than GPU memory and high performance. The method is based on temporal blocking, which was proposed to improve the memory access locality of stencil computation. It applies temporal blocking at two layers to improve locality, then reuses earlier results to eliminate redundant computation. Furthermore, it overlaps computation with communication by using two additional buffers. An evaluation of a 7-point stencil simulation on a 3D domain shows that the new method achieves, on average, 16.74 times better performance than the naive method and 1.35 times better performance than other methods.
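The basic idea of temporal blocking can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1D 3-point averaging stencil instead of the 7-point 3D stencil, it uses plain Python lists in place of host and GPU buffers, and the names `step`, `naive`, `temporal_blocked`, `block`, and `bt` are hypothetical. Each sub-block is copied together with a halo of `bt` cells per side, advanced `bt` time steps locally, and only its interior is written back. The halo cells are computed redundantly by neighbouring sub-blocks; that redundancy is the cost the paper's result reuse is meant to remove, and the sketch does not attempt it, nor the overlap of communication and computation.

```python
def step(u):
    """One Jacobi sweep of a 3-point averaging stencil.

    The first and last cells of the given array are held fixed.
    """
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive(u, steps):
    """Advance the whole domain one time step at a time."""
    for _ in range(steps):
        u = step(u)
    return u


def temporal_blocked(u, steps, block, bt):
    """Advance the domain sub-block by sub-block, bt time steps per visit.

    block: number of interior cells per sub-block (assumed parameter).
    bt:    temporal blocking factor, i.e. steps done per sub-block visit.
    """
    n = len(u)
    done = 0
    while done < steps:
        t = min(bt, steps - done)      # last round may be shorter
        new = u[:]
        for start in range(0, n, block):
            end = min(start + block, n)
            lo = max(0, start - t)     # halo of t cells on the left
            hi = min(n, end + t)       # halo of t cells on the right
            local = u[lo:hi]           # stands in for a host-to-device copy
            for _ in range(t):
                local = step(local)    # halo cells go stale, interior stays valid
            # Write back only the interior; stale halo values are discarded.
            new[start:end] = local[start - lo:end - lo]
        u = new
        done += t
    return u


u0 = [float((i * i) % 7) for i in range(40)]
reference = naive(u0, 10)
blocked = temporal_blocked(u0, 10, block=8, bt=4)
max_diff = max(abs(a - b) for a, b in zip(reference, blocked))
```

In this sketch each sub-block is transferred once per `bt` time steps instead of once per step, which is where the locality gain comes from; a halo of `bt` cells is enough because an error at a sub-block edge travels inward by one cell per step.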
temporal blocking, GPU memory capacity, multi-level optimization, stencil computation
G. Jin, T. Endo and S. Matsuoka, "A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU," 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), Cambridge, MA, USA, 2013, pp. 1080-1087.