2013 IEEE International Conference on Cluster Computing (CLUSTER) (2013)
Indianapolis, IN, USA
Sept. 23, 2013 to Sept. 27, 2013
ISBN: 978-1-4799-0898-1
pp: 1-8
Guanghao Jin, Tokyo Institute of Technology, JST-CREST, Japan
Toshio Endo, Tokyo Institute of Technology, JST-CREST, Japan
Satoshi Matsuoka, Tokyo Institute of Technology, NII, JST-CREST, Japan
ABSTRACT
The problem size of stencil computation on a GPU cluster is limited by the memory capacity of the GPUs, which is typically smaller than that of host memory. This paper proposes and evaluates a parallel optimization method for stencil computation that targets three goals: scalability, problem sizes larger than the memory capacity of the GPUs, and high performance. The method uses 2D decomposition to scale across GPUs. It then enables a larger sub-domain on each GPU, which allows a larger overall problem size. To obtain higher performance, it applies temporal blocking to improve the memory access locality of the stencil computation and reuses earlier results to address redundant computation. An evaluation of stencil simulations on a 3D domain shows that the new method achieves good scalability for 7-point and 19-point stencils on GPUs, performing on average 1.45 times and 1.72 times better, respectively, than other methods.
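As a rough illustration of the temporal blocking idea mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 7-point Jacobi update with equal weights and a fixed outer boundary, splits the domain into blocks along one axis only (the paper uses 2D decomposition across GPUs), and uses an array copy as a stand-in for a host-to-GPU transfer. The names `step7`, `naive`, and `temporal_blocked` are hypothetical. Each block is loaded with a halo of `t` layers, advanced `t` time steps locally, and only its interior is written back; the halo layers are computed redundantly, which is the overhead that the result reuse described in the abstract aims to reduce and that this sketch does not attempt.

```python
import numpy as np

def step7(u):
    """One 7-point Jacobi step; the outermost layer of cells is held fixed."""
    v = u.copy()
    v[1:-1, 1:-1, 1:-1] = (
        u[1:-1, 1:-1, 1:-1]
        + u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]
        + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]
        + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]
    ) / 7.0  # equal weights are an assumption for illustration
    return v

def naive(u, steps):
    """Reference: sweep the whole domain once per time step."""
    for _ in range(steps):
        u = step7(u)
    return u

def temporal_blocked(u, steps, block, bt):
    """Process z-blocks one at a time, advancing each up to bt steps per load."""
    nz = u.shape[0]
    done = 0
    while done < steps:
        t = min(bt, steps - done)          # time steps in this pass
        out = np.empty_like(u)
        for z0 in range(0, nz, block):
            z1 = min(z0 + block, nz)
            lo = max(z0 - t, 0)            # halo of t layers on each side,
            hi = min(z1 + t, nz)           # clipped at the domain boundary
            sub = u[lo:hi].copy()          # stands in for a host-to-GPU copy
            for _ in range(t):
                sub = step7(sub)           # halo layers go stale one per step
            out[z0:z1] = sub[z0 - lo:z1 - lo]  # keep only the valid interior
        u = out
        done += t
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u0 = rng.random((16, 8, 8))
    same = np.allclose(naive(u0, 6), temporal_blocked(u0, 6, block=4, bt=3))
    print("blocked result matches naive sweep:", same)
```

In this sketch each block is loaded once per `bt` time steps instead of once per step, which is where the locality gain comes from; the cost is the extra `bt` halo layers recomputed on each side of every block.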
INDEX TERMS
temporal blocking, stencil computation, GPU cluster, memory capacity, parallel optimization
CITATION

G. Jin, T. Endo and S. Matsuoka, "A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs," 2013 IEEE International Conference on Cluster Computing (CLUSTER), Indianapolis, IN, USA, 2014, pp. 1-8.
doi:10.1109/CLUSTER.2013.6702633