2012 IEEE International Conference on Cluster Computing (2012)
Beijing, China
Sept. 24, 2012 to Sept. 28, 2012
ISBN: 978-1-4673-2422-9
pp: 266-274
Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types in the PETSc parallel numerical toolkit. We create tunable CUDA implementations of the operations associated with these types after identifying a number of GPU-specific optimizations and tuning parameters for these operations. We discuss our implementation of GPU autotuning capabilities in the Orio framework and present performance results for several kernels, comparing them with vendor-tuned library implementations.
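For readers unfamiliar with the standard diagonal (DIA) sparse format that the abstract takes as its starting point, the following is a minimal sketch of a DIA matrix-vector product. It is not the extended representation or the CUDA kernels from the paper; the function name `dia_matvec` and the storage convention (entry `diagonals[d][i]` holds `A[i][i + offsets[d]]`, with out-of-range slots treated as padding) are assumptions chosen for illustration.

```python
def dia_matvec(offsets, diagonals, x):
    """Compute y = A @ x for a square matrix A stored in DIA format.

    Assumed layout (illustrative, not taken from the paper):
    diagonals[d][i] stores A[i][i + offsets[d]]; slots whose column
    index falls outside the matrix are padding and are never read.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diagonals):
        # Rows for which column i + off stays inside [0, n).
        first_row = max(0, -off)
        last_row = min(n, n - off)
        for i in range(first_row, last_row):
            y[i] += diag[i] * x[i + off]
    return y


# 3-point stencil [-1, 2, -1] (1D Laplacian) on 4 grid points.
n = 4
offsets = [-1, 0, 1]
diagonals = [[-1.0] * n, [2.0] * n, [-1.0] * n]
x = [1.0, 2.0, 3.0, 4.0]
y = dia_matvec(offsets, diagonals, x)
print(y)  # [0.0, 0.0, 0.0, 5.0]
```

Because every stored diagonal is a dense array indexed by row, the inner loop needs no column-index lookups. In a GPU version one would typically assign one thread per row `i`, so that neighbouring threads read neighbouring entries of each diagonal; parameters such as the thread-block size are the kind of setting an autotuner can search over.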
Graphics processing unit, Kernel, Libraries, Tuning, Sparse matrices, Instruction sets, Computer architecture, GPU, autotuning, stencil, CUDA

A. Mametjanov, D. Lowell, C. Ma and B. Norris, "Autotuning Stencil-Based Computations on GPUs," 2012 IEEE International Conference on Cluster Computing (CLUSTER), Beijing, China, 2012, pp. 266-274.