2017 46th International Conference on Parallel Processing (ICPP) (2017)

Bristol, United Kingdom

Aug. 14, 2017 to Aug. 17, 2017

ISSN: 2332-5690

ISBN: 978-1-5386-1042-8

pp: 91-100

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPP.2017.18

ABSTRACT

We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different sizes, and for the subsequent triangular solves. All kernels heavily exploit the registers of the graphics processing unit (GPU) in order to deliver high performance for small problems. The development of these kernels is motivated by the need to tackle this embarrassingly parallel scenario in the context of block-Jacobi preconditioning, which is relevant for the iterative solution of sparse linear systems.

INDEX TERMS

Graphics processing units, Kernel, Linear systems, Jacobian matrices, Sparse matrices, Parallel processing

CITATION

H. Anzt, J. Dongarra, G. Flegar and E. S. Quintana-Orti, "Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning," *2017 46th International Conference on Parallel Processing (ICPP)*, Bristol, United Kingdom, 2017, pp. 91-100.

doi:10.1109/ICPP.2017.18

