Issue No. 08, Aug. 2013 (vol. 24)
pp: 1613-1621
J. Kurzak , Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
P. Luszczek , Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
M. Faverge , Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
J. Dongarra , Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.242
ABSTRACT
LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid, shared-memory system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disproportion between the computational power of the CPUs and that of the GPUs, and in the meager bandwidth of the communication link between their memory systems. An additional challenge comes from the memory-bound and synchronization-rich nature of the panel factorization component of the block LU algorithm, imposed by the use of partial pivoting. The challenges are tackled with a data layout geared toward complex memory hierarchies, autotuning of GPU kernels, fine-grain parallelization of memory-bound CPU operations, and dynamic scheduling of tasks to different devices. Performance in excess of one TeraFLOPS is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
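To make the structure referred to above concrete, here is a minimal, sequential sketch of the textbook right-looking block LU algorithm with partial pivoting. It is an illustration under stated assumptions, not the authors' implementation: it is pure Python on a list-of-lists matrix, and it does not model the paper's tile data layout, GPU kernels, or dynamic scheduling. The function name `lu_partial_pivot_blocked` and the block-size parameter `nb` are hypothetical. The sketch shows why the panel is synchronization-rich: every column requires a pivot search over the full remaining height and a row interchange before the next column can proceed. The trailing update, by contrast, is a matrix-multiply-shaped operation with no such dependencies.

```python
def lu_partial_pivot_blocked(a, nb=2):
    """Hypothetical sketch: in-place right-looking blocked LU with partial pivoting.

    `a` is an n x n list of lists of floats. On return it holds U on and
    above the diagonal and the unit-lower-triangular L below it, such that
    row i of L*U equals row perm[i] of the original matrix (P*A = L*U).
    """
    n = len(a)
    perm = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Panel factorization: unblocked LU on columns k0..k1-1, rows k0..n-1.
        for j in range(k0, k1):
            # Pivot search over the whole remaining column (a sequential step).
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            if a[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                # Swap full rows, so the interchange also reaches the L columns
                # to the left and the not-yet-updated columns to the right.
                a[j], a[p] = a[p], a[j]
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]  # multiplier, stored in place as L[i][j]
                for c in range(j + 1, k1):  # update remaining panel columns only
                    a[i][c] -= a[i][j] * a[j][c]
        # 2. Triangular solve: U12 = inv(L11) * A12, with L11 unit lower triangular.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3. Trailing update: A22 -= L21 * U12 (the compute-bound, GEMM-like step).
        for i in range(k1, n):
            for c in range(k1, n):
                s = 0.0
                for t in range(k0, k1):
                    s += a[i][t] * a[t][c]
                a[i][c] -= s
    return perm
```

In exact arithmetic any block size `nb` selects the same pivots as the unblocked algorithm; blocking only regroups the arithmetic so that most of the work lands in step 3, which is the part that runs efficiently on throughput-oriented hardware.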
INDEX TERMS
Graphics processing unit, Layout, Tiles, Kernel, Dynamic scheduling, Libraries, Plasmas, GPU, accelerator, Gaussian elimination, LU factorization, partial pivoting, multicore, manycore
CITATION
J. Kurzak, P. Luszczek, M. Faverge, J. Dongarra, "LU Factorization with Partial Pivoting for a Multicore System with Accelerators", IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 8, pp. 1613-1621, Aug. 2013, doi:10.1109/TPDS.2012.242