2012 41st International Conference on Parallel Processing Workshops (2012)
Pittsburgh, PA, USA
Sept. 10, 2012 to Sept. 13, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPPW.2012.23
High performance computing systems are increasingly incorporating hybrid CPU/GPU nodes to accelerate the rate at which floating point calculations can be performed for scientific applications. A key challenge is adapting scientific applications to such systems when the underlying computations are sparse, as in sparse linear solvers for the simulation of partial differential equation models using semi-implicit methods. In particular, sparse triangular solution is a key bottleneck in solvers such as preconditioned conjugate gradients (PCG). We show that sparse triangular solution can be effectively mapped to GPUs by extracting very large degrees of fine-grained parallelism using graph coloring. We develop simple performance models to predict these effects at the intersection of the data and hardware attributes, and we evaluate our scheme on an NVIDIA Tesla M2090 GPU relative to the level set scheme developed at NVIDIA. Our results indicate that our approach significantly enhances the available fine-grained parallelism, reducing PCG iteration time relative to the NVIDIA scheme by a factor with a geometric mean of 5.41 on a single GPU, with speedups as high as 63 in some cases.
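To illustrate why graph coloring can expose more parallelism than level sets, the following is a minimal Python sketch, not the authors' implementation. It assumes a symmetric tridiagonal matrix (a 1D chain graph) with constant diagonal and off-diagonal values, and all function names (`chain_graph`, `greedy_coloring`, `count_levels_natural`, `solve_lower_by_color`) are hypothetical. Under the natural ordering, the lower triangular solve on a chain needs one level per row. After reordering by color, rows of the same color share no edges, so each color can be processed as one parallel step.

```python
def chain_graph(n):
    """Adjacency of a 1D chain: the nonzero pattern of a tridiagonal matrix."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def count_levels_natural(adj):
    """Number of level sets for the lower triangle under the natural ordering.

    Row i depends on every neighbor j < i, so its level is one more than
    the deepest such dependency.
    """
    level = {}
    for i in sorted(adj):
        level[i] = 1 + max((level[j] for j in adj[i] if j < i), default=0)
    return max(level.values())


def solve_lower_by_color(adj, diag, off, b, color):
    """Solve L x = b, where L is the lower triangle of the color-reordered matrix.

    Assumes every diagonal entry equals `diag` and every off-diagonal
    entry equals `off` (an illustrative simplification).
    """
    x = {}
    for c in range(max(color.values()) + 1):
        rows = [v for v in adj if color[v] == c]
        # Rows of one color are mutually independent; on a GPU each row
        # could be handled by its own thread. Here they run sequentially.
        for v in rows:
            s = sum(off * x[u] for u in adj[v] if color[u] < c)
            x[v] = (b[v] - s) / diag
    return x


adj = chain_graph(8)
color = greedy_coloring(adj)
num_colors = max(color.values()) + 1
num_levels = count_levels_natural(adj)
x = solve_lower_by_color(adj, 2.0, -1.0, {v: 1.0 for v in adj}, color)
print(num_colors, "parallel steps with coloring vs", num_levels, "with level sets")
```

Note that reordering by color changes the triangular factor being solved, so in a PCG setting the preconditioner itself would generally differ from the one obtained under the original ordering; this sketch only shows the change in the number of dependent steps.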
Color, Sparse matrices, Graphics processing unit, Level set, Concurrent computing, Parallel processing, Image color analysis
B. Suchoski, C. Severn, M. Shantharam and P. Raghavan, "Adapting Sparse Triangular Solution to GPUs," 2012 41st International Conference on Parallel Processing Workshops (ICPPW), Pittsburgh, PA, USA, 2012, pp. 140-148.