2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) (2014)
Dec. 16, 2014 to Dec. 19, 2014
Yusuke Nagasaka, Tokyo Institute of Technology, Meguro, 152-8550, Japan
Akira Nukada, Tokyo Institute of Technology, Meguro, 152-8550, Japan
Satoshi Matsuoka, Tokyo Institute of Technology, Meguro, 152-8550, Japan
Scientific simulations often require solving extremely large sparse linear systems, whose dominant kernel is sparse matrix-vector multiplication (SpMV). On modern many-core processors such as GPUs or MIC, this operation is known to be a significant bottleneck that results in extremely poor efficiency, because of the limited processor-to-memory bandwidth and the low cache hit ratio caused by random accesses to the input vector. Our family of new sparse matrix formats for many-core processors significantly increases the cache hit ratio, and thus performance, by segmenting the matrix along its columns, dividing the work among the many cores in segments sized up to the internal cache capacity, and aggregating the partial results afterwards. Performance studies show that we achieve up to 3.0x speedup in SpMV and 1.68x in multi-node CG, compared to the best vendor libraries and to recently proposed competing formats such as SELL-C-σ.
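To make the column-segmentation idea in the abstract concrete, here is a minimal sequential sketch. It is not the authors' format or GPU kernel. It assumes plain CSR storage per segment, and the names `spmv_csr`, `segment_by_columns`, `spmv_segmented`, and the parameter `seg_width` are hypothetical. `seg_width` stands in for "as many columns as let the matching slice of the input vector fit in cache". Each segment touches only its own slice of `x`, producing a partial result vector, and the partial results are summed in a final aggregation step.

```python
def spmv_csr(n_rows, row_ptr, col_idx, vals, x):
    """Baseline CSR SpMV: y = A @ x. Accesses to x follow col_idx, i.e. they are irregular."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def segment_by_columns(n_rows, n_cols, row_ptr, col_idx, vals, seg_width):
    """Split a CSR matrix into column segments [start, start + seg_width).

    Hypothetical helper: each segment is stored as its own CSR matrix with
    column indices made local to the segment, so it only reads x[start:end].
    """
    segments = []
    for start in range(0, n_cols, seg_width):
        end = min(start + seg_width, n_cols)
        seg_ptr, seg_col, seg_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                c = col_idx[k]
                if start <= c < end:
                    seg_col.append(c - start)  # local column index
                    seg_val.append(vals[k])
            seg_ptr.append(len(seg_col))
        segments.append((start, end, seg_ptr, seg_col, seg_val))
    return segments


def spmv_segmented(n_rows, segments, x):
    """Column-segmented SpMV: one partial SpMV per segment, then aggregation."""
    partials = []
    for start, end, seg_ptr, seg_col, seg_val in segments:
        x_seg = x[start:end]  # the slice assumed to fit in cache
        partials.append(spmv_csr(n_rows, seg_ptr, seg_col, seg_val, x_seg))
    # Aggregation step: sum the partial results row by row.
    y = [0.0] * n_rows
    for p in partials:
        for i in range(n_rows):
            y[i] += p[i]
    return y


# Example: A = [[1, 0, 2, 0], [0, 3, 0, 4], [5, 0, 0, 6]] in CSR form.
row_ptr = [0, 2, 4, 6]
col_idx = [0, 2, 1, 3, 0, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

segs = segment_by_columns(3, 4, row_ptr, col_idx, vals, seg_width=2)
print(spmv_csr(3, row_ptr, col_idx, vals, x))  # [7.0, 22.0, 29.0]
print(spmv_segmented(3, segs, x))              # [7.0, 22.0, 29.0]
```

In a real many-core implementation the segments would be processed in parallel, and the aggregation would be a separate reduction pass or use atomic additions. The trade-off the sketch exposes is that narrower segments improve locality on `x` but add more partial vectors to aggregate.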
Sparse matrices, Graphics processing units, Vector processors, Matrix converters, Libraries, Hardware
Y. Nagasaka, A. Nukada and S. Matsuoka, "Cache-aware sparse matrix formats for Kepler GPU," 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, Taiwan, 2014, pp. 281-288.