Parallel and Distributed Systems, International Conference on (2011)
Dec. 7, 2011 to Dec. 9, 2011
Sparse matrix-vector multiplication (SpMV) is one of the most important yet challenging kernels in computational science. It is a memory-bound operation whose performance depends mostly on the input matrix and the underlying architecture. Many researchers have explored optimization techniques for SpMV, and one of the most promising directions is adapting the storage format to the underlying architecture. Alternative storage formats can greatly reduce memory pressure; however, computational resources are often left underutilized. We therefore propose a new storage format, Compressed Sparse Row with Segmented Interleave Combination (SIC). Derived from the Compressed Sparse Row (CSR) format, SIC employs an interleave combination pattern that combines a certain number of CSR rows to form a new SIC row. To further improve performance, segmented processing is also introduced. Based on empirical data, we also develop an automatic SIC-based SpMV suitable for all matrices. Experimental results show that our approach outperforms the NVIDIA CSR vector kernel, achieving up to a 12.6× speedup. It also delivers performance comparable to the Hybrid format, with a speedup of up to 2.89×.
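For readers unfamiliar with the baseline, the following is a minimal sequential sketch of SpMV over the plain CSR format from which SIC is derived. It is not the SIC format or the paper's GPU kernel; the function name `csr_spmv`, the array names `row_ptr`, `col_idx`, and `values`, and the 3×3 example matrix are assumptions chosen for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Illustrative names, not taken from the paper.)
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # The indirect read of x is what makes SpMV memory-bound.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical example matrix:
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
```

Rows in this layout can have very different lengths (here 2, 1, and 2 nonzeros), which is one reason a row-per-thread-group GPU mapping may leave compute resources idle on short rows.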
Sparse Matrix-Vector Multiplication, GPU, Compressed Sparse Row, Interleaved Row Combination, Segmented Processing
Z. Shao, H. Jin, J. Zeng, R. Zheng, K. Hu and X. Feng, "Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs," Parallel and Distributed Systems, International Conference on (ICPADS), Tainan, Taiwan, 2011, pp. 165-172.