Issue No.10 - Oct. (2013 vol.24)
pp: 1930-1940
Vasileios Karakasis , National Technical University of Athens, Zografou
Theodoros Gkountouvas , National Technical University of Athens, Zografou
Kornilios Kourtis , ETH Zürich, Zürich
Georgios Goumas , National Technical University of Athens, Zografou
Nectarios Koziris , National Technical University of Athens, Zografou
ABSTRACT
Sparse matrix-vector multiplication (${\rm SpM}\times{\rm V}$) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the ${\rm SpM}\times{\rm V}$ kernel that inhibits it from achieving high performance is its very low flop:byte ratio. In this paper, we present a compressed storage format, called Compressed Sparse eXtended (CSX), that is able to detect and simultaneously encode multiple commonly encountered substructures inside a sparse matrix. Relying on aggressive compression of the sparse matrix's indexing structure, CSX considerably reduces the memory footprint of a sparse matrix, alleviating the pressure on the memory subsystem. In a diverse set of sparse matrices, CSX provided an average performance improvement of more than 40 percent over the standard CSR format on SMP architectures and more than 20 percent on NUMA systems, significantly outperforming other CSR alternatives. Additionally, it adapted successfully to the nonzero element structure of the considered matrices, exhibiting very stable performance. Finally, in the context of a "real-life" multiphysics simulation software, CSX accelerated the ${\rm SpM}\times{\rm V}$ component by nearly 40 percent and the total solver time by approximately 15 percent.
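As an illustration of the baseline that the abstract compares against, the following is a minimal sketch of the multiplication kernel over the standard CSR format, together with a rough flop:byte estimate. It is not the CSX implementation. The function names, the 4-byte index and 8-byte value sizes, and the choice to count only the matrix arrays (ignoring traffic for the input and output vectors) are assumptions made for this example.

```python
def csr_spmv(row_ptr, col_ind, values, x):
    """Compute y = A * x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_ind[k] and values[k] give the column and value of nonzero k.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # One multiply and one add per nonzero, but each nonzero
            # also costs one index load and one value load.
            acc += values[k] * x[col_ind[k]]
        y[i] = acc
    return y


def csr_flop_byte_ratio(n_rows, nnz, index_bytes=4, value_bytes=8):
    """Rough flop:byte estimate that counts only the matrix arrays.

    Assumption for this sketch: vector traffic is ignored, so the
    result is an optimistic upper bound on the real ratio.
    """
    flops = 2 * nnz
    matrix_bytes = nnz * (index_bytes + value_bytes) + (n_rows + 1) * index_bytes
    return flops / matrix_bytes


# Small example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_ind = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

y = csr_spmv(row_ptr, col_ind, values, [1.0, 1.0, 1.0])
ratio = csr_flop_byte_ratio(n_rows=3, nnz=5)
```

Under these assumptions the CSR ratio stays below 2/12 flops per byte, because every nonzero carries a 4-byte column index next to its 8-byte value. Shrinking the index data is the only lever a format that leaves the values untouched can pull, and even with the indexing removed entirely the estimate is capped at 2/8.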
INDEX TERMS
Sparse matrices, Kernel, Encoding, Indexes, Optimization, Vectors, Computer architecture, data compression, Sparse Matrix-Vector Multiplication, multicore optimizations
CITATION
Vasileios Karakasis, Theodoros Gkountouvas, Kornilios Kourtis, Georgios Goumas, Nectarios Koziris, "An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 10, pp. 1930-1940, Oct. 2013, doi:10.1109/TPDS.2012.290