Parallel and Distributed Processing Symposium, International (2004)
Santa Fe, New Mexico
Apr. 26, 2004 to Apr. 30, 2004
ISBN: 0-7695-2132-0
pp: 90b
Pyrrhos Stathis, Delft University of Technology
Dmitry Cheresiz, Delft University of Technology
Stamatis Vassiliadis, Delft University of Technology
Ben Juurlink, Delft University of Technology
A large number of scientific applications operate on and manipulate sparse matrices. The irregular structure of these matrices, however, causes hardware that otherwise behaves efficiently on regular data to suffer severely in performance when handling sparse matrices. To tackle this problem, a scheme consisting of a novel Hierarchical Sparse Matrix (HiSM) storage format and an associated architectural concept has been presented. In this paper we propose, describe, and evaluate a hardware mechanism to facilitate transposition of a sparse matrix stored in the HiSM format. The proposed hardware is meant to be embedded in a vector processor as a functional unit. The main part of the unit consists of an s × s word in-processor memory, where s is the vector processor's section size. We determine suitable parameters for the proposed mechanism and study the performance of HiSM-based transposition using the matrices from the D-SAB benchmark suite. We show that HiSM-based transposition executed on a vector processor equipped with the proposed unit exhibits speedups of up to 32.0 times with respect to transposition based on the most widely used Compressed Row Storage format and executed on a standard vector processor. The average speedup depends on properties of the matrices being transposed, such as their size and the organization of their non-zero elements, and ranges between a factor of 15.5 and 20.
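
For readers unfamiliar with the baseline, the following is a minimal software sketch of transposing a matrix held in Compressed Row Storage (CRS). It is not the paper's HiSM mechanism or its vector-processor implementation. It uses the common counting-based scalar algorithm, and the function name `transpose_crs` and its argument names are assumptions introduced here for illustration only.

```python
def transpose_crs(n_rows, n_cols, values, col_idx, row_ptr):
    """Transpose a CRS matrix; returns (values, col_idx, row_ptr) of the transpose.

    Illustrative sketch only: names and layout are assumptions, not taken
    from the paper. Row r occupies values[row_ptr[r]:row_ptr[r + 1]].
    """
    nnz = len(values)

    # Pass 1: count the non-zeros in each column of the input.
    # Each input column becomes one row of the transpose.
    t_row_ptr = [0] * (n_cols + 1)
    for c in col_idx:
        t_row_ptr[c + 1] += 1

    # Prefix sum turns the per-column counts into row start offsets.
    for c in range(n_cols):
        t_row_ptr[c + 1] += t_row_ptr[c]

    # Pass 2: scatter every element to its slot in the transpose.
    t_values = [0] * nnz
    t_col_idx = [0] * nnz
    next_slot = t_row_ptr[:-1]  # slicing copies; next free slot per output row
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            t_values[dst] = values[k]
            t_col_idx[dst] = r  # the old row index becomes the column index
            next_slot[c] += 1

    return t_values, t_col_idx, t_row_ptr


# Example: the 3 x 3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CRS form.
example = transpose_crs(3, 3, [1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5])
```

The scatter in the second pass writes to data-dependent, non-contiguous addresses. This kind of indirect access is one plausible example of the irregularity that the abstract says hurts hardware designed for regular data.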

B. Juurlink, P. Stathis, S. Vassiliadis and D. Cheresiz, "Sparse Matrix Transpose Unit," Parallel and Distributed Processing Symposium, International (IPDPS), Santa Fe, New Mexico, 2004, pp. 90b.