Sparse matrix-vector multiplication (shortly SpM?V ) is one of most common subroutines in the numerical linear algebra. The problem is that the memory access patterns during the SpM?V are irregular and the utilization of cache can suffer from low spatial or temporal locality.
This paper introduces new approach for the acceleration the SpM?V . This approach consists of 3 steps. The first step divides the whole matrix into smaller parts (regions) those can fit in the cache. The second step improves locality during the multiplication due to better utilization of distant references. The last step maximizes machine computation performance of the partial multiplication for each region.
In this paper, we describe aspects of these 3 steps in more detail (including fast and time-inexpensive algorithms for all steps). Our measurements proved that our approach gives a significant speedup for almost all matrices arising from various technical areas.