12th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP'04)
Improving Cache Locality with Blocked Array Layouts
A Coruna, Spain
February 11-February 13
ISBN: 0-7695-2083-9
Evangelia Athanasaki, National Technical University of Athens
Nectarios Koziris, National Technical University of Athens
Minimizing cache misses is one of the most important factors in reducing the average latency of memory accesses. Tiled codes modify the instruction stream to exploit cache locality for array accesses. In this paper, we further reduce cache misses by restructuring the memory layout of multidimensional arrays that are accessed by tiled code. In our method, array elements are stored in blocks, in exactly the order in which the tiled instruction stream sweeps them. We present a straightforward way to translate multidimensional array indices into the blocked memory layout using simple binary-mask operations. Indices for such layouts are easily calculated using the algebra of dilated integers, similarly to Morton-order indexing. Experimental results for matrix multiplication and LU decomposition on arrays of various sizes show that execution time improves greatly when tiled code is combined with tiled array layouts and binary-mask-based index translation functions. Simulations using the SimpleScalar tool verify that the improved performance is due to a considerable reduction in total cache misses.
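The index translation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (blocked_index, dilate, morton_index) are hypothetical, and it assumes square arrays whose size n and tile size t are powers of two, so that division and modulo reduce to shifts and masks. The first function maps (i, j) to an offset in a layout where each t x t tile is stored contiguously; the other two show the related dilated-integer construction behind Morton-order indexing.

```python
def blocked_index(i, j, n, t):
    """Offset of element (i, j) in an n x n array stored tile by tile.

    Assumption for this sketch: n and t are powers of two and t divides n.
    Tiles are laid out in row-major order, and elements inside each
    t x t tile are row-major as well.
    """
    log_t = t.bit_length() - 1          # log2(t), valid because t is a power of two
    mask = t - 1                        # low bits select the position inside a tile
    tiles_per_row = n >> log_t
    tile_id = (i >> log_t) * tiles_per_row + (j >> log_t)
    in_row, in_col = i & mask, j & mask
    # Each tile holds t*t elements, so the tile base is tile_id << (2 * log_t).
    return (tile_id << (2 * log_t)) | (in_row << log_t) | in_col


def dilate(x):
    """Spread the low 16 bits of x onto the even bit positions (a dilated integer)."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_index(i, j):
    """Morton (Z-order) offset: bits of i on odd positions, bits of j on even ones."""
    return (dilate(i) << 1) | dilate(j)


if __name__ == "__main__":
    # Element (5, 6) of an 8 x 8 array with 4 x 4 tiles lies in the last tile.
    print(blocked_index(5, 6, 8, 4))
    print(morton_index(2, 3))
```

In both mappings every (i, j) pair receives a distinct offset, so the array occupies the same amount of memory as in a row-major layout; only the order of the elements changes.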
Citation:
Evangelia Athanasaki, Nectarios Koziris, "Improving Cache Locality with Blocked Array Layouts," 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP'04), p. 308, 2004