Parallel Algorithms / Architecture Synthesis, AIZU International Symposium on (1997)
Aizu-Wakamatsu, Fukushima, JAPAN
Mar. 17, 1997 to Mar. 21, 1997
ISBN: 0-8186-7870-4
pp: 42
Wesley K. Kaplow, Rensselaer Polytechnic Institute
Boleslaw K. Szymanski, Rensselaer Polytechnic Institute
Peter Tannenbaum, Rensselaer Polytechnic Institute
Viktor K. Decyk, Physics Department, UCLA
ABSTRACT
We introduce a method for improving the cache performance of irregular computations in which data are referenced through indirection arrays defined at run time. Such computations often arise in scientific problems. The method, called Run-Time Reference Clustering (RTRC), is a run-time analog of the compile-time blocking used for dense matrix problems. RTRC uses the data partitioning and re-mapping techniques that distributed-memory multiprocessor codes employ to minimize interprocessor communication. Re-mapping each set of local data decreases cache misses in the same way that re-mapping the global data decreases off-processor references. We demonstrate the applicability and performance of RTRC on several prevalent applications: Sparse Matrix-Vector Multiply, Particle-In-Cell, and CHARMM-like codes. Performance results on SPARC-20, SP-2, and T3-D processors show that single-node execution performance can be improved by as much as 35%.
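The general idea of clustering indirect references can be illustrated with a minimal sketch. This is not the authors' RTRC implementation: the function names `cluster_by_block` and `spmv_coo`, the coordinate-list matrix layout, and the `block_size` parameter are all assumptions chosen for illustration. The sketch reorders the nonzeros of a sparse matrix so that references into the vector `x` that fall in the same block are processed together, which is the run-time counterpart of blocking a dense loop nest. Python is used only to show the reordering; any cache benefit would have to be measured in a compiled implementation.

```python
from typing import List, Sequence


def cluster_by_block(index: Sequence[int], block_size: int) -> List[int]:
    """Return a processing order that groups references by the data block they touch.

    `index` is an indirection array known only at run time. The sort is stable,
    so references inside one block keep their original relative order.
    """
    return sorted(range(len(index)), key=lambda k: index[k] // block_size)


def spmv_coo(rows: Sequence[int], cols: Sequence[int], vals: Sequence[float],
             x: Sequence[float], n_rows: int, order: Sequence[int]) -> List[float]:
    """Sparse matrix-vector multiply over coordinate-format nonzeros.

    Nonzeros are visited in the sequence given by `order`; the indirect
    read x[cols[k]] is the reference being clustered.
    """
    y = [0.0] * n_rows
    for k in order:
        y[rows[k]] += vals[k] * x[cols[k]]
    return y


# Hypothetical 3x6 matrix with six nonzeros (illustrative data only).
rows = [0, 1, 2, 0, 1, 2]
cols = [5, 0, 4, 1, 5, 0]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

original_order = list(range(len(vals)))
clustered_order = cluster_by_block(cols, block_size=2)

y_original = spmv_coo(rows, cols, vals, x, 3, original_order)
y_clustered = spmv_coo(rows, cols, vals, x, 3, clustered_order)
```

Both orders compute the same product here because the example values sum exactly; with general floating-point data, reordering the accumulation can change the result in the last bits. In practice the block size would be chosen so that one block of `x` fits in cache, and the cost of the one-time sort would be amortized over repeated multiplies.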
CITATION
Wesley K. Kaplow, Boleslaw K. Szymanski, Peter Tannenbaum, Viktor K. Decyk, "Run-Time Reference Clustering for Cache Performance Optimization", Parallel Algorithms / Architecture Synthesis, AIZU International Symposium on, pp. 42, 1997, doi:10.1109/AISPAS.1997.581623