Parallel Algorithms / Architecture Synthesis, AIZU International Symposium on (1997)
Aizu-Wakamatsu, Fukushima, JAPAN
Mar. 17, 1997 to Mar. 21, 1997
Wesley K. Kaplow, Rensselaer Polytechnic Institute
Boleslaw K. Szymanski, Rensselaer Polytechnic Institute
Peter Tannenbaum, Rensselaer Polytechnic Institute
Viktor K. Decyk, Physics Department, UCLA
We introduce a method for improving the cache performance of irregular computations in which data are referenced through indirection arrays defined at run time. Such computations arise often in scientific problems. The method, called Run-Time Reference Clustering (RTRC), is a run-time analog of the compile-time blocking used for dense matrix problems. RTRC uses the data partitioning and re-mapping techniques that are already part of distributed-memory multiprocessor codes, where they serve to minimize interprocessor communication. Re-mapping each set of local data decreases cache misses in the same way that re-mapping the global data decreases off-processor references. We demonstrate the applicability and performance of the RTRC technique on several prevalent applications: Sparse Matrix-Vector Multiply, Particle-In-Cell, and CHARMM-like codes. Performance results on SPARC-20, SP-2, and T3-D processors show that single-node execution performance can be improved by as much as 35%.
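The general idea of clustering indirect references can be illustrated with a minimal sketch. It is not the authors' RTRC implementation: it assumes a sparse matrix stored in coordinate form, and the function names `spmv_coo` and `cluster_references` and the parameter `block_size` are hypothetical, chosen for illustration. The nonzeros are reordered once at run time so that consecutive accesses to the source vector `x` fall in the same block, which is the run-time counterpart of blocking a dense loop nest.

```python
def spmv_coo(rows, cols, vals, x, n_rows):
    """Sparse matrix-vector multiply y = A*x, with A given as (row, col, value)
    triples. x is read through the indirection array `cols`."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def cluster_references(rows, cols, vals, block_size):
    """Reorder the nonzeros so that references to x are grouped by block.

    Assumption: block_size stands in for the number of vector elements that
    fit in cache; a real code would derive it from the target machine.
    """
    # Sort by the block of x that is read, then by the block of y that is written.
    order = sorted(
        range(len(vals)),
        key=lambda k: (cols[k] // block_size, rows[k] // block_size),
    )
    return (
        [rows[k] for k in order],
        [cols[k] for k in order],
        [vals[k] for k in order],
    )


# Small example: a 4 x 8 matrix with six nonzeros in scattered order.
rows = [0, 2, 1, 3, 0, 2]
cols = [5, 0, 4, 1, 1, 6]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

y_original = spmv_coo(rows, cols, vals, x, n_rows=4)
c_rows, c_cols, c_vals = cluster_references(rows, cols, vals, block_size=4)
y_clustered = spmv_coo(c_rows, c_cols, c_vals, x, n_rows=4)
```

The reordering changes only the order in which the nonzeros are visited, so the result is the same up to floating-point summation order, while the accesses to `x` now move through one block at a time. The sort is a one-time cost that pays off only when the same indirection arrays are reused across many iterations.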
B. K. Szymanski, V. K. Decyk, W. K. Kaplow and P. Tannenbaum, "Run-Time Reference Clustering for Cache Performance Optimization," Parallel Algorithms / Architecture Synthesis, AIZU International Symposium on (PAS), Aizu-Wakamatsu, Fukushima, JAPAN, 1997, pp. 42.