2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
July 13–15, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PAAP.2014.43
Most modern processors have cache memories that are much faster than main memory, and it is important to utilize them effectively for efficient program execution. Cache memories function well when the temporal or spatial locality of the program is enhanced; cache efficiency can therefore be improved by making accesses to the same array contiguous. In addition, a multidimensional array can be regarded as an array of lower-dimensional arrays, which means it is more effective to aggregate array references whose indexes agree in the higher dimensions, even if the indexes are not completely identical. We propose a new cache optimization technique that improves cache efficiency based on global code motion. Our technique moves each load instruction to immediately after the preceding load instructions that access the same array with the most similar indexes, and then delays it as late as possible without changing the access order. This two-step code motion contributes not only to improving cache efficiency across the entire program, but also to suppressing register pressure. We have implemented our technique in a real compiler and evaluated it on SPEC benchmarks. The experimental results show that our technique can decrease cache misses by about 99.9% in the best case.
Arrays, Indexes, Cache memory, Registers, Aggregates, Equations, Program processors
Y. Sumikawa and M. Takimoto, "Global Load Instruction Aggregation Based on Array Dimensions," 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Beijing, China, 2014, pp. 123-129.