2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Aug. 23, 2014 to Aug. 27, 2014
Xu Liu, Department of Computer Science, Rice University, Houston, TX, USA
Kamal Sharma, Department of Computer Science, Rice University, Houston, TX, USA
John Mellor-Crummey, Department of Computer Science, Rice University, Houston, TX, USA
Memory hierarchies in modern computer systems are complex; they often include multi-level caches and multiple memory controllers on the same chip. Without careful design, programs suffer from unnecessary data movement between caches and memory, which degrades performance and increases energy consumption. Array regrouping can significantly improve data locality by increasing spatial reuse of data and reducing cache contention. However, existing techniques for identifying array regrouping opportunities fall short in three ways. First, they provide inadequate information to guide regrouping. Second, the monitoring cost of prior tools for identifying regrouping opportunities limits their use in practice. Third, existing metrics for quantifying the benefits of array regrouping can lead to inappropriate transformations that hurt performance. In this paper, we describe ArrayTool, a lightweight profiler that guides array regrouping. ArrayTool has three unique capabilities. First, it focuses attention on arrays with significant access latency. Second, it uses lightweight array-centric profiling to determine whether regrouping arrays is feasible and to quantify its benefits. Third, it works on both shared-memory and distributed-memory parallel programs. To illustrate the utility of ArrayTool, we use it to analyze three benchmarks. Using the guidance it provides, we regroup program arrays and obtain speedups ranging from 25% to a factor of two.
Arrays, Prefetching, Hardware, Monitoring, Measurement
Xu Liu, Kamal Sharma, John Mellor-Crummey, "ArrayTool: A lightweight profiler to guide array regrouping", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 405-415, 2014, doi:10.1145/2628071.2628102