Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics
2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/
Ashay Rane , Texas Advanced Computing Center, The University of Texas at Austin, USA
James Browne , Department of Computer Science, The University of Texas at Austin, USA
Program performance optimization is usually based solely on measurements of execution behavior of code segments using hardware performance counters. However, memory access patterns are critical performance limiting factors for today's multicore chips where performance is highly memory bound. Therefore diagnoses and selection of optimizations based only on measurements of the execution behavior of code segments are incomplete because they do not incorporate knowledge of memory access patterns and behaviors. This paper presents a low-overhead tool (MACPO) that captures memory traces and computes metrics for the memory access behavior of source-level (C, C++, Fortran) data structures. It also presents a complete process for integrating code segment-based and memory access pattern measurements and analyses for performance optimization specifically targeting multicore chips and multichip nodes of clusters. MACPO explicitly targets the measurement and metrics important to performance optimization for multicore chips. MACPO uses more realistic cache models for computation of latency metrics than those used by previous tools. Evaluation of the effectiveness of adding memory access behavior characteristics of data structures to performance optimization was done on subsets of the ASCI, NAS and Rodina parallel benchmarks and one application program from a domain not represented in these benchmarks. Adding memory behavior characteristics enabled easier diagnoses of bottlenecks and more accurate selection of appropriate optimizations than with only code centric behavior measurements. The performance gains ranged from a few percent to 38 percent.
Optimization, Data structures, Semiconductor device measurement, Multicore processing, Instruments, Radiation detectors
A. Rane and J. Browne, "Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics," 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, USA, 2012, pp. 147-156.