2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) (2010)
Sept. 11, 2010 to Sept. 15, 2010
Derek L. Schuff , Purdue University, West Lafayette, IN 47907, USA
Milind Kulkarni , Purdue University, West Lafayette, IN 47907, USA
Vijay S. Pai , Purdue University, West Lafayette, IN 47907, USA
Reuse distance analysis is a well-established tool for predicting cache performance, driving compiler optimizations, and assisting visualization and manual optimization of programs. Existing reuse distance analysis methods either do not account for the effects of multithreading or suffer severe performance penalties. This paper presents a sampled, parallelized method of measuring reuse distance profiles for multithreaded programs, modeling both private and shared cache configurations. Sampling lets the analysis spend much of its execution in a fast, low-overhead mode, and it enables a new measurement method, because sampled analysis does not need to consider the full state of the reuse stack. This measurement method uses O(1) data structures that can be made thread-private, so parallelization reduces overhead in analysis mode. The resulting system is evaluated on a diverse set of parallel benchmarks. Its output is accurate relative to non-sampled full analysis, and it performs well in the common application of locating low-locality code in the benchmarks, all with a performance overhead comparable to the best single-threaded analysis techniques.
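To make the contrast between full and sampled analysis concrete, here is a minimal sketch. It is not the authors' implementation, and all function names are hypothetical. The reuse distance of an access is taken to be the number of distinct addresses touched since the previous access to the same address. `full_reuse_distances` maintains a naive LRU reuse stack for every access. `sampled_reuse_distance` assumes a simplified sampling scheme: for one sampled access, it records only the distinct addresses seen afterward in a hash set (expected O(1) insert and lookup) until the sampled address recurs, so it never needs the full stack state.

```python
from collections import Counter


def full_reuse_distances(trace):
    """Exact reuse distances using a naive LRU reuse stack.

    Each access costs O(M), where M is the number of distinct addresses.
    First-time accesses have infinite reuse distance, reported as None.
    """
    stack = []  # most recently used address is kept at the end
    dists = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Distinct addresses used more recently than addr sit above it.
            dists.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            dists.append(None)
        stack.append(addr)
    return dists


def sampled_reuse_distance(trace, start):
    """Reuse distance of the next use of the address accessed at trace[start].

    Hypothetical sampled measurement: only a set of distinct addresses seen
    since the sample is kept, not the whole reuse stack. Returns None if
    the sampled address is never reused.
    """
    target = trace[start]
    seen = set()
    for addr in trace[start + 1:]:
        if addr == target:
            return len(seen)
        seen.add(addr)
    return None


def sampled_histogram(trace, sample_points):
    """Reuse distance histogram built only from the sampled accesses."""
    return Counter(sampled_reuse_distance(trace, i) for i in sample_points)


if __name__ == "__main__":
    trace = ["a", "b", "c", "b", "a"]
    print(full_reuse_distances(trace))  # [None, None, None, 1, 2]
    print(sampled_reuse_distance(trace, 0))  # 2, matches the access at index 4
    print(sampled_histogram(trace, [0, 1, 2]))
```

In this simplified model, a sampled measurement started at index `i` reproduces the full-analysis distance at the next access to the same address. Each sample's set is independent of the others, which is one reason such per-sample state is easy to keep thread-private. A simple way to approximate the two cache configurations is to run the same functions on each thread's own trace (private caches) or on the interleaved trace of all threads (shared cache). This is an illustrative assumption and may not match how the paper handles interleaving.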
D. L. Schuff, M. Kulkarni and V. S. Pai, "Accelerating multicore reuse distance analysis with sampling and parallelization," 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, 2010, pp. 53-63.