This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
June 2007 (vol. 18 no. 6)
pp. 818-834

Abstract—Cache coherence in shared-memory multiprocessor systems has been studied mostly from an architecture viewpoint, often by means of aggregating metrics. In many cases, aggregate events provide insufficient information for programmers to understand and optimize the coherence behavior of their applications. A better understanding would be given by source code correlations of not only aggregate events, but also finer granularity metrics directly linked to high-level source code constructs, such as source lines and data structures. In this paper, we explore a novel application-centric approach to studying coherence traffic. We develop a coherence analysis framework based on incremental coherence simulation of actual reference traces. We provide tool support to extract these reference traces and synchronization information from OpenMP threads at runtime using dynamic binary rewriting of the application executable. These traces are fed to ccSIM, our cache-coherence simulator. The novelty of ccSIM lies in its ability to relate low-level cache coherence metrics (such as coherence misses and their causative invalidations) to high-level source code constructs including source code locations and data structures. We explore the degree of freedom in interleaving data traces from different processors and assess simulation accuracy in comparison to metrics obtained from hardware performance counters. Our quantitative results show that: 1) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches the corresponding hardware performance counters. 2) Detailed, high-level coherence statistics are very useful in detecting, isolating, and understanding coherence bottlenecks. We use ccSIM with several well-known benchmarks and find coherence optimization opportunities leading to significant reductions in coherence traffic and savings in wall-clock execution time.

Index Terms:
Cache memories, simulation, dynamic binary rewriting, program instrumentation, SMPs, coherence protocols.
Citation:
Jaydeep Marathe, Frank Mueller, "Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 6, pp. 818-834, June 2007, doi:10.1109/TPDS.2007.1058
Usage of this product signifies your acceptance of the Terms of Use.