SARC Coherence: Scaling Directory Cache Coherence in Performance and Power
September/October 2010 (vol. 30 no. 5)
pp. 54-65
Stefanos Kaxiras, Uppsala University, Sweden
Georgios Keramidas, Industrial Systems Institute, Greece

The SARC project seeks to improve power scalability of shared-memory chip multiprocessors (CMPs) by making directory coherence more efficient in both power and performance. The authors describe how they eliminate two major sources of inefficiency for directory coherence protocols: invalidation traffic on writes and directory indirection for finding the writer.

Index Terms:
chip multiprocessors, directory cache coherence, power and performance scalability, SARC architecture
Citation:
Stefanos Kaxiras, Georgios Keramidas, "SARC Coherence: Scaling Directory Cache Coherence in Performance and Power," IEEE Micro, vol. 30, no. 5, pp. 54-65, Sept.-Oct. 2010, doi:10.1109/MM.2010.82