Issue No. 05 - May (2013 vol. 62)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2012.52
Christian Fensch, University of Edinburgh, Edinburgh
Nick Barrow-Williams, Nvidia Corporation, Santa Clara
Robert D. Mullins, University of Cambridge, Cambridge
Simon Moore, University of Cambridge, Cambridge
Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available. However, energy and latency costs of communication increasingly limit the parallel programs running on these platforms. Existing designs provide a functional communication layer, but not necessarily the most efficient solution. Due to power limitations, efficiency is now a primary concern that motivates us to look again at cache coherence. First, we analyze the communication behavior of parallel applications. The observed sharing patterns reveal considerable locality of shared data accesses between threads with consecutive IDs. This pattern corresponds to strong physical locality between adjacent cores in a chip-multiprocessor (CMP). This paper explores the design of Proximity Coherence: a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links. We exploit these patterns and improve the efficiency of communication. The results show that careful analysis leads to the design of a more efficient coherence protocol. The protocol reduces the latency of load misses by up to 33 percent (17 percent, on average), improving overall execution time by up to 13 percent. Furthermore, it also reduces network-on-chip traffic by 19 percent and energy consumption by up to 30 percent.
Central Processing Unit, Coherence, Protocols, Transistors, Computers, Energy consumption, Educational institutions, network-on-chip, physical locality, Proximity coherence, CMP, cache design
C. Fensch, N. Barrow-Williams, R. D. Mullins and S. Moore, "Designing a Physical Locality Aware Coherence Protocol for Chip-Multiprocessors," in IEEE Transactions on Computers, vol. 62, no. 5, pp. 914-928, 2013.