Networks-on-Chip, International Symposium on (2007)
Princeton, New Jersey
May 7, 2007 to May 9, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/NOCS.2007.42
Evgeny Bolotin, Israel Institute of Technology, Israel
Zvika Guz, Israel Institute of Technology, Israel
Israel Cidon, Israel Institute of Technology, Israel
Ran Ginosar, Israel Institute of Technology, Israel
Avinoam Kolodny, Israel Institute of Technology, Israel
The paper introduces a Network-on-Chip (NoC) design methodology and low-cost mechanisms for supporting efficient cache access and cache coherency in future high-performance Chip Multi-Processors (CMPs). We address previously proposed CMP architectures based on Non-Uniform Cache Architecture (NUCA) over NoC, analyze the basic memory transactions, and translate them into a set of network transactions. We first show how a simple, generic NoC equipped with the needed module interface functionality can provide the infrastructure for coherent access in both static and dynamic NUCA. We then show how several low-cost mechanisms incorporated into such a vanilla NoC can facilitate CMP operation and boost the performance of a cache-coherent NUCA CMP. The basic mechanism is priority support embedded in the NoC, which differentiates between short control signals and long data messages to achieve a major reduction in cache access delay. The low-cost priority-based NoC is also useful for improving the performance of almost any other CMP transaction (e.g., uncached and cache-coherent reads and writes, search in dynamic NUCA, isolation of low-priority traffic, and support for synchronization and mutual exclusion). The priority-based NoC and the discussed NoC interfaces are evaluated in detail using CMP-NoC simulations across several SPLASH-2 benchmarks and static web content serving benchmarks, showing a substantial reduction in L2 cache access delay and an overall program speedup. For further system improvement, we introduce additional low-cost NoC mechanisms: virtual invalidation rings, an efficient store-and-forward multicast for short messages embedded within a wormhole NoC, and a cache-line search mechanism for the efficient operation of dynamic NUCA. These mechanisms expedite not only cache coherency but also other basic CMP transactions, such as search and support for serialization primitives.
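The priority idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator or router design; it assumes a single shared link that carries one flit per cycle, and the function name `simulate`, the message tuples, and the flit counts are all hypothetical. It only shows why letting short control messages overtake long data messages at flit granularity shortens control latency.

```python
from collections import deque

def simulate(messages, use_priority):
    """Serve flits over one shared link, one flit per cycle (toy model).

    messages: list of (arrival_cycle, kind, length_in_flits),
              where kind is "ctrl" (short) or "data" (long).
    Returns a dict mapping message index to its completion cycle.
    """
    order = sorted(range(len(messages)), key=lambda i: messages[i][0])
    remaining = [m[2] for m in messages]
    queues = {"ctrl": deque(), "data": deque()}
    done = {}
    cycle = 0
    nxt = 0
    while len(done) < len(messages):
        # Enqueue everything that has arrived by this cycle.
        while nxt < len(order) and messages[order[nxt]][0] <= cycle:
            i = order[nxt]
            # Without priority, all traffic shares one FIFO queue.
            kind = messages[i][1] if use_priority else "data"
            queues[kind].append(i)
            nxt += 1
        # Control traffic, if any, is always served before data traffic.
        q = queues["ctrl"] if queues["ctrl"] else queues["data"]
        if q:
            i = q[0]
            remaining[i] -= 1          # send one flit of the head message
            if remaining[i] == 0:
                q.popleft()
                done[i] = cycle + 1
        cycle += 1
    return done

# Hypothetical traffic: one 8-flit data message, then two 1-flit control messages.
traffic = [(0, "data", 8), (1, "ctrl", 1), (2, "ctrl", 1)]
fifo = simulate(traffic, use_priority=False)
prio = simulate(traffic, use_priority=True)

def ctrl_latencies(done):
    return [done[i] - traffic[i][0] for i in (1, 2)]

print("FIFO control latencies:    ", ctrl_latencies(fifo))
print("Priority control latencies:", ctrl_latencies(prio))
```

In this toy setup the control messages no longer wait behind the full data message, while the data message finishes only two cycles later. A real wormhole NoC would add routers, buffers, and multiple hops, none of which are modeled here.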
E. Bolotin, A. Kolodny, Z. Guz, I. Cidon and R. Ginosar, "The Power of Priority: NoC Based Distributed Cache Coherency," 2007 International Symposium on Networks-on-Chip (NOCS), Princeton, NJ, 2007, pp. 117-126.