Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 319-329
Yuan He, Univ. of Tokyo, Tokyo, Japan
Hiroshi Sasaki, Kyushu Univ., Fukuoka, Japan
Shinobu Miwa, Univ. of Tokyo, Tokyo, Japan
Hiroshi Nakamura, Univ. of Tokyo, Tokyo, Japan
ABSTRACT
The inevitable advent of the multi-core era has driven an increasing demand for low-latency on-chip interconnection networks (or NoCs). As a critical part of the memory hierarchy in modern chip multi-processors (CMPs), these networks face stringent design constraints: they must provide fast communication within a tight power budget. The first-order concern of a modern NoC is clearly its latency, while we also find that the internal bandwidth of its routers is relatively plentiful; thus, we present a low-latency router design utilizing a technique we call “multicast within a router”, or McRouter, which makes productive use of the remaining bandwidth inside a NoC router. McRouter allows a single-cycle transfer of flits, which shortens the communication latency when there is enough remaining bandwidth within the router. The key idea is to transmit a header flit to all possible output ports (multicast) so that it is always transmitted to the correct output port without relying on route computation. In addition, we find that the design is affordable, with marginal power overhead, while remaining stand-alone: it maintains portability and modularity (unlike look-ahead routing based designs). Our evaluation with application traffic shows that McRouter helps achieve system speed-ups of 1.28, 1.17 and 1.05 over the conventional router (CR), the VSA router (VSAR) and the prediction router (PR), respectively.
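The key idea stated in the abstract can be illustrated with a small toy model. The following Python sketch is not the authors' design: the port names, the rule that a flit never returns through its input port, the XY fallback routing, and the four-cycle conventional pipeline depth are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical port set of a 2D-mesh router (assumption, not from the paper).
PORTS = ("north", "east", "south", "west", "local")

# Assumed latency of a conventional router pipeline, in cycles (illustrative).
CONVENTIONAL_CYCLES = 4


@dataclass(frozen=True)
class HeaderFlit:
    dest: tuple  # (x, y) coordinates of the destination router


def xy_route(here, dest):
    """Hypothetical dimension-order (XY) route computation for the fallback path."""
    (x, y), (dx, dy) = here, dest
    if dx > x:
        return "east"
    if dx < x:
        return "west"
    if dy > y:
        return "north"
    if dy < y:
        return "south"
    return "local"


def forward_header(flit, here, in_port, busy_ports):
    """Return (output ports driven, cycles spent in the router) for a header flit."""
    # Every port except the one the flit arrived on is a possible output.
    candidates = [p for p in PORTS if p != in_port]
    if not any(p in busy_ports for p in candidates):
        # Enough spare bandwidth: multicast to all candidates in a single
        # cycle. The correct port is among them, so no route computation
        # is needed on this path.
        return candidates, 1
    # Otherwise fall back to the conventional pipeline with route computation.
    return [xy_route(here, flit.dest)], CONVENTIONAL_CYCLES


if __name__ == "__main__":
    flit = HeaderFlit(dest=(2, 1))
    print(forward_header(flit, here=(1, 1), in_port="west", busy_ports=set()))
    print(forward_header(flit, here=(1, 1), in_port="west", busy_ports={"north"}))
```

In this model the multicast path is taken only when none of the candidate output ports is busy, which stands in for the abstract's condition of "enough remaining bandwidth within the router"; how the real router detects spare bandwidth and discards the unneeded copies is not described in the abstract.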
INDEX TERMS
Pipelines, Registers, Delays, Computer architecture, Switches, throttling, memory systems, multi-core, prefetching
CITATION
Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura, "TCPT: thread criticality-driven prefetcher throttling", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, vol. 00, no. , pp. 319-329, 2013, doi:10.1109/PACT.2013.6618828