Issue No.06 - June (2011 vol.60)
pp: 783-799
Hiroki Matsutani, The University of Tokyo, Tokyo
Michihiro Koibuchi, National Institute of Informatics, Tokyo
Hideharu Amano, Keio University, Yokohama
Tsutomu Yoshinaga, The University of Electro-Communications, Tokyo
ABSTRACT
Multi- and many-core applications are sensitive to interprocessor communication latencies, which calls for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. If the prediction hits, packets arriving at a prediction router are transferred without waiting for the routing computation and switch arbitration. The primary concern for reducing communication latency is therefore the hit rate of the prediction algorithm, which varies with the network environment: the network topology, routing algorithm, and traffic pattern. Typical low-latency routers that skip one or more pipeline stages use a bypass data path based on a static or single bypassing policy (e.g., accelerating packets that move in the same dimension). In contrast, our prediction router architecture predictively forwards packets using a prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. We then present four case studies, each of which assumes a different many-core architecture. We implemented the prediction routers for each case study in a 45 nm CMOS process and evaluated them in terms of prediction hit rate, zero-load latency, hardware amount, and energy consumption. For a typical prediction router with two or three predictors, area and energy increase by 4.8-12.0 percent and 5.3 percent, respectively, while a prediction hit rate of up to 89.8 percent is achieved in real applications. This is a favorable trade-off between modest hardware and energy overheads and a significant latency saving.
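To make the notion of a prediction hit rate concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes two hypothetical predictors (`last_port_predictor` and `most_frequent_predictor`), which are illustrative stand-ins and not necessarily among the five algorithms analyzed in the paper. The port trace is made up for illustration. The sketch measures each predictor's hit rate on a sequence of output ports seen at one input channel, then selects the candidate with the higher hit rate, loosely mirroring the idea of choosing a predictor in response to the network environment.

```python
from collections import Counter


def last_port_predictor(history):
    """Hypothetical predictor: guess the most recently used output port."""
    return history[-1] if history else None


def most_frequent_predictor(history):
    """Hypothetical predictor: guess the output port used most often so far."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]


def hit_rate(predictor, outputs):
    """Fraction of packets whose actual output port matched the prediction
    made before the packet arrived."""
    hits = 0
    history = []
    for actual in outputs:
        if predictor(history) == actual:
            hits += 1  # prediction hit: the packet could be forwarded speculatively
        history.append(actual)
    return hits / len(outputs) if outputs else 0.0


def select_predictor(predictors, outputs):
    """Return the name of the candidate with the highest hit rate on the trace."""
    return max(predictors, key=lambda name: hit_rate(predictors[name], outputs))


# Assumed candidate set and a made-up trace of output ports (E, N, S).
PREDICTORS = {
    "last_port": last_port_predictor,
    "most_frequent": most_frequent_predictor,
}
trace = ["E", "E", "E", "N", "E", "E", "S", "E"]

for name, predictor in PREDICTORS.items():
    print(name, hit_rate(predictor, trace))
print("selected:", select_predictor(PREDICTORS, trace))
```

On this particular trace, occasional turns to other ports hurt the last-port guess twice each (once on the turn, once on the return), so the frequency-based guess scores higher. A different traffic pattern could reverse the ranking, which is why the hit rate has to be evaluated per environment.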
INDEX TERMS
Interconnection networks, on-chip networks, low-latency router architecture.
CITATION
Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, Tsutomu Yoshinaga, "Prediction Router: A Low-Latency On-Chip Router Architecture with Multiple Predictors", IEEE Transactions on Computers, vol.60, no. 6, pp. 783-799, June 2011, doi:10.1109/TC.2011.17