Issue No.10 - Oct. (2013 vol.62)
pp: 2001-2012
Giorgos Dimitrakopoulos, Democritus University of Thrace, Xanthi
Emmanouil Kalligeros, University of the Aegean, Samos
Kostas Galanopoulos, National Technical University of Athens, Athens
Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a large number of architectural modules efficiently calls for scalable solutions that offer both high-throughput and low-latency communication. Switches are the basic building blocks of such interconnection networks, and their design critically affects the performance of the whole system. So far, innovation in switch design has relied mostly on architecture-level solutions that took for granted the characteristics of the main building blocks of the switch, such as the buffers, the routing logic, the arbiters, and the crossbar's multiplexers, and tried to reorganize them more efficiently without modifying them. Although such purely high-level design has produced highly efficient switches, the question of how much better a switch would be if better building blocks were available remains to be investigated. In this paper, we try to partially answer this question by explicitly targeting the design, from scratch, of new soft macros that can handle arbitration and multiplexing concurrently and can be parameterized by the number of inputs, the data width, and the priority selection policy. With the proposed macros, switch allocation, which employs either standard round-robin or more sophisticated arbitration policies with significant network-throughput benefits, and switch traversal can be performed simultaneously in the same cycle, while still offering energy-delay-efficient implementations.
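To make the idea of merging arbitration with multiplexing concrete, the following is a minimal behavioral sketch in Python, assuming a plain round-robin policy. It is not the circuit proposed in the paper; the function names `round_robin_grant` and `arbitrate_and_multiplex` and the priority-pointer update rule are illustrative assumptions. It only shows that one step can return both the granted input and the data word selected from it.

```python
from typing import List, Optional, Tuple


def round_robin_grant(requests: List[bool], pointer: int) -> Optional[int]:
    """Return the first requesting input at or after `pointer`,
    searching cyclically; None if no input is requesting."""
    n = len(requests)
    for offset in range(n):
        idx = (pointer + offset) % n
        if requests[idx]:
            return idx
    return None


def arbitrate_and_multiplex(
    requests: List[bool], data: List[int], pointer: int
) -> Tuple[Optional[int], Optional[int], int]:
    """Model one cycle: choose a winner and forward its data word.

    Returns (granted index, selected data word, next priority pointer).
    Hypothetical software model, not the hardware macro itself.
    """
    winner = round_robin_grant(requests, pointer)
    if winner is None:
        # No request: nothing is forwarded and the priority stays put.
        return None, None, pointer
    # Assumed update rule: the winner gets the lowest priority next cycle.
    next_pointer = (winner + 1) % len(requests)
    return winner, data[winner], next_pointer


if __name__ == "__main__":
    reqs = [True, False, True, True]
    words = [0xA, 0xB, 0xC, 0xD]
    ptr = 1
    for _ in range(3):
        grant, word, ptr = arbitrate_and_multiplex(reqs, words, ptr)
        print(grant, word, ptr)
```

In hardware, the two steps would be combined in a single logic structure rather than executed one after the other; the sequential calls here only model the input-output behavior of one cycle.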
Switches, Multiplexing, Resource management, Vectors, Logic gates, System-on-a-chip, Routing, Logic design, Switch allocation, Arbiters, Crossbar, Interconnection networks
Giorgos Dimitrakopoulos, Emmanouil Kalligeros, Kostas Galanopoulos, "Merged Switch Allocation and Traversal in Network-on-Chip Switches", IEEE Transactions on Computers, vol.62, no. 10, pp. 2001-2012, Oct. 2013, doi:10.1109/TC.2012.116