Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network
August 2009 (vol. 20 no. 8)
pp. 1126-1141
Hiroki Matsutani, Keio University, Yokohama
Michihiro Koibuchi, National Institute of Informatics, Tokyo
Yutaka Yamada, Keio University, Yokohama and Toshiba Corporation, Kawasaki
D. Frank Hsu, Fordham University, New York
Hideharu Amano, Keio University, Yokohama
Exploring the topology of on-chip networks is important for using their enormous wire resources efficiently, so that low-latency and high-throughput communication can be achieved within a modest silicon budget. In this paper, we propose a novel tree-based interconnection network called Fat H-Tree that meets these requirements. A Fat H-Tree provides a torus structure by combining two folded H-Tree networks and is an attractive alternative to tree-based networks such as Fat Trees in the microarchitecture domain. We introduce chip layout schemes for 2D and 3D ICs based on a folding technique, and we propose three deadlock-free routing schemes for Fat H-Tree. We evaluate the performance of Fat H-Tree and other tree-based networks using real application traces. In addition, the network logic area, wire resources, and energy consumption of Fat H-Tree are compared with those of other topologies, based on a typical implementation of on-chip routers synthesized with a 90-nm standard cell library. The results show that 1) a Fat H-Tree outperforms a Fat Tree with two upward and four downward connections in terms of throughput and average hop count, 2) a Fat H-Tree requires 19.8 to 27.8 percent less network logic area than the Fat Tree, 3) a Fat H-Tree consumes slightly less energy than the Fat Tree, and 4) a Fat H-Tree uses slightly more wire resources than the Fat Tree, but current process technology can provide sufficient wire resources for implementing Fat-H-Tree-based on-chip networks.

Index Terms:
Interconnection networks, on-chip networks, network topology, tree, routing algorithm.
Hiroki Matsutani, Michihiro Koibuchi, Yutaka Yamada, D. Frank Hsu, Hideharu Amano, "Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 8, pp. 1126-1141, Aug. 2009, doi:10.1109/TPDS.2008.233