Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers
July 2001 (vol. 12 no. 7)
pp. 701-715

Abstract: The Recursive Diagonal Torus (RDT), a class of interconnection networks, is proposed for massively parallel computers with up to $2^{16}$ nodes. By making the best use of a recursively structured diagonal mesh (torus) connection, the RDT achieves a smaller diameter (e.g., 11 for $2^{16}$ nodes) with fewer links per node (8) than the hypercube. A simple routing algorithm called vector routing, which is near-optimal and easy to implement, is also proposed. Although congestion on upper-rank tori sometimes degrades performance under random traffic, the RDT performs much better than a 2D/3D torus in most cases, and under hot-spot traffic it performs much better than a 2D/3D/4D torus. An RDT router chip that provides message multicast for maintaining cache consistency is available. Implemented in $0.5\mu m$ BiCMOS SOG technology, its versatile functions, including hierarchical multicasting, combining of acknowledge packets, a shooting down/restart mechanism, and time-out/setup mechanisms, work at a 60 MHz clock rate.
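
To put the abstract's figures in context, the following minimal sketch computes the degree (links per node) and diameter of a binary hypercube and a 2D torus with $2^{16}$ nodes, using their standard closed forms, and checks those forms by breadth-first search on small instances. It does not construct the RDT itself: the RDT values (8 links per node, diameter 11) are simply copied from the abstract, and all function names here are illustrative, not taken from the paper.

```python
from collections import deque


def hypercube_metrics(num_nodes):
    """Return (degree, diameter) of a binary hypercube with num_nodes = 2**n."""
    n = num_nodes.bit_length() - 1
    if num_nodes < 2 or (1 << n) != num_nodes:
        raise ValueError("hypercube size must be a power of two")
    return n, n


def torus_2d_metrics(side):
    """Return (degree, diameter) of a side x side 2D torus (side >= 3)."""
    return 4, 2 * (side // 2)


def hypercube_neighbors(n):
    """Neighbor function for an n-dimensional hypercube (nodes are ints)."""
    return lambda node: [node ^ (1 << bit) for bit in range(n)]


def torus_neighbors(side):
    """Neighbor function for a side x side 2D torus (nodes are (x, y))."""
    def neighbors(node):
        x, y = node
        return [((x + 1) % side, y), ((x - 1) % side, y),
                (x, (y + 1) % side), (x, (y - 1) % side)]
    return neighbors


def bfs_eccentricity(neighbors, start):
    """Longest shortest-path distance from start, found by BFS.

    Both topologies above are vertex-transitive, so this equals the diameter.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


# Values quoted in the abstract, not computed here.
RDT_DEGREE_DIAMETER = (8, 11)

if __name__ == "__main__":
    nodes = 2 ** 16
    rows = [
        ("hypercube", hypercube_metrics(nodes)),
        ("2D torus 256x256", torus_2d_metrics(256)),
        ("RDT (from abstract)", RDT_DEGREE_DIAMETER),
    ]
    for name, (degree, diameter) in rows:
        print(f"{name:20s} degree={degree:2d} diameter={diameter}")
```

Under these formulas, the hypercube needs 16 links per node to reach diameter 16, while the 2D torus keeps 4 links per node at the cost of diameter 256; the abstract's RDT figures sit between the two in degree and below both in diameter.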

Index Terms:
Interconnection network, massively parallel computer, routing algorithm, router chip, mesh network, torus network, message multicast.
Citation:
Yulu Yang, Akira Funahashi, Akiya Jouraku, Hiroaki Nishi, Hideharu Amano, Toshinori Sueyoshi, "Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 7, pp. 701-715, July 2001, doi:10.1109/71.940745