Toward Optimal Complete Exchange on Wormhole-Routed Tori
October 1999 (vol. 48 no. 10)
pp. 1065-1082

Abstract—In this paper, we propose new routing schemes to perform all-to-all personalized communication (also known as complete exchange) in wormhole-routed, one-port tori. On tori with equal size along each dimension, our algorithms achieve asymptotically optimal startup time and transmission time. The results are characterized by several interesting features: 1) the use of a gather-scatter tree to achieve optimality in startup time, 2) the enforcement of shortest paths when routing messages to achieve optimality in transmission time, 3) the application of network-partitioning techniques to reduce the constant associated with the transmission time, and 4) the dimension-by-dimension and gather-scatter-tree approaches, which make it possible to apply the results to nonsquare tori of any size. In the literature, some algorithms are optimal in only one of the startup and transmission costs, while others, although asymptotically optimal in both, incur much larger constants. Both numerical analysis and experiments show that our scheme yields significant improvement in total communication latency over existing results.
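
To make the two cost components concrete, the Python sketch below computes simple lower bounds on startup and transmission time for complete exchange on a k × k one-port torus. It assumes a linear cost model in which each message of m bytes costs t_s + m·t_c and every node sends a distinct m-byte message to every other node. The function names and parameters are hypothetical, and the bounds come from standard counting arguments, not from the paper's own derivation.

```python
import math


def ring_distance(a: int, b: int, k: int) -> int:
    """Shortest hop count between positions a and b on a k-node ring."""
    d = abs(a - b) % k
    return min(d, k - d)


def torus_distance_sum(k: int) -> int:
    """Sum of shortest-path distances (with wraparound) from node (0, 0)
    to every node of a k x k torus. By symmetry, every node has the same sum."""
    return sum(
        ring_distance(0, x, k) + ring_distance(0, y, k)
        for x in range(k)
        for y in range(k)
    )


def complete_exchange_lower_bounds(k: int, m: float, t_s: float, t_c: float):
    """Return (startup_bound, transmission_bound) for complete exchange on a
    k x k one-port torus under the assumed cost model t_s + m * t_c."""
    n = k * k

    # Startup: with one port, a node receives at most one message per step,
    # so the number of sources whose data can have reached it at most doubles.
    startup = math.ceil(math.log2(n)) * t_s

    # Injection: each node must push (n - 1) * m bytes through its single port.
    injection = (n - 1) * m * t_c

    # Link load: every byte crosses at least its shortest-path distance, and
    # a 2D torus has 4 outgoing directed links per node (4n links in total).
    total_link_load = n * torus_distance_sum(k) * m
    link = total_link_load / (4 * n) * t_c

    return startup, max(injection, link)
```

For example, `complete_exchange_lower_bounds(16, 1.0, 1.0, 1.0)` gives a startup bound of 8 (that is, log2 256) and a transmission bound of 512 = 16³/8, so on a 16 × 16 torus the link-load term outweighs the 255-unit injection term. The link-load bound is reachable only if every byte travels its minimal distance, which is why shortest-path routing matters for transmission-time optimality.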

Index Terms:
All-to-all personalized communication, broadcast, complete exchange, gossiping, multicomputer network, torus, wormhole routing.
Yu-Chee Tseng, Sze-Yao Ni, Jang-Ping Sheu, "Toward Optimal Complete Exchange on Wormhole-Routed Tori," IEEE Transactions on Computers, vol. 48, no. 10, pp. 1065-1082, Oct. 1999, doi:10.1109/12.805156