Near-Optimal All-to-All Broadcast in Multidimensional All-Port Meshes and Tori
February 2002 (vol. 13 no. 2)
pp. 128-141

Abstract: All-to-all communication is one of the densest collective communication patterns and occurs in many important applications in parallel and distributed computing. In this paper, we present a new all-to-all broadcast algorithm for multidimensional all-port mesh and torus networks. We propose a broadcast pattern that balances the traffic load across all dimensions of the network, so that the all-to-all broadcast algorithm achieves a very tight near-optimal transmission time. The algorithm also overlaps message switching time with transmission time, and its total communication delay asymptotically matches the lower bound for all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetric for every message and every node, so it can be easily implemented in hardware and can achieve near-optimal performance in practice.
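
The lower bound mentioned in the abstract is not stated on this page. As a rough illustration only, the sketch below computes the standard receive-bandwidth bound on transmission steps: every node must receive one message from each of the other N - 1 nodes, and it can accept at most one message per port per step. The function name `receive_lower_bound`, the unit-length message model, and the one-message-per-link-per-step assumption are illustrative choices, not taken from the paper, and the paper's own bound may include switching and start-up terms that are ignored here.

```python
from math import ceil, prod


def receive_lower_bound(dims, torus=True):
    """Lower bound on transmission steps for all-to-all broadcast.

    Assumptions (not from the paper): each node starts with one unit-length
    message, each link delivers one message per step, and a node may receive
    on all of its ports simultaneously (all-port model).
    """
    if not dims or any(k < 2 for k in dims):
        raise ValueError("each dimension needs at least 2 nodes")
    total_nodes = prod(dims)
    if torus:
        # Every torus node has two neighbours per dimension; a ring of
        # size 2 degenerates to a single distinct neighbour.
        min_ports = sum(2 if k > 2 else 1 for k in dims)
    else:
        # A corner node of a mesh has only one neighbour per dimension,
        # so it is the receive bottleneck.
        min_ports = len(dims)
    # The bottleneck node must take in total_nodes - 1 messages.
    return ceil((total_nodes - 1) / min_ports)


if __name__ == "__main__":
    print(receive_lower_bound((8, 8, 8), torus=True))
    print(receive_lower_bound((8, 8, 8), torus=False))
```

Under these assumptions the mesh bound is about twice the torus bound for the same dimensions, because corner nodes have half as many ports. This is consistent with the abstract's emphasis on keeping the load balanced across all dimensions, since an idle port in any step moves the schedule away from the bound.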

Index Terms:
Parallel computing, collective communication, all-to-all communication, all-to-all broadcast, gossip, broadcast tree, routing, interprocessor communication, mesh, torus.
Citation:
Yuanyuan Yang, Jianchao Wang, "Near-Optimal All-to-All Broadcast in Multidimensional All-Port Meshes and Tori," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 2, pp. 128-141, Feb. 2002, doi:10.1109/71.983941