Pipelined All-to-All Broadcast in All-Port Meshes and Tori
October 2001 (vol. 50 no. 10)
pp. 1020-1032

Abstract: All-to-all communication is one of the densest communication patterns and occurs in many important applications in parallel computing. In this paper, we present a new all-to-all broadcast algorithm for all-port meshes and tori. The algorithm uses controlled message flooding based on a novel broadcast pattern, which balances the traffic load across all dimensions of the network so that the optimal transmission time for all-to-all broadcast can be achieved. The broadcast pattern is described in a formal, generic way for each node in terms of a few simple operations and can be easily built into router hardware. Unlike existing all-to-all broadcast algorithms, the new algorithm overlaps message switching time with transmission time in a pipelined fashion to reduce the total communication delay of all-to-all broadcast. In most cases, the total communication delay is within a small constant of the lower bound for all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetrical for every message and every node, so it can be easily implemented in hardware and achieves the optimum in practice.

Index Terms:
Parallel computing, collective communication, all-to-all communication, all-to-all broadcast, gossip, broadcast tree, routing, interprocessor communication.
Citation:
Yuanyuan Yang, Jianchao Wang, "Pipelined All-to-All Broadcast in All-Port Meshes and Tori," IEEE Transactions on Computers, vol. 50, no. 10, pp. 1020-1032, Oct. 2001, doi:10.1109/12.956089