Efficient Broadcasting in Wormhole-Routed Multicomputers: A Network-Partitioning Approach
January 1999 (vol. 10 no. 1)
pp. 44-61

Abstract: This paper proposes a network-partitioning approach for one-to-all broadcasting on wormhole-routed networks. The scheme broadcasts a message in three phases. First, a number of data-distributing networks (DDNs), which can work independently, are constructed; the message is then evenly divided into submessages, each of which is sent to a representative node in one DDN. Second, the submessages are broadcast on the DDNs concurrently. Finally, a number of data-collecting networks (DCNs), which can also work independently, are constructed; concurrently on each DCN, the submessages are then collected and combined into the original message. Our approach, designed specifically for wormhole-routed networks, is conceptually similar to, but fundamentally different from, the traditional approach (e.g., [4], [13], [18], [31]) of using multiple edge-disjoint spanning trees in parallel for broadcasting in store-and-forward networks. One interesting issue is how to define independent DDNs and DCNs in the sense of wormhole routing. We show how to apply this approach to tori, meshes, and hypercubes. Thorough analyses and comparisons based on different system parameters and configurations are conducted. The results confirm the advantage of our scheme over other existing broadcasting algorithms under various system parameters and conditions.
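The three phases described in the abstract can be illustrated with a small data-flow sketch. It is a toy model under stated assumptions, not the paper's construction: nodes are plain integers, DDN j is assumed to be the set of nodes whose index is congruent to j modulo k, each DCN is assumed to be a block of k consecutive nodes, and wormhole routing, link contention, and timing are not modeled at all. The names `split_evenly` and `three_phase_broadcast` are hypothetical.

```python
def split_evenly(message, k):
    """Split message into k contiguous pieces whose lengths differ by at most one."""
    base, extra = divmod(len(message), k)
    pieces, start = [], 0
    for j in range(k):
        end = start + base + (1 if j < extra else 0)
        pieces.append(message[start:end])
        start = end
    return pieces


def three_phase_broadcast(message, num_nodes, k):
    """Return the message each node ends up with after the three phases."""
    if num_nodes % k != 0:
        raise ValueError("num_nodes must be a multiple of k in this toy model")

    # Assumed partition (illustrative only): DDN j holds nodes with index % k == j.
    ddns = [[n for n in range(num_nodes) if n % k == j] for j in range(k)]
    # Assumed DCNs: blocks of k consecutive nodes, one node from every DDN.
    dcns = [list(range(g * k, (g + 1) * k)) for g in range(num_nodes // k)]

    # held[n] maps a submessage index to the submessage node n currently holds.
    held = {n: {} for n in range(num_nodes)}

    # Phase 1: divide the message and hand piece j to a representative of DDN j.
    for j, piece in enumerate(split_evenly(message, k)):
        representative = ddns[j][0]
        held[representative][j] = piece

    # Phase 2: each DDN broadcasts its own piece to all of its members.
    for j, ddn in enumerate(ddns):
        piece = held[ddn[0]][j]
        for n in ddn:
            held[n][j] = piece

    # Phase 3: within each DCN, collect all pieces and combine them.
    for dcn in dcns:
        combined = {}
        for n in dcn:
            combined.update(held[n])
        for n in dcn:
            held[n] = dict(combined)

    return {n: "".join(held[n][j] for j in range(k)) for n in range(num_nodes)}
```

Calling `three_phase_broadcast("abcdefghij", 12, 4)` in this model leaves every one of the 12 nodes holding the full string. Each DCN must contain one member of every DDN so that phase 3 can reassemble the message locally.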

[1] V. Bala, J. Bruck, R. Cypher, P. Elustondo, A. Ho, C.T. Ho, S. Kipnis, and M. Snir, "CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers," Eighth Int'l Parallel Processing Symp., IEEE, pp. 835-844, Apr. 1994.
[2] M. Barnett, S. Gupta, D. Payne, L. Shuler, R. van de Geijn, and J. Watts, "Interprocessor Collective Communication Library (InterCom)," Proc. Scalable High Performance Computing Conf., pp. 357-364, May 1994.
[3] M. Barnett, D.G. Payne, R.A. van de Geijn, and J. Watts, "Broadcasting on Meshes with Wormhole Routing," J. Parallel and Distributed Computing, vol. 35, no. 6, pp. 111-122, June 1996.
[4] J.-C. Bermond, P. Michallon, and D. Trystram, "Broadcasting in Wraparound Meshes with Parallel Monodirectional Link," Parallel Computing, vol. 18, pp. 639-648, 1992.
[5] J. Bruck, L. De Coster, N. Dewulf, C.-T. Ho, and R. Lauwereins, "On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 3, pp. 256-265, Mar. 1996.
[6] Cray T3E Scalable Parallel Processing System. Cray Research Inc., 1995.
[7] W. Dally and C. Seitz, "The Torus Routing Chip," J. Distributed Computing, vol. 1, no. 3, pp. 187-196, 1986.
[8] B. Duzett and R. Buck, "An Overview of the nCUBE3 Supercomputer," Proc. Fourth Symp. Frontiers of Massively Parallel Computation, pp. 458-464, 1992.
[9] R.L. Graham, D.E. Knuth, and O. Patashnik, Concrete Mathematics. Reading, MA: Addison-Wesley, 1989.
[10] C.-T. Ho and M.-Y. Kao, "Optimal Broadcast in All-Port Wormhole-Routed Hypercubes," IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 2, pp. 200-318, Feb. 1995.
[11] A Touchstone DELTA System Description. Intel Corp., 1990.
[12] S.L. Johnsson, "Communication Efficient Basic Linear Algebra Computations on Hypercube Architectures," J. Parallel and Distributed Computing, vol. 4, pp. 133-172, 1987.
[13] S.L. Johnsson and C.T. Ho, "Spanning Graphs for Optimum Broadcasting and Personalized Communication in Hypercubes," IEEE Trans. Computers, vol. 38, no. 9, pp. 1249-1268, Sept. 1989.
[14] R.E. Kessler and J.L. Schwarzmeier, "CRAY T3D: A New Dimension for Cray Research," Proc. COMPCON, pp. 176-182, Feb. 1993.
[15] P.K. McKinley, Y.-J. Tsai, and D. Robinson, "Collective Communication in Wormhole-routed Massively Parallel Computers," Computer, vol. 28, no. 12, pp. 39-50, Dec. 1995.
[16] P.K. McKinley et al., "Unicast-Based Multicast Communication in Wormhole-Routed Networks," IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 12, pp. 1252-1265, Dec. 1994.
[17] "Document for Standard Message-Passing Interface," Message Passing Interface Forum, Nov. 1993.
[18] P. Michallon and D. Trystram, "Minimum Depth Arcs-Disjoint Spanning Trees for Broadcasting on Wrap-Around Meshes," Proc. Int'l Conf. Parallel Processing, vol. 1, pp. 80-83, 1995.
[19] L.M. Ni and P.K. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," Computer, vol. 26, no. 2, pp. 62-76, Feb. 1993.
[20] P.R. Nuth and W.J. Dally, "The J-Machine Network," Proc. 1992 IEEE Int'l Conf. Computer Design: VLSI in Computers and Processors, pp. 420-423, Oct. 1992.
[21] J. Park, H.G. Kim, S. Hwang, J. Kim, I. Jang, H. Yoon, and J.W. Cho, "An Efficient Unicast-Based Multicast Algorithm in Two-Port Wormhole-Routed 2D Mesh Networks," Proc. IEEE Int'l Conf. Algorithms and Architecture for Parallel Processing, pp. 326-331, 1996.
[22] D.F. Robinson, P.K. McKinley, and B.H.C. Cheng, "Optimal Multicast Communication in Wormhole-Routed Torus Networks," IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 11, pp. 1029-1042, Oct. 1995.
[23] M. Schmidt-Voigt, "Efficient Parallel Communication with the nCUBE 2S Processor," Parallel Computing, vol. 20, pp. 509-530, 1994.
[24] Y.-J. Tsai and P.K. McKinley, "A Dominating Set Model for Broadcasting in All-Port Wormhole-Routed 2D Mesh Networks," Proc. ACM Int'l Conf. Supercomputing, pp. 126-135, 1994.
[25] Y.J. Tsai and P.K. McKinley, "An Extended Dominating Node Approach to Collective Communication in All-Port Wormhole-Routed 2D Meshes," Proc. Scalable High-Performance Computing Conf., pp. 199-206, Oct. 1994.
[26] Y.-J. Tsai and P.K. McKinley, "A Broadcasting Algorithm for All-Port Wormhole-Routed Torus Networks," Proc. Symp. Frontiers of Massively Parallel Computation, pp. 529-536, 1995.
[27] Y. Tsai and P.K. McKinley, "A Broadcast Algorithm for All-Port Wormhole-Routed Torus Network," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 8, pp. 876-885, Aug. 1996.
[28] Y.-C. Tseng and S. Gupta, "All-to-All Personalized Communication in a Wormhole-Routed Torus," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 5, pp. 498-505, May 1996.
[29] Y.-C. Tseng, T.-H. Lin, S. Gupta, and D.K. Panda, "Bandwidth-Optimal Complete Exchange on Wormhole Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach," IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 4, pp. 380-396, Apr. 1997.
[30] Y.C. Tseng, D.K. Panda, and T.H. Lai, "A Trip-Based Multicasting Model in Wormhole-Routed Networks with Virtual Channels," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 2, pp. 138-150, Feb. 1996.
[31] Y.-C. Tseng and J.-P. Sheu, "Toward Optimal Broadcast in a Star Graph Using Multiple Spanning Trees," IEEE Trans. Computers, vol. 46, no. 5, pp. 593-599, May 1997.
[32] C.-M. Wang and C.-Y. Ku, "A Near-Optimal Broadcasting Algorithm in All-Port Wormhole-Routed Hypercubes," Proc. ACM Int'l Conf. Supercomputing, pp. 147-153, 1995.
[33] S.-Y. Wang, Y.-C. Tseng, and C.-W. Ho, "Efficient Single-Node Broadcast in Wormhole-Routed Multicomputers: A Network-Partitioning Approach," Proc. Symp. Parallel and Distributed Processing, 1996.
[34] H. Xu, P.K. McKinley, and L.M. Ni, "Efficient Implementation of Barrier Synchronization in Wormhole-Routed Hypercubes Multicomputers," J. Parallel and Distributed Computing, vol. 16, pp. 172-184, 1992.

Index Terms:
Collective communication, hypercube, interconnection network, mesh, one-to-all broadcast, parallel processing, torus, wormhole routing.
Citation:
Yu-Chee Tseng, San-Yuan Wang, Chin-Wen Ho, "Efficient Broadcasting in Wormhole-Routed Multicomputers: A Network-Partitioning Approach," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 1, pp. 44-61, Jan. 1999, doi:10.1109/71.744837