Optimal Software Multicast in Wormhole-Routed Multistage Networks
June 1997 (vol. 8 no. 6)
pp. 597-607

Abstract: Multistage interconnection networks are a popular class of interconnection architectures for constructing scalable parallel computers (SPCs). This paper focuses on multistage network systems that support wormhole-routed turnaround routing. Existing machines characterized by such a system model include the IBM SP-1 and SP-2, TMC CM-5, and Meiko CS-2.

Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental to supporting collective communication primitives, including application-level broadcast, reduction, and barrier synchronization. This paper addresses how to implement multicast services efficiently in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the turnaround switching technology. An optimal multicast algorithm is proposed. Results from implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by existing collective communication libraries, including the public-domain MPI.
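To illustrate what a software multicast built from unicast messages can look like, the following is a minimal sketch of a generic recursive-halving schedule. It is not the algorithm proposed in the paper: the function name `multicast_schedule`, the choice to order destinations by node address, and the step-based cost model are all assumptions made for illustration, and network contention is not modeled. Under those assumptions, every node that already holds the message forwards it to one new node per step, so d destinations are covered in ceil(log2(d + 1)) steps.

```python
def multicast_schedule(source, destinations):
    """Return a list of steps; each step is a list of (sender, receiver) unicasts.

    Hypothetical helper for illustration only. Nodes are plain integers and
    destinations are ordered by address (an assumption of this sketch).
    """
    # The source leads the list; duplicates and the source itself are dropped.
    nodes = [source] + sorted(d for d in set(destinations) if d != source)

    # Each segment is a half-open index range [lo, hi) into `nodes`;
    # nodes[lo] already holds the message and is responsible for the rest.
    segments = [(0, len(nodes))]
    steps = []
    while any(hi - lo > 1 for lo, hi in segments):
        step, next_segments = [], []
        for lo, hi in segments:
            if hi - lo <= 1:
                continue  # nothing left to deliver in this segment
            mid = lo + (hi - lo + 1) // 2  # holder keeps the larger half
            step.append((nodes[lo], nodes[mid]))
            next_segments.append((lo, mid))
            next_segments.append((mid, hi))
        steps.append(step)
        segments = next_segments
    return steps


if __name__ == "__main__":
    for i, step in enumerate(multicast_schedule(0, range(1, 8)), start=1):
        print(f"step {i}: {step}")
```

With one source and seven destinations, the sketch produces three steps, and the number of nodes holding the message doubles at each step. Whether the unicasts within a step can proceed without contention depends on the network topology and routing, which this sketch deliberately leaves out.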

Index Terms:
Wormhole routing, bidirectional multistage interconnection network, multicast communication, multistage cube network, turnaround routing.
Hong Xu, Yadong Gui, Lionel M. Ni, "Optimal Software Multicast in Wormhole-Routed Multistage Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 6, pp. 597-607, June 1997, doi:10.1109/71.595577