Asynchronous Tree-Based Multicasting in Wormhole-Switched MINs
November 1999 (vol. 10 no. 11)
pp. 1159-1178

Abstract: Multicast is an important operation in multicomputer communication systems and can be used to support several collective communication operations. Supporting multicast at the hardware level can yield a significant performance improvement. In this paper, we propose an asynchronous tree-based multicasting (ATBM) technique for multistage interconnection networks (MINs). We first analyze the deadlock issues in tree-based multicasting in MINs to identify the main causes of deadlock. We then develop an ATBM framework in which deadlocks are prevented by serializing the initiation of tree operations that have the potential to create deadlocks; these tree operations are identified through a grouping algorithm. The ATBM approach is simple to implement and provides good communication performance with minimal overhead in terms of additional hardware and synchronization delay. Using the ATBM framework, we develop algorithms for both unidirectional and bidirectional MINs. The performance of the proposed algorithms is evaluated through simulation experiments. The results indicate that the proposed hardware-based ATBM scheme reduces communication latency compared with the previously proposed software multicasting approach.

Index Terms:
Asynchronous tree-based multicasting, multistage interconnection networks, deadlock configurations, multicast routing algorithm, wormhole switching.
Vara Varavithya, Prasant Mohapatra, "Asynchronous Tree-Based Multicasting in Wormhole-Switched MINs," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 11, pp. 1159-1178, Nov. 1999, doi:10.1109/71.809574