Architectural Support for Efficient Multicasting in Irregular Networks
May 2001 (vol. 12 no. 5)
pp. 489-513

Abstract: Parallel computing on networks of workstations is fast becoming a cost-effective high-performance computing alternative to MPPs. Such a computing environment typically consists of processing nodes interconnected through a switch-based irregular network. Many of the problems that were solved for regular networks have to be solved anew for these systems. One such problem is that of efficient multicast communication. In this paper, we propose two broad categories of schemes for efficient multicasting in such irregular networks: network interface-based (NI-based) and switch-based. The NI-based multicasting schemes use the network interface of intermediate destinations for absorbing and retransmitting messages to other destinations in the multicast tree. In contrast, the switch-based multicasting schemes use hardware support for packet replication at the switches of the network and a concept known as multidestination routing to convey a multicast message from one source to multiple destinations. We first present alternative schemes for efficient multipacket forwarding at the NI and derive an optimal $k$-binomial multicast tree for multipacket NI-based multicast. We then propose two switch-based multicasting schemes that differ in the power of the encoding scheme and the complexity of the decoding logic at the switches. The first uses path-based multidestination worms, which can cover all nodes connected to switches along a valid unicast path. The second uses tree-based multidestination worms, which can cover an entire destination set in a single phase using one worm. For each scheme, we describe the associated header encoding and decoding operation, the method for deriving multidestination worms that cover arbitrary multicast destination sets, and the multicasting scheme using the derived multidestination worms.
We then compare the NI-based multicasting scheme to the switch-based multicasting schemes with path-based and tree-based multidestination worms using simulation to determine the system parameters that affect each of the schemes and the range of system parameters for which each scheme performs best. Our results show that the switch-based multicasting scheme using a single tree-based multidestination worm performs the best among the three schemes. However, the NI-based multicasting scheme is capable of delivering high performance compared to the switch-based multicast using path-based worms, especially when the software overhead at the network interface is less than half of the overhead at the host. We therefore conclude that support for multicast at the NI is an important first step to improving multicast performance. However, there is still considerable gain that can be achieved by supporting hardware multicast in switches. Finally, while supporting such hardware multicast, it is better to support schemes that can achieve multicast in one phase.
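
As a rough illustration of why fan-out matters in an NI-based multicast tree, the sketch below counts how many nodes hold a message after a given number of forwarding steps. It is not the paper's construction. It assumes that a k-binomial tree is a binomial-style tree in which each node forwards to at most k children, one new child per step, and that a step is one single-packet forwarding operation. The function name and the step model are hypothetical and chosen only for this example.

```python
def k_binomial_coverage(k: int, steps: int) -> int:
    """Count nodes holding the message after `steps` forwarding steps.

    Assumed model (not taken from the paper): every node that holds the
    message sends it to one new child per step, for at most k steps.
    """
    if k < 1 or steps < 0:
        raise ValueError("k must be >= 1 and steps must be >= 0")

    # new_at[t] = number of nodes that first receive the message at step t.
    new_at = [1]  # the source holds the message at step 0
    for s in range(1, steps + 1):
        # A node that received the message at step t sends in steps t+1 .. t+k,
        # so the senders in step s are the nodes added in steps s-k .. s-1.
        senders = sum(new_at[max(0, s - k):s])
        new_at.append(senders)
    return sum(new_at)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, [k_binomial_coverage(k, s) for s in range(6)])
```

Under this model, a fan-out limit of 1 degenerates into a linear chain, while any k at least as large as the number of steps gives the ordinary binomial tree, which doubles the covered nodes each step. How the best k is chosen for multipacket messages is derived in the paper itself and is not reproduced here.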

Index Terms:
Parallel computer architecture, cut-through routing, multicast, broadcast, collective communication, switch-based networks, irregular networks, performance evaluation.
Citation:
Rajeev Sivaram, Ram Kesavan, Dhabaleswar K. Panda, Craig B. Stunkel, "Architectural Support for Efficient Multicasting in Irregular Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 5, pp. 489-513, May 2001, doi:10.1109/71.926170