Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing
August 2001 (vol. 12 no. 8)
pp. 808-828

Abstract—The irregular switch-based network of workstations is fast becoming a cost-effective platform for high-performance computing. This paper presents efficient multicasting with reduced link contention on irregular switch-based cut-through interconnects that use the popular up*/down* (UD) routing and unicast message passing. First, it is proven that, for an arbitrary irregular network with UD routing, it is not possible to create an ordered list of nodes that implements an arbitrary multicast in a link contention-free manner with a minimal number of communication steps. Next, three multicast algorithms are proposed, each with its own node ordering, to reduce link contention: switch-based ordering (SO), switch-based hierarchical ordering (SHO), and chain concatenation ordering (CCO). A variation of the binomial tree-based communication pattern, with unicast message passing, is applied to these orderings to implement multicast. Then, the problem of node contention is described for the case in which multiple multicasts occur concurrently in a system. Using source-based information, the CCO algorithm is modified into a source-partitioned chain concatenation ordering (SPCCO) algorithm. It is also shown how the SPCCO algorithm reduces the effect of node contention at the cost of link contention. Using detailed simulation experiments, the proposed multicast algorithms are compared with each other, as well as with the naive random ordering (RO) algorithm, over a range of system sizes, switch sizes, message lengths, input buffer sizes, degrees of connectivity, destination set sizes, and communication start-up times. For the case of a single multicast, the CCO algorithm is shown to be the best at implementing multicast with reduced link contention and minimum latency. For the case of multiple multicasts, the SPCCO algorithm is shown to be the best when the start-up overhead dominates the propagation overhead, and the CCO algorithm is shown to be the best otherwise.
The results also highlight the importance of reducing link contention when designing efficient multicast, even for systems with large input buffers in the switches. Thus, these results have significant potential for application to current- and future-generation networks of workstations (NOW) with irregular interconnections.
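All of the orderings above (RO, SO, SHO, CCO, SPCCO) feed an ordered node list into a binomial tree-based pattern of unicasts. The sketch below shows a generic recursive-halving binomial schedule of the kind used in earlier unicast-based multicast work; it is not the paper's specific variation. The function name `binomial_multicast_schedule` and the convention that the source is the first list element are assumptions made for illustration. Each node that already holds the message owns a contiguous slice of the list and, in every step, forwards the message to the first node of the upper half of its slice. The multicast therefore finishes in ceil(log2 n) steps for n participating nodes. The ordering decides which unicasts run concurrently in the same step, and that is why the choice of ordering affects link contention under up*/down* routing.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def binomial_multicast_schedule(ordered: Sequence[T]) -> List[List[Tuple[T, T]]]:
    """Return a list of steps; each step is a list of (sender, receiver) unicasts.

    Assumption (for illustration): ordered[0] is the multicast source and the
    remaining entries are the destinations in the chosen node ordering.
    """
    steps: List[List[Tuple[T, T]]] = []
    # Each interval [lo, hi) is owned by ordered[lo], which already has the message.
    intervals = [(0, len(ordered))]
    while any(hi - lo > 1 for lo, hi in intervals):
        step: List[Tuple[T, T]] = []
        next_intervals = []
        for lo, hi in intervals:
            if hi - lo > 1:
                # Hand the upper half of the slice to the node at position mid.
                mid = lo + (hi - lo + 1) // 2
                step.append((ordered[lo], ordered[mid]))
                next_intervals += [(lo, mid), (mid, hi)]
            else:
                # A single-node slice has no one left to forward to.
                next_intervals.append((lo, hi))
        steps.append(step)
        intervals = next_intervals
    return steps
```

For example, `binomial_multicast_schedule(["s", "a", "b", "c"])` returns `[[("s", "b")], [("s", "a"), ("b", "c")]]`. The source first sends to `b`, and then `s` and `b` send concurrently in the second step. Whether those concurrent unicasts share a link depends on where the nodes sit in the irregular network, and the orderings above are designed to control exactly that.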

Index Terms:
Parallel computer architecture, cut-through routing, wormhole routing, multicast, broadcast, collective communication, switch-based networks, irregular networks, networks of workstations.
Citation:
Ram Kesavan, Dhabaleswar K. Panda, "Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 8, pp. 808-828, Aug. 2001, doi:10.1109/71.946654