<p><b>Abstract</b>—This paper proposes a new approach for implementing fast multicast and broadcast in unidirectional and bidirectional multistage interconnection networks (MINs) using multiport encoded multidestination worms. For a MIN with <it>n</it> stages, each such worm uses <it>n</it> header flits, one per stage of the network; each flit indicates the output ports to which the multicast message must be replicated at that stage. A multiport encoded worm with degrees of replication (<it>d</it><sub>1</sub>, <it>d</it><sub>2</sub>, ..., <it>d</it><sub><it>n</it></sub>), 1 ≤ <it>d</it><sub><it>i</it></sub> ≤ <it>k</it>, for the respective stages can cover <it>d</it><sub>1</sub> × <it>d</it><sub>2</sub> × ... × <it>d</it><sub><it>n</it></sub> destinations with a single communication start-up. A switch architecture is proposed for implementing multidestination worms without deadlock. Three grouping algorithms of varying complexity are presented to derive the multiport encoded worms needed for a multicast to an arbitrary set of destinations. Using these worms, a multinomial tree-based scheme is proposed to implement the multicast. This scheme significantly reduces broadcast/multicast latency compared to schemes using unicast messages. Simulation studies for both unidirectional and bidirectional MIN systems indicate that the new approach can improve broadcast/multicast latency by up to a factor of four. Interestingly, once the number of destinations exceeds a certain threshold, multicast latency under this approach <it>decreases</it> as the number of destinations <it>increases</it>. Compared to a switch that supports only unicast messages, this approach requires little additional logic at the switches. Thus, the scheme demonstrates significant potential for implementing efficient collective communication operations on current and future MIN-based systems.</p>
<p><b>Index Terms</b>—Parallel computer architecture, collective communication, multistage interconnection networks, interprocessor communication, broadcast, multicast, wormhole routing, virtual cut-through.</p>
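<p>The coverage claim in the abstract (a worm with degrees of replication <it>d</it><sub>1</sub>, ..., <it>d</it><sub><it>n</it></sub> reaches their product in destinations) can be illustrated with a minimal sketch. It is not the paper's switch design or grouping algorithm. It assumes a <it>k</it>-ary MIN with destination-tag routing, in which the output port taken at stage <it>i</it> equals the <it>i</it>-th base-<it>k</it> digit of the destination address; the function names <code>worm_coverage</code> and <code>worm_destinations</code> and the set-per-stage header representation are hypothetical.</p>

```python
from itertools import product
from math import prod


def worm_coverage(port_sets):
    """Destinations covered by one worm: d_1 * d_2 * ... * d_n,
    where d_i is the number of output ports selected at stage i."""
    return prod(len(ports) for ports in port_sets)


def worm_destinations(port_sets, k):
    """Enumerate the destination addresses covered by one worm.

    port_sets[i] is the set of output ports named by the header flit
    for stage i. ASSUMPTION (not from the paper): destination-tag
    routing, so the port chosen at stage i is the i-th base-k digit
    (most significant first) of the destination address.
    """
    for ports in port_sets:
        if not ports or any(p < 0 or p >= k for p in ports):
            raise ValueError("each stage needs 1..k ports in range(k)")
    dests = []
    # One destination per combination of one port from every stage.
    for digits in product(*[sorted(ports) for ports in port_sets]):
        addr = 0
        for d in digits:
            addr = addr * k + d
        dests.append(addr)
    return dests


if __name__ == "__main__":
    # 3-stage MIN of 2x2 switches (8 nodes); degrees of replication (2, 1, 2).
    header = [{0, 1}, {1}, {0, 1}]
    print(worm_coverage(header))
    print(worm_destinations(header, k=2))
```

<p>Under these assumptions, a header that selects all <it>k</it> ports at every stage reaches all <it>k</it><sup><it>n</it></sup> nodes, which corresponds to a broadcast carried by a single worm.</p>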
