Scalable S-To-P Broadcasting on Message-Passing MPPs
August 1998 (vol. 9 no. 8)
pp. 758-768

Abstract: In s-to-p broadcasting, s processors in a p-processor machine each contain a message to be broadcast to all the processors, 1 ≤ s ≤ p. We present a number of different broadcasting algorithms that handle the full range of s. We show how the performance of each algorithm is influenced by the distribution of the s source processors and by the relationship between that distribution and the characteristics of the interconnection network. For the Intel Paragon, we show that for each algorithm and machine dimension there exist ideal distributions as well as distributions on which performance degrades. For the Cray T3D, we also demonstrate dependencies between distributions and machine sizes. To reduce the dependence of performance on the distribution of sources, we propose a repositioning approach, in which the initial distribution is turned into an ideal distribution for the target broadcasting algorithm. We report experimental results for the Intel Paragon and Cray T3D and discuss scalability and performance.
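
To make the problem statement concrete, the following is a minimal Python sketch of what an s-to-p broadcast has to achieve. It is not one of the paper's algorithms, which the abstract does not describe. It simulates a generic recursive-doubling exchange, assumes p is a power of two, and ignores network topology and source placement entirely. The function name `s_to_p_broadcast` and its parameters are hypothetical.

```python
def s_to_p_broadcast(p, sources):
    """Simulate an s-to-p broadcast with a generic recursive-doubling exchange.

    Illustrative assumption only, not the paper's method.
    p       -- number of processors (assumed to be a power of two)
    sources -- dict mapping a source processor's rank to its message
    Returns (holdings, rounds): holdings[r] maps source rank -> message
    for every message known to processor r after the exchange.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two")
    if not 1 <= len(sources) <= p:
        raise ValueError("need 1 <= s <= p source processors")
    if any(not 0 <= rank < p for rank in sources):
        raise ValueError("source ranks must lie in 0..p-1")

    # Initially only the s source processors hold a message.
    holdings = [dict() for _ in range(p)]
    for rank, message in sources.items():
        holdings[rank][rank] = message

    rounds = 0
    step = 1
    while step < p:
        # All exchanges in a round happen at once, so read from a snapshot.
        snapshot = [dict(h) for h in holdings]
        for rank in range(p):
            partner = rank ^ step  # partner differs in exactly one bit
            holdings[rank].update(snapshot[partner])
        step <<= 1
        rounds += 1
    return holdings, rounds


if __name__ == "__main__":
    holdings, rounds = s_to_p_broadcast(8, {1: "a", 6: "b"})
    print(rounds, holdings[0])
```

In this idealized model every processor holds all s messages after log2(p) rounds regardless of where the sources sit. The abstract's point is that on real machines such as the Intel Paragon and Cray T3D the source distribution does affect performance, which is what motivates repositioning the sources before broadcasting.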

Index Terms:
Broadcasting, communication operations, message-passing MPPs, scalability.
Susanne E. Hambrusch, Ashfaq A. Khokhar, Yi Liu, "Scalable S-To-P Broadcasting on Message-Passing MPPs," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 8, pp. 758-768, Aug. 1998, doi:10.1109/71.706048