On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model
March 1996 (vol. 7 no. 3)
pp. 256-265

Abstract: A number of models have been proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model, a parameter λ models the communication latency of the message-passing system. During each round, each node can send a fixed-size message and, simultaneously, receive a message of the same size. A message sent during round r incurs a latency of λ and arrives at the receiving node at round r + λ − 1.
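The round structure described above implies a simple growth recurrence for a one-to-all broadcast, sketched below. The sketch assumes an integer λ ≥ 1, assumes that a node may start sending in the round after the message arrives, and assumes that every informed node sends to a fresh node in every round. Under those assumptions the number of informed nodes I(t) after round t satisfies I(t) = I(t − 1) + I(t − λ), with I(t) = 1 for t < λ. The function names are illustrative and do not come from the paper.

```python
def informed_after(t: int, lam: int) -> int:
    """Nodes holding the message at the end of round t (round 0 = source only).

    Assumes integer lam >= 1 and that a message sent in round r is
    usable by its receiver from round r + lam onward.
    """
    if t < 0 or lam < 1:
        raise ValueError("need t >= 0 and lam >= 1")
    counts = []
    for r in range(t + 1):
        # Before round lam no message has arrived yet, so only the source knows.
        counts.append(1 if r < lam else counts[r - 1] + counts[r - lam])
    return counts[t]


def rounds_to_broadcast(n: int, lam: int) -> int:
    """Smallest t such that at least n nodes are informed after round t."""
    if n < 1 or lam < 1:
        raise ValueError("need n >= 1 and lam >= 1")
    counts = [1]
    t = 0
    while counts[t] < n:
        t += 1
        counts.append(1 if t < lam else counts[t - 1] + counts[t - lam])
    return t


if __name__ == "__main__":
    for lam in (1, 2, 3):
        print(lam, [informed_after(t, lam) for t in range(8)])
```

With λ = 1 the count doubles every round, and with larger λ it grows like a generalized Fibonacci sequence, which is why the best broadcast tree depends on λ.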

Our goal in this paper is to bridge the gap between theoretical modeling and practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. These practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines on which λ is not an integer. We propose solutions that address these practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning. For example, a properly tuned broadcast improves on the known implementation by more than 20%.
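To make item 2 concrete, here is a minimal sketch of a broadcast schedule generator that takes λ and n as parameters. It is not the algorithm from the paper: it is a plain greedy rule (every node that already holds the message sends to the next unassigned node in each round), and it assumes an integer λ ≥ 1, node 0 as the source, and that a receiver can start sending in the round after arrival. The function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# (round_sent, sender, receiver, round_arrives)
Send = Tuple[int, int, int, int]


def greedy_broadcast_schedule(n: int, lam: int) -> List[Send]:
    """Greedy one-to-all broadcast schedule for n nodes with integer latency lam.

    Assumption: a message sent in round r arrives in round r + lam - 1,
    and the receiver may send from the following round onward.
    """
    if n < 1 or lam < 1:
        raise ValueError("need n >= 1 and lam >= 1")
    informed_at = {0: 0}      # node -> round at whose end it holds the message
    schedule: List[Send] = []
    next_target = 1           # lowest node id not yet assigned a sender
    r = 0
    while next_target < n:
        r += 1
        # Only nodes that held the message before this round may send in it.
        senders = sorted(v for v, t in informed_at.items() if t < r)
        for s in senders:
            if next_target >= n:
                break
            arrives = r + lam - 1
            schedule.append((r, s, next_target, arrives))
            informed_at[next_target] = arrives
            next_target += 1
    return schedule


if __name__ == "__main__":
    for send in greedy_broadcast_schedule(8, 2):
        print("round %d: node %d -> node %d (arrives round %d)" % send)
```

The completion time of a schedule is the largest arrival round it contains. Comparing that value for different assumed λ against measured times is one way such a model could guide tuning, though the paper's own procedure may differ.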

Index Terms:
Broadcast, global combine, postal model, complete graph, collective communication.
Citation:
Jehoshua Bruck, Luc De Coster, Natalie Dewulf, Ching-Tien Ho, Rudy Lauwereins, "On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 3, pp. 256-265, March 1996, doi:10.1109/71.491579