Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
November 1997 (vol. 8 no. 11)
pp. 1143-1156

Abstract—We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time and the communication bandwidth.
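The two complexity measures can be made concrete with a small sketch. It assumes a linear cost model in which an algorithm that takes C1 communication rounds and moves C2 units of data on its busiest port costs roughly C1 times the start-up time plus C2 times the per-unit transfer time. It also encodes two simple lower bounds implied by the k-port model: data starting at one processor can reach at most (k+1)^t processors after t rounds, and every processor must receive n-1 blocks of size b through k ports. The helper names are hypothetical.

```python
def min_rounds(n: int, k: int) -> int:
    """Hypothetical helper: smallest t with (k+1)**t >= n.

    After t rounds, data that starts at one processor can have reached at most
    (k+1)**t processors, because every holder forwards to at most k others per round.
    """
    rounds, reach = 0, 1
    while reach < n:
        reach *= k + 1
        rounds += 1
    return rounds


def min_transfer(n: int, k: int, b: float) -> float:
    """Hypothetical helper: each processor receives (n-1) blocks of size b over k ports."""
    return b * (n - 1) / k


def linear_cost(c1: int, c2: float, startup: float, per_unit: float) -> float:
    """Assumed linear model: c1 start-ups plus c2 units of data on the busiest port."""
    return c1 * startup + c2 * per_unit


if __name__ == "__main__":
    n, k, b = 64, 2, 1.0
    print(min_rounds(n, k), min_transfer(n, k, b))
```

For 64 processors with 2 ports each, this gives at least 4 rounds and at least 31.5 blocks of transfer per port; under the assumed model, which of the two terms dominates depends on the ratio of start-up time to per-unit transfer time.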

In the index operation among n processors, each processor initially has n blocks of data, and the goal is to exchange the ith block of processor j with the jth block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tunability of our index algorithms on the IBM SP-1 parallel system.
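The index semantics and the start-up/transfer trade-off can be illustrated with a sequential simulation. The sketch below assumes a radix-r scheme: a local rotation, then one round per nonzero base-r digit value of each block's offset, then a local inverse rotation. It is restricted to a single port (k = 1) and is an illustration under these assumptions, not a reproduction of the paper's algorithm; the function name index_radix is hypothetical. With r = 2 it uses about log2 n rounds but sends about (n/2) log2 n blocks per processor; with r = n it uses n-1 rounds but sends only n-1 blocks, the round-optimal and transfer-optimal extremes for k = 1.

```python
from typing import List, Tuple


def index_radix(blocks: List[List[str]], r: int) -> Tuple[List[List[str]], int, int]:
    """Hypothetical sketch: simulate a radix-r index exchange with one port per processor.

    blocks[i][j] is the block processor i must deliver to processor j.
    Returns (result, rounds, sent), where result[j][i] == blocks[i][j] and
    sent is the number of blocks each processor sends in total.
    """
    n = len(blocks)
    assert r >= 2
    # Phase 1: rotate locally so position m holds the block for processor (i + m) mod n.
    a = [[blocks[i][(m + i) % n] for m in range(n)] for i in range(n)]
    # Number of base-r digits needed to write every offset m in 0..n-1.
    w = 1
    while r ** w < n:
        w += 1
    rounds = sent = 0
    # Phase 2: one round per (digit position x, digit value z) with a nonempty send set.
    for x in range(w):
        for z in range(1, r):
            positions = [m for m in range(n) if (m // r ** x) % r == z]
            if not positions:
                continue
            shift = z * r ** x
            new_a = [row[:] for row in a]
            for i in range(n):
                # Processor i sends every block whose x-th digit is z to i + z*r^x;
                # the block keeps its position m at the receiver.
                for m in positions:
                    new_a[(i + shift) % n][m] = a[i][m]
            a = new_a
            rounds += 1
            sent += len(positions)
    # Phase 3: position m at processor d now holds the block from processor (d - m) mod n.
    result = [[a[d][(d - s) % n] for s in range(n)] for d in range(n)]
    return result, rounds, sent


if __name__ == "__main__":
    n = 5
    blocks = [[f"{i}->{j}" for j in range(n)] for i in range(n)]
    for r in (2, n):
        result, rounds, sent = index_radix(blocks, r)
        assert all(result[j][i] == blocks[i][j] for i in range(n) for j in range(n))
        print(r, rounds, sent)
```

For n = 5 processors, radix 2 finishes in 3 rounds while sending 5 blocks per processor, whereas radix 5 needs 4 rounds but sends only 4 blocks; intermediate radices fall between these two points.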

In the concatenation operation among n processors, each processor initially has one block of data, and the goal is to concatenate the n blocks of data from the n processors and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.
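A minimal sketch of a concatenation schedule in this model, assuming a (k+1)-ary dissemination pattern: in each round, every processor that already holds c consecutive blocks fetches, through each of its k ports, a contiguous run of blocks from the processors at distance c, 2c, ..., kc. This is an illustrative simulation, the function name is hypothetical, and it is not claimed to be the paper's exact algorithm. It finishes in ⌈log_{k+1} n⌉ rounds, and when n is a power of k+1 the largest message per round sums to (n-1)/k blocks, so it meets both lower bounds in that case.

```python
from typing import List, Tuple


def concatenate_multiport(blocks: List[str], k: int) -> Tuple[List[List[str]], int, int]:
    """Hypothetical sketch: (k+1)-ary dissemination concatenation, k ports per processor.

    blocks[i] is processor i's single block. Returns (results, rounds, c2), where every
    results[i] equals blocks and c2 sums, over rounds, the largest message (in blocks).
    """
    n = len(blocks)
    assert k >= 1
    # held[i] lists the blocks of processors i, i+1, ..., i+c-1 (mod n).
    held = [[blocks[i]] for i in range(n)]
    c, rounds, c2 = 1, 0, 0
    while c < n:
        new_held, largest = [], 0
        for i in range(n):
            acc = list(held[i])
            for j in range(1, k + 1):
                size = min(c, n - j * c)  # never collect more than n blocks in total
                if size <= 0:
                    break
                src = (i + j * c) % n     # port j receives from processor i + j*c
                acc.extend(held[src][:size])
                largest = max(largest, size)
            new_held.append(acc)
        held = new_held
        c = len(held[0])
        rounds += 1
        c2 += largest
    # Final local rotation: held[i][m] is the block of processor (i + m) mod n.
    results = [[held[i][(s - i) % n] for s in range(n)] for i in range(n)]
    return results, rounds, c2


if __name__ == "__main__":
    data = [f"b{i}" for i in range(9)]
    results, rounds, c2 = concatenate_multiport(data, 2)
    print(rounds, c2, results[4] == data)
```

With 9 processors and 2 ports, the sketch completes in 2 rounds with message sizes 1 and 3, for a total of 4 = (9-1)/2 blocks per port.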

Index Terms:
All-to-all broadcast, all-to-all personalized communication, complete exchange, concatenation operation, distributed-memory system, index operation, message-passing system, multiscatter/gather, parallel system.
Jehoshua Bruck, Ching-Tien Ho, Shlomo Kipnis, Eli Upfal, Derrick Weathersby, "Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 11, pp. 1143-1156, Nov. 1997, doi:10.1109/71.642949