Efficient Algorithms for the Reduce-Scatter Operation in LogGP
September 1997 (vol. 8 no. 9)
pp. 970-982

Abstract—We consider the problem of efficiently performing a reduce-scatter operation in a message passing system. Reduce-scatter is the composition of an element-wise reduction on vectors of n elements initially held by n processors, with a scatter of the resulting vector among the processors. In this paper, we present two algorithms for the reduce-scatter operation, designed in LogGP. The first algorithm assumes an associative and commutative reduction operator and it is optimal in LogGP within a small constant factor. The second algorithm allows the reduction operator to be noncommutative, and it is asymptotically optimal when values to be combined are large arrays. To achieve these results, we developed a complete analysis of both algorithms in LogGP, including the derivation of lower bounds for the reduce-scatter operation, and the study of the m-item version of the problem, i.e., the case when the initial elements are vectors themselves. Reduce-scatter has been included as a collective operation in the MPI standard message passing library, and can be used, for instance, in parallel matrix-vector multiply when the matrix is decomposed by columns. To model a message passing system, we adopted the LogGP model, an extension of LogP that allows the modeling of messages of different length. While this choice makes the analysis somewhat more complex, it leads to more realistic results in the case of gather/scatter algorithms.
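
As an illustration of the operation the abstract defines, here is a minimal sequential sketch of reduce-scatter semantics in Python. It is a reference for what each processor ends up holding, not either of the paper's LogGP algorithms, and it models no communication. The names `reduce_scatter` and `matvec_by_columns` are hypothetical, chosen for this example only. Operands are combined in rank order, so the sketch is also valid for an associative but noncommutative operator.

```python
from functools import reduce
from operator import add


def reduce_scatter(vectors, op=add):
    """Sequential reference for reduce-scatter semantics (no messaging).

    vectors[i] is the n-element vector initially held by processor i.
    The returned list holds, at index j, the value processor j ends up
    with: op applied in rank order to element j of every vector.
    """
    n = len(vectors)
    if any(len(v) != n for v in vectors):
        raise ValueError("each of the n processors must hold n elements")
    return [reduce(op, (vectors[i][j] for i in range(n))) for j in range(n)]


def matvec_by_columns(A, x):
    """Compute y = A x when processor i owns column i of A and x[i]."""
    n = len(A)
    # Processor i forms its local contribution: x[i] times column i of A.
    partial = [[A[r][i] * x[i] for r in range(n)] for i in range(n)]
    # Reduce-scatter sums the contributions and leaves y[j] on processor j.
    return reduce_scatter(partial)
```

For example, `matvec_by_columns([[1, 2], [3, 4]], [5, 6])` sums the two local contributions `[5, 15]` and `[12, 24]` element-wise, so processor 0 holds 17 and processor 1 holds 39. This is the column-decomposed matrix-vector multiply the abstract mentions.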

Index Terms:
Reduce-scatter, algorithm analysis, parallel algorithm, collective communication operations, LogP, LogGP, postal model, generalized Fibonacci numbers, MPI.
Giulio Iannello, "Efficient Algorithms for the Reduce-Scatter Operation in LogGP," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 9, pp. 970-982, Sept. 1997, doi:10.1109/71.615442