Scheduling Block-Cyclic Array Redistribution
February 1998 (vol. 9 no. 2)
pp. 192-205

Abstract—This article is devoted to the run-time redistribution of one-dimensional arrays that are distributed in a block-cyclic fashion over a processor grid. While previous studies have concentrated on efficiently generating the communication messages to be exchanged by the processors involved in the redistribution, we focus on the scheduling of those messages: how to organize the message exchanges into "structured" communication steps that minimize contention. We build upon results of Walker and Otto, who solved a particular instance of the problem, and we derive an optimal scheduling for the most general case, namely, moving from a CYCLIC(r) distribution on a P-processor grid to a CYCLIC(s) distribution on a Q-processor grid, for arbitrary values of the redistribution parameters P, Q, r, and s.
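To make the redistribution problem concrete, here is a minimal sketch that lists which processor pairs must exchange data, and how much. It is a brute-force enumeration, not the scheduling algorithm of the article. It assumes the usual HPF convention that CYCLIC(b) on N processors assigns global element i to processor (i // b) mod N; the names owner and message_sizes and the array length n are illustrative and do not come from the article.

```python
from collections import Counter


def owner(i, block, nprocs):
    """Processor owning global element i under CYCLIC(block) on nprocs processors.

    Assumes the standard HPF mapping: consecutive blocks of `block` elements
    are dealt out to the processors in round-robin order.
    """
    return (i // block) % nprocs


def message_sizes(n, P, r, Q, s):
    """Count the elements each source processor sends to each target processor.

    The source layout is CYCLIC(r) on P processors and the target layout is
    CYCLIC(s) on Q processors. Returns a Counter keyed by (source, target).
    """
    sizes = Counter()
    for i in range(n):
        sizes[(owner(i, r, P), owner(i, s, Q))] += 1
    return sizes


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    for (src, dst), count in sorted(message_sizes(12, 2, 2, 3, 1).items()):
        print(f"source {src} -> target {dst}: {count} element(s)")
```

For n = 12, P = 2, r = 2, Q = 3, and s = 1, each of the six (source, target) pairs exchanges two elements. The scheduling question studied in the article is how to group such messages into communication steps so that they do not contend with one another.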

[1] C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell, "A Linear Algebra Framework for Static HPF Code Distribution," Scientific Programming, to appear. Available as CRI-Ecole des Mines Technical Report A-278-CRI.
[2] C. Berge, Graphes et Hypergraphes. Dunod, 1970. English translation by Elsevier, Amsterdam (1985).
[3] S. Chatterjee, J. Gilbert, F. Long, R. Schreiber, and S. Tseng, “Generating Local Addresses and Communication Sets for Data Parallel Programs,” J. Parallel and Distributed Computing, vol. 26, pp. 72–84, 1995.
[4] J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley, "ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers—Design Issues and Performance," Proc. Computer Physics Comm., vol. 97, pp. 1-15, 1996. (also LAPACK Working Note #95).
[5] J. Dongarra and D. Walker, “Software Libraries for Linear Algebra Computations on High Performance Computers,” SIAM Review, vol. 37, no. 2, pp. 151–180, 1995.
[6] G.H. Golub and C.F. Van Loan, Matrix Computations, second ed. Johns Hopkins, 1989.
[7] R.L. Graham, M. Grötschel, and L. Lovász, Handbook of Combinatorics. Elsevier, 1995.
[8] S.K.S. Gupta, S.D. Kaushik, C.-H. Huang, and P. Sadayappan, “On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines,” J. Parallel and Distributed Computing, vol. 32, pp. 155-172, 1996.
[9] E. Kalns and L. Ni, “Processor Mapping Techniques towards Efficient Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 12, pp. 1234–1247, 1995.
[10] K. Kennedy, N. Nedeljkovic, and A. Sethi, “Efficient Address Generation for Block-Cyclic Distribution,” Proc. Int'l Conf. Supercomputing, pp. 180-184, July 1995.
[11] K. Kennedy, N. Nedeljković, and A. Sethi, “A Linear-Time Algorithm for Computing the Memory Access Sequence in Data Parallel Programs,” Proc. Fifth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, 1995.
[12] C. Koelbel, D. Loveman, R. Schreiber, G. Steele Jr., and M. Zosel, The High Performance Fortran Handbook. MIT Press, 1994.
[13] A. Petitet, Algorithmic Redistribution Methods for Block Cyclic Decompositions, doctoral thesis, Univ. Tennessee, Knoxville, 1996.
[14] L. Prylli and B. Tourancheau, "Efficient Block-Cyclic Data Redistribution," Proc. EuroPar'96, Lecture Notes in Computer Science, vol. 1123, pp. 155-164. Springer Verlag, 1996.
[15] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, “MPI: The Complete Reference,” MIT Press, 1995.
[16] J. Stichnoth, D. O’Hallaron, and T. Gross, “Generating Communication for Array Statements: Design, Implementation, and Evaluation,” J. Parallel and Distributed Computing, vol. 21, no. 1, pp. 150-159, 1994.
[17] A. Thirumalai and J. Ramanujam, “Fast Address Sequence Generation for Data Parallel Programs Using Integer Lattices,” Languages and Compilers for Parallel Computing: Lecture Notes in Computer Science. P. Sadayappan et al., eds., Springer-Verlag, 1996.
[18] K. van Reeuwijk, W. Denissen, H.J. Sips, and E.M.R.M. Paalvast, "An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 9, pp. 897-914, Sept. 1996.
[19] A. Wakatani and M. Wolfe, “Optimization of Array Redistribution for Distributed Memory Multicomputers,” Parallel Computing, vol. 21, no. 9, pp. 1485-1490, Sept. 1995.
[20] D.W. Walker and S.W. Otto, "Redistribution of Block-Cyclic Data Distributions Using MPI," Concurrency: Practice and Experience, vol. 8, no. 9, pp. 707-728, 1996.
[21] L. Wang, J. Stichnoth, and S. Chatterjee, “Runtime Performance of Parallel Array Assignment: An Empirical Study,” Proc. Supercomputing, 1996.

Index Terms:
Distributed arrays, redistribution, block-cyclic distribution, scheduling, MPI, HPF.
Frédéric Desprez, Jack Dongarra, Antoine Petitet, Cyril Randriamaro, Yves Robert, "Scheduling Block-Cyclic Array Redistribution," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 2, pp. 192-205, Feb. 1998, doi:10.1109/71.663945