Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets
December 1999 (vol. 10 no. 12)
pp. 1217-1240

Abstract—Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on $P$ processors to cyclic(Kx) on $Q$ processors. The algorithm reduces the overall communication time by accounting for the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller than in previous schemes: it takes $O(\max(P,Q))$ time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time of the state-of-the-art scheme (which is based on bipartite matching) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule computation, and index computation.
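To make the redistribution problem concrete, the following Python sketch illustrates two ingredients the abstract refers to: how much data each source processor must send to each destination, and what a contention-free communication step looks like. It is a hedged illustration under stated assumptions, not the authors' generalized-circulant-matrix algorithm. The names `owner`, `comm_matrix`, and `shift_schedule` are hypothetical helpers; the ownership rule is the standard one for a cyclic(b) distribution with 0-based indices; and the schedule is a naive round-robin shift chosen only because each of its steps is trivially contention-free.

```python
from math import lcm  # math.lcm requires Python 3.9+


def owner(i: int, block: int, nprocs: int) -> int:
    """Processor owning global element i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def comm_matrix(P: int, Q: int, x: int, K: int) -> list[list[int]]:
    """C[p][q] = number of elements source p sends to destination q during one
    period of the cyclic(x)-on-P -> cyclic(K*x)-on-Q redistribution.

    Both ownership patterns repeat every lcm(P*x, Q*K*x) elements, so one
    period describes the whole array when its length is a multiple of it.
    """
    period = lcm(P * x, Q * K * x)
    C = [[0] * Q for _ in range(P)]
    for i in range(period):
        C[owner(i, x, P)][owner(i, K * x, Q)] += 1
    return C


def shift_schedule(C: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Naive contention-free schedule (an assumption, NOT the paper's method).

    In step s, source p sends to destination (p + s) mod max(P, Q) if that
    destination exists. Each step is a partial permutation, so no processor
    sends or receives more than one message per step. Steps that would carry
    no data are dropped.
    """
    P, Q = len(C), len(C[0])
    M = max(P, Q)
    steps = []
    for s in range(M):
        step = []
        for p in range(P):
            q = (p + s) % M
            if q < Q and C[p][q] > 0:
                step.append((p, q, C[p][q]))  # (source, destination, #elements)
        if step:
            steps.append(step)
    return steps


if __name__ == "__main__":
    C = comm_matrix(P=4, Q=2, x=1, K=1)
    print(C)
    for k, step in enumerate(shift_schedule(C)):
        print(f"step {k}: {step}")
```

For cyclic(1) on 4 processors to cyclic(1) on 2 processors, only half of the source-destination pairs carry data, and the shift schedule finishes in two steps. In general, however, this naive schedule is not designed to minimize the number of steps or to equalize message sizes within a step, and `comm_matrix` scans a full period of the array rather than running in $O(\max(P,Q))$ time; those are the properties the paper's algorithm targets.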


Index Terms:
Block-cyclic distribution, redistribution algorithms, interprocessor communication.
Neungsoo Park, Viktor K. Prasanna, Cauligi S. Raghavendra, "Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 12, pp. 1217-1240, Dec. 1999, doi:10.1109/71.819945