A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution
April 1998 (vol. 9 no. 4)
pp. 359-377

Abstract—Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a basic-cycle calculation technique to efficiently perform BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution. The main idea of the basic-cycle calculation technique is, first, to develop closed forms for computing source/destination processors of some specific array elements in a basic-cycle, which is defined as lcm(s, t)/gcd(s, t). These closed forms are then used to efficiently determine the communication sets of a basic-cycle. From the source/destination processor/data sets of a basic-cycle, we can efficiently perform a BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution. To evaluate the performance of the basic-cycle calculation technique, we have implemented this technique on an IBM SP2 parallel machine, along with the PITFALLS method and the multiphase method. The cost models for these three methods are also presented. The experimental results show that the basic-cycle calculation technique outperforms the PITFALLS method and the multiphase method for most test samples.
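To make the quantities in the abstract concrete, the following is a minimal sketch of BLOCK-CYCLIC ownership and of the communication sets between a source and a destination distribution. It assumes 0-based global indices, a one-dimensional array, and the same number of processors before and after redistribution. The function names (`owner`, `basic_cycle`, `communication_sets`) are hypothetical, and the communication sets are found by brute-force enumeration rather than by the closed forms the paper develops.

```python
from collections import defaultdict
from math import gcd


def owner(i, block, nprocs):
    """Processor that owns global index i (0-based) under BLOCK-CYCLIC(block)."""
    return (i // block) % nprocs


def basic_cycle(s, t):
    """Basic-cycle as defined in the abstract: lcm(s, t) / gcd(s, t)."""
    g = gcd(s, t)
    lcm = s * t // g
    return lcm // g


def communication_sets(n, s, t, nprocs):
    """Map (source, destination) -> global indices that move between them.

    Brute force over all n elements; this is an illustration only,
    not the closed-form computation described in the paper.
    """
    sets = defaultdict(list)
    for i in range(n):
        sets[(owner(i, s, nprocs), owner(i, t, nprocs))].append(i)
    return dict(sets)


if __name__ == "__main__":
    print(basic_cycle(2, 6))
    for pair, indices in sorted(communication_sets(12, 2, 3, 2).items()):
        print(pair, indices)
```

In this sketch, elements whose source and destination processors coincide need no communication; every other (source, destination) pair corresponds to one message in a BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution.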

Index Terms:
Data redistribution, the basic-cycle calculation technique, the PITFALLS method, the multiphase method, distributed memory multicomputers.
Citation:
Yeh-Ching Chung, Ching-Hsien Hsu, Sheng-Wen Bai, "A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 4, pp. 359-377, April 1998, doi:10.1109/71.667897