A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution
December 2000 (vol. 11 no. 12)
pp. 1201-1216

Abstract—In many scientific applications, dynamic array redistribution is often required to enhance the performance of an algorithm. In this paper, we present a generalized basic-cycle calculation (GBCC) method to efficiently perform array redistribution from a BLOCK-CYCLIC(s) distribution over P processors to a BLOCK-CYCLIC(t) distribution over Q processors. In the GBCC method, a processor first computes the source/destination processor/data sets of array elements in the first generalized basic-cycle of the local array it owns. A generalized basic-cycle is defined as $\mathrm{lcm}(sP,\;tQ)/(\gcd(s,t)\times P)$ in the source distribution and $\mathrm{lcm}(sP,\;tQ)/(\gcd(s,t)\times Q)$ in the destination distribution. From the source/destination processor/data sets of array elements in the first generalized basic-cycle, we can construct packing/unpacking pattern tables that minimize data-movement operations. Since every generalized basic-cycle has the same communication pattern, a processor can pack/unpack array elements efficiently using these tables. To evaluate the performance of the GBCC method, we have implemented it on an IBM SP2 parallel machine, along with the PITFALLS method and the ScaLAPACK method. Cost models for the three methods are also presented. The experimental results show that the GBCC method outperforms the PITFALLS and ScaLAPACK methods for all test samples. A brief description of the extension of the GBCC method to multidimensional array redistribution is also presented.
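
The key observation behind the method is that the destination-processor pattern within each generalized basic-cycle is identical, so the pattern computed for the first cycle can be reused for every subsequent cycle of the local array. The following minimal Python sketch illustrates this periodicity under the usual BLOCK-CYCLIC index mapping; it is not the paper's GBCC implementation, and names such as gbc_source and dest_pattern are illustrative only.

# Minimal sketch (not the authors' code): for BLOCK-CYCLIC(s) over P source
# processors redistributed to BLOCK-CYCLIC(t) over Q destination processors,
# the destination-processor pattern of a source processor's local array
# repeats every lcm(sP, tQ)/(gcd(s,t)*P) elements.
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def global_index(p, l, s, P):
    # Global index of local element l on source processor p under BLOCK-CYCLIC(s) over P.
    return (l // s) * s * P + p * s + (l % s)

def dest_processor(g, t, Q):
    # Owner of global element g under BLOCK-CYCLIC(t) over Q.
    return (g // t) % Q

def dest_pattern(p, s, P, t, Q):
    # Destination processors of the elements in the first generalized
    # basic-cycle of source processor p's local array.
    gbc_source = lcm(s * P, t * Q) // (gcd(s, t) * P)
    return [dest_processor(global_index(p, l, s, P), t, Q) for l in range(gbc_source)]

if __name__ == "__main__":
    s, P, t, Q = 2, 2, 3, 2
    pattern = dest_pattern(0, s, P, t, Q)
    print("generalized basic-cycle size:", len(pattern))  # 6 for this example
    print("destination pattern:", pattern)                # same pattern in every later cycle

A packing table derived from such a per-cycle pattern lets a processor gather all elements bound for the same destination in a single pass over each cycle, which is the source of the data-movement savings reported in the paper.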

Index Terms:
Redistribution, generalized basic-cycle calculation method, distributed memory multicomputers.
Citation:
Ching-Hsien Hsu, Sheng-Wen Bai, Yeh-Ching Chung, Chu-Sing Yang, "A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 12, pp. 1201-1216, Dec. 2000, doi:10.1109/71.895789