This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
A Generalized Processor Mapping Technique for Array Redistribution
July 2001 (vol. 12 no. 7)
pp. 743-757

Abstract—In many scientific applications, array redistribution is usually required to enhance data locality and reduce remote memory access in many parallel programs on distributed memory multicomputers. Since the redistribution is performed at runtime, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a generalized processor mapping technique to minimize the amount of data exchange for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) array redistribution and vice versa. The main idea of the generalized processor mapping technique is first to develop mapping functions for computing a new rank of each destination processor. Based on the mapping functions, a new logical sequence of destination processors can be derived. The new logical processor sequence is then used to minimize the amount of data exchange in a redistribution. The generalized processor mapping technique can handle array redistribution with arbitrary source and destination processor sets and can be applied to multidimensional array redistribution. We present a theoretical model to analyze the performance improvement of the generalized processor mapping technique. To evaluate the performance of the proposed technique, we have implemented the generalized processor mapping technique on an IBM SP2 parallel machine. The experimental results show that the generalized processor mapping technique can provide performance improvement over a wide range of redistribution problems.

[1] S. Benkner, “Handling Block-Cyclic Distribution Arrays in Vienna Fortran 90,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques, June 1995.
[2] B. Chapman,P. Mehrotra,H. Moritsch,, and H. Zima,“Dynamic data distributions in Vienna Fortran,” Proc. of Supercomputing’93, pp. 284-293, Nov. 1993.
[3] S. Chatterjee, J. Gilbert, F. Long, R. Schreiber, and S. Tseng, “Generating Local Adresses and Communication Sets for Data Parallel Programs,” J. Parallel and Distributed Computing, vol. 26,pp. 72–84, 1995.
[4] Y.-C. Chung, C.-H. Hsu, and S.-W. Bai, “A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 4, Apr. 1998.
[5] F. Desprez, J. Dongarra, and A. Petitet, C. Randriamaro, Y. Robert, “Scheduling Block-Cyclic Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 2,pp. 192–205 1998.
[6] G. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C.-W. Tseng, and M. Wu, “Fortran-D Language Specification,” Technical Report TR-91-170, Dept. of Computer Science, Rice Univ., Dec. 1991.
[7] S.K.S. Gupta, S.D. Kaushik, C.-H. Huang, and P. Sadayappan, “On the Generation of Efficient Data Communication for Distributed-Memory Machines,” Proc. Int'l Computing Symp., pp. 504-513, 1992.
[8] S.K.S. Gupta, S.D. Kaushik, C.-H. Huang, and P. Sadayappan, “On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines,” J. Parallel and Distributed Computing, vol. 32, pp. 155-172, 1996.
[9] High Performance Fortran Forum, “High Performance Fortran Language Specification (Version 1.1),” Rice Univ., Nov. 1994.
[10] S. Hiranandani, K. Kennedy, J. Mellor-Crammey, and A. Sethi, “Compilation Technique for Block-Cyclic Distribution,” Proc. ACM Int'l Conf. Supercomputing, pp. 392-403, July 1994.
[11] C.-H. Hsu and Y.-C. Chung, “Efficient Methods for$kr \rightarrow r$and$r \rightarrow kr$Array Redistribution,” J. Supercomputing, vol. 12, no. 2, pp. 253-276, May 1998.
[12] E. Kalns and L. Ni, “Processor Mapping Techniques towards Efficient Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 6,pp. 1,234–1,247, 1995.
[13] E.T. Kalns and L.M. Ni,“DaReL: A portable data redistribution library for distributed-memory machines,” Proc. 1994 Scalable Parallel Libraries Conf. 2, Oct. 1994.
[14] S.D. Kaushik, C.-H. Huang, R.W. Johnson, and P. Sadayappan, “An Approach to Communication-Efficient Data Redistribution,” Proc. 1994 ACM Int'l Conf. Supercomputing, pp. 364-373, June 1994.
[15] S.D. Kaushik, C.H. Huang, J. Ramanujam, and P. Sadayappan, “Multi-Phase Array Redistribution: Modeling and Evaluation,” Proc. Int'l Parallel Processing Symp., 1995.
[16] S.D. Kaushik, C.H. Huang, and P. Sadayappan, “Efficient Index Set Generation for Compiling HPF Array Statements on Distributed-Memory Machines,” J. Parallel and Distributed Computing, vol. 38, pp. 237-247, 1996.
[17] K. Kennedy, N. Nedeljkovic, and A. Sethi, “Efficient Address Generation for Block-Cyclic Distribution,” Proc. Int'l Conf. Supercomputing, pp. 180-184, July 1995.
[18] P.-Z. Lee and W.Y. Chen, “Compiler Techniques for Determining Data Distribution and Generating Communication Sets on Distributed-Memory Multicomputers,” Proc. 29th IEEE Hawaii Int'l Conf. System Sciences, pp. 537-546, Jan. 1996.
[19] Y.W. Lim, P.B. Bhat, and V.K. Prasanna, “Efficient Algorithms for Block-Cyclic Redistribution of Arrays,” Proc. Eighth IEEE Symp. Parallel and Distributed Processing, pp. 74-83, 1996.
[20] Y.W. Lim, N. Park, and V.K. Prasanna, “Efficient Algorithms for Multi-Dimensional Block-Cyclic Redistribution of Arrays,” Proc. 26th Int'l Conf. Parallel Processing, pp. 234-241, 1997.
[21] L. Prylli and B. Tourancheau, “Fast Runtime Block Cyclic Data Redistribution on Multiprocessors,” J. Parallel and Distributed Computing, vol. 45, 1997.
[22] S. Ramaswamy and P. Banerjee, "Automatic Generation of Efficient Array Redistribution Routines for Distributed Memory Multicomputers," Proc. Frontiers '95: The Fifth Symposium on the Frontiers of Massively Parallel Computation, pp. 342-349,McLean, Va., Feb. 1995.
[23] S. Ramaswamy, B. Simons, and P. Banerjee, “Optimizations for Efficient Array Redistribution on Distributed Memory Multicomputers,” J. Parallel and Distributed Computing, vol. 38, no. 2, pp. 217-228, Nov. 1996.
[24] J. Stichnoth,D. O’Hallaron,, and T. Gross,“Generating communication for array statements: Design, implementation, and evaluation,” J. of Parallel and Distributed Computing, vol. 21, no. 1, pp. 150-159, 1994.
[25] R. Thakur,A. Choudhary,, and G. Fox,“Runtime array redistribution in HPF programs,” Proc. 1994 Scalable High Performance Computing Conf., pp. 309-316, May 1994.
[26] R. Thakur, A. Choudhary, and J. Ramanujam, “Efficient Algorithms for Array Redistribution“ IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 6 pp. 587-594, June 1996.
[27] A. Thirumalai and J. Ramanujam, “HPF Array Statements: Communication Generation and Optimization,” Proc. Third Workshop on Languages, Compilers and Run-Time System for Scalable Computers, May 1995.
[28] V. Van Dongen, C. Bonello, and C. Freehill, “High Performance C—Language Specification Version 0.8.9,” Technical Report CRIM-EPPP-94/04-12, 1994.
[29] C. Van Loan, “Computational Frameworks for the Fast Fourier Transform,” SIAM, 1992.
[30] D.W. Walker and S.W. Otto, “Redistribution of BLOCK-CYCLIC Data Distributions Using MPI,” Concurrency: Practice and Experience, vol. 8, no. 9, pp. 707-728, Nov. 1996.
[31] A. Wakatani and M. Wolfe, “A New Approach to Array Redistribution: Strip Mining Redistribution,” Proc. Parallel Architectures and Languages Europe, July 1994.
[32] A. Wakatani and M. Wolfe, “Optimization of Array Redistribution for Distributed Memory Multicomputers,” Parallel Computing, vol. 21, no. 9, pp. 1485-1490, Sept. 1995.
[33] H. Zima, P. Brezany, B. Chapman, P. Mehrotra, and A. Schwald, “Vienna Fortran—A Language Specification Version 1.1,” ICASE Interim Report 21, ICASE NASA Langley Research Center, Hampton, Va. 23665, Mar. 1992.

Index Terms:
Array redistribution, generalized processor mapping, distributed memory multicomputers, runtime support.
Citation:
Ching-Hsien Hsu, Yeh-Ching Chung, Don-Lin Yang, Chyi-Ren Dow, "A Generalized Processor Mapping Technique for Array Redistribution," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 7, pp. 743-757, July 2001, doi:10.1109/71.940748
Usage of this product signifies your acceptance of the Terms of Use.