Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers
November 2006 (vol. 17 no. 11)
pp. 1226-1241
Ching-Hsien Hsu, IEEE Computer Society
Ming-Hao Chen
Chao-Tung Yang
Kuan-Ching Li, IEEE Computer Society

Abstract—Dynamic data redistribution enhances data locality and algorithm performance by reducing interprocessor communication in many parallel scientific applications on distributed-memory multicomputers. Since redistribution is performed at runtime, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a processor replacement scheme that minimizes the cost of interprocessor data exchange at runtime. The main idea of the proposed technique is to develop a replacement function for reordering logical processors in the destination phase. Based on the replacement function, a realigned sequence of destination processors can be derived and then used to perform data decomposition in the receiving phase. Together with local matrix transposition and compressed row storage (CRS) vector transposition schemes, interprocessor communication can be eliminated at runtime. A significant advantage of this approach is that, for special cases, data realignment can be performed without any interprocessor communication. The second contribution of the technique is that the generation of complicated communication sets can be simplified by applying local matrix transposition; consequently, the indexing cost is reduced significantly. The proposed techniques apply to both dense and sparse applications. A generalized symmetric redistribution algorithm is also presented. Theoretical analysis proves that up to (p-1)/p of the data transmission cost can be saved; for general cases, the symmetric redistribution algorithm saves 1/p of the communication overhead compared with the traditional method. Experimental results also show that the proposed techniques deliver superior performance in most data redistribution instances.
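The heart of the scheme described above is a replacement function that reorders logical destination processors so that as much data as possible stays where it already resides. The paper derives this function analytically; purely for illustration, the hypothetical sketch below (names and the overlap-matrix formulation are our assumptions, not the paper's) finds an optimal reordering by brute force:

```python
from itertools import permutations

def best_replacement(overlap):
    """Pick a reordering of logical destination processors that
    maximizes the amount of data already resident locally.

    overlap[i][j] = number of elements that physical processor i
    already holds of the region owned by logical destination
    processor j.  Brute force, so only feasible for small p.
    """
    p = len(overlap)
    best_kept, best_perm = -1, None
    for perm in permutations(range(p)):
        # perm[i] = logical destination id assigned to processor i
        kept = sum(overlap[i][perm[i]] for i in range(p))
        if kept > best_kept:
            best_kept, best_perm = kept, perm
    return best_perm, best_kept

# p = 3 processors, 4 elements per block: swapping the logical ids of
# processors 0 and 1 keeps all 12 elements local, so no data exchange
# is required at all.
overlap = [[0, 4, 0],
           [4, 0, 0],
           [0, 0, 4]]
print(best_replacement(overlap))  # ((1, 0, 2), 12)
```

In the communication-free special cases the abstract mentions, the chosen permutation keeps every element local, which is exactly where the (p-1)/p transmission cost of naive redistribution is saved.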

Index Terms:
Processor replacement, communication free, data redistribution, symmetric matrix, CRS transposition, sparse matrix.
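The CRS transposition listed among the index terms is a purely local operation, and for a symmetric matrix it reproduces the original CRS vectors exactly, which is what allows the redistribution to proceed without communication. A minimal sequential sketch (a hypothetical helper, not the paper's implementation):

```python
def crs_transpose(n_rows, n_cols, row_ptr, col_idx, vals):
    """Transpose a sparse matrix stored in CRS (compressed row storage).
    Counting-sort style, O(nnz + n_cols); no communication involved."""
    nnz = len(vals)
    t_row_ptr = [0] * (n_cols + 1)
    for c in col_idx:                  # count entries per column
        t_row_ptr[c + 1] += 1
    for i in range(n_cols):            # prefix sums -> row pointers of A^T
        t_row_ptr[i + 1] += t_row_ptr[i]
    t_col_idx, t_vals = [0] * nnz, [0] * nnz
    slot = list(t_row_ptr[:-1])        # next free slot per transposed row
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            t_col_idx[slot[c]] = r
            t_vals[slot[c]] = vals[k]
            slot[c] += 1
    return t_row_ptr, t_col_idx, t_vals

# Symmetric 3x3 matrix [[1,2,0],[2,0,3],[0,3,4]]: transposing its CRS
# vectors returns the identical vectors, entirely locally.
print(crs_transpose(3, 3, [0, 2, 4, 6], [0, 1, 0, 2, 1, 2],
                    [1, 2, 2, 3, 3, 4]))
# -> ([0, 2, 4, 6], [0, 1, 0, 2, 1, 2], [1, 2, 2, 3, 3, 4])
```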
Citation:
Ching-Hsien Hsu, Ming-Hao Chen, Chao-Tung Yang, Kuan-Ching Li, "Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 11, pp. 1226-1241, Nov. 2006, doi:10.1109/TPDS.2006.162