Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers
August 1997 (vol. 8 no. 8)
pp. 825-839

Abstract: Data distribution has been one of the most important research topics in parallelizing compilers for distributed memory parallel computers. A good data distribution schema should consider both the computation load balance and the communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost due to performing this sequence of Do-loops is larger than a threshold value. Based on this observation, we can prune the search space and derive efficient dynamic programming algorithms for determining effective data distribution schemas to execute a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented.
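
The abstract does not spell out the algorithms themselves, so the following is only a minimal sketch of the general idea: a generic dynamic program that picks one distribution schema per Do-loop and pays a cost whenever consecutive loops use different schemas. It is not the paper's method and does not model the threshold-based pruning. The function name `plan_distributions`, the schema labels, and the cost tables are all hypothetical.

```python
from typing import Dict, List, Tuple


def plan_distributions(
    exec_cost: List[Dict[str, float]],
    redist_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Choose one schema per loop, minimizing execution plus redistribution cost.

    exec_cost[i][s]     -- assumed cost of running loop i under schema s
    redist_cost[(p, s)] -- assumed cost of redistributing data from schema p to s
    """
    schemas = list(exec_cost[0])
    # best[s] = (cheapest total cost of the loops so far, ending in schema s; plan)
    best = {s: (exec_cost[0][s], [s]) for s in schemas}
    for costs in exec_cost[1:]:
        new_best = {}
        for s in schemas:
            # Keeping the same schema is free; switching pays the redistribution cost.
            total, plan = min(
                (best[p][0] + (0.0 if p == s else redist_cost[(p, s)]), best[p][1])
                for p in schemas
            )
            new_best[s] = (total + costs[s], plan + [s])
        best = new_best
    return min(best.values())
```

With made-up costs for three loops and two schemas, a cheap redistribution makes switching worthwhile, while an expensive one keeps the layout static, which mirrors the trade-off the abstract describes:

```python
loops = [
    {"row": 10.0, "col": 50.0},
    {"row": 40.0, "col": 10.0},
    {"row": 10.0, "col": 45.0},
]
cheap = {("row", "col"): 10.0, ("col", "row"): 10.0}
costly = {("row", "col"): 100.0, ("col", "row"): 100.0}
print(plan_distributions(loops, cheap))   # switches schema around the middle loop
print(plan_distributions(loops, costly))  # keeps a single schema throughout
```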

Index Terms:
Component alignment, data distribution, distributed memory computer, Do-loops, dynamic programming algorithm for data distribution, parallelizing compiler.
Citation:
PeiZong Lee, "Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 8, pp. 825-839, Aug. 1997, doi:10.1109/71.605769