A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors
April 2001 (vol. 12 no. 4)
pp. 416-431

Abstract: Parallel architectures with physically distributed memory provide cost-effective scalability for solving many large-scale scientific problems. However, these systems are very difficult to program and tune. In these systems, the choice of a good data mapping and parallelization strategy can dramatically improve the efficiency of the resulting program. In this paper, we present a framework for automatic data mapping in the context of distributed memory multiprocessor systems. The framework is based on a new approach that allows the alignment, distribution, and redistribution problems to be solved together using a single graph representation. The Communication Parallelism Graph (CPG) is the structure that holds symbolic information about the potential data movement and parallelism inherent in the whole program. The CPG is then particularized for a given problem size and target system, and a general-purpose linear 0-1 integer programming solver is used to find a minimal-cost path through the graph. The data layout strategy generated is optimal according to our current cost and compilation models.
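
The minimal-cost path idea in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three phases, the two candidate layouts, the cost numbers, and the names `EXEC_COST`, `REMAP_COST`, `path_cost`, and `best_mapping` are all hypothetical. Each phase gets exactly one layout (the role played by 0-1 selection variables), and a path's cost is the sum of per-phase execution costs plus a redistribution cost whenever consecutive phases use different layouts. The paper relies on a general-purpose linear 0-1 integer programming solver; exhaustive enumeration stands in for the solver here, which is only feasible for tiny instances.

```python
from itertools import product

# Hypothetical candidate layouts for a single array (not from the paper).
LAYOUTS = ("row_block", "col_block")

# EXEC_COST[phase][layout]: assumed cost of running a phase under a layout.
EXEC_COST = [
    {"row_block": 4, "col_block": 9},
    {"row_block": 8, "col_block": 3},
    {"row_block": 5, "col_block": 6},
]

# Assumed flat cost of redistributing the array between two phases.
REMAP_COST = 2


def remap_cost(prev_layout, next_layout):
    """Redistribution is free when the layout does not change."""
    return 0 if prev_layout == next_layout else REMAP_COST


def path_cost(choice):
    """Total cost of one path: execution costs plus redistribution costs."""
    total = sum(EXEC_COST[phase][layout] for phase, layout in enumerate(choice))
    total += sum(remap_cost(a, b) for a, b in zip(choice, choice[1:]))
    return total


def best_mapping():
    """Enumerate every one-layout-per-phase assignment and keep the cheapest.

    This brute-force search replaces the 0-1 integer programming solver
    mentioned in the abstract; it explores len(LAYOUTS) ** phases paths.
    """
    return min(product(LAYOUTS, repeat=len(EXEC_COST)), key=path_cost)


if __name__ == "__main__":
    best = best_mapping()
    print(best, path_cost(best))
```

With these assumed numbers, the cheapest path keeps the row layout for the first phase and redistributes once before the second, because one redistribution costs less than running the later phases under the row layout.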

Index Terms:
Automatic data mapping, alignment, distribution, redistribution, performance prediction, distributed-memory multiprocessor, loop parallelization, linear 0-1 integer programming.
Jordi Garcia, Eduard Ayguadé, Jesús Labarta, "A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 4, pp. 416-431, April 2001, doi:10.1109/71.920590