A Unified Framework for Optimizing Communication in Data-Parallel Programs
July 1996 (vol. 7 no. 7)
pp. 689-704

Abstract—This paper presents a framework, based on global array data-flow analysis, for reducing communication costs in a program being compiled for a distributed-memory machine. We introduce the available section descriptor, a novel representation of communication involving array sections. This representation allows us to apply techniques for partial redundancy elimination to obtain powerful communication optimizations. Within a single framework, we capture optimizations such as (1) vectorizing communication, (2) eliminating communication that is redundant on any control-flow path, (3) reducing the amount of data being communicated, (4) reducing the number of processors to which data must be communicated, and (5) moving communication earlier to hide latency and to subsume previous communication. We show that the bidirectional problem of eliminating partial redundancies can be decomposed into simpler unidirectional problems, even in the context of an array section representation, which makes the analysis procedure more efficient. We present results from a preliminary implementation of this framework; the results are extremely encouraging and demonstrate the effectiveness of this analysis in improving the performance of programs.
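To make the idea concrete, the following is a minimal sketch (not taken from the paper) of a regular array section descriptor with one (low, high, stride) triple per dimension, together with a subsumption test. The `Section` class and `covers` method are hypothetical illustrations of how a compiler could recognize that a requested communication is redundant because the needed section is already available from an earlier communication.

```python
# Illustrative sketch, assuming a regular-section representation:
# each dimension is described by a (low, high, stride) triple, and a
# later communication is redundant if its section is covered by a
# section already made available on every incoming control-flow path.

from dataclasses import dataclass

@dataclass(frozen=True)
class Section:
    """A regular array section: one (low, high, stride) per dimension."""
    dims: tuple  # e.g. ((1, 100, 1), (1, 50, 2)) for A(1:100, 1:50:2)

    def covers(self, other: "Section") -> bool:
        """True if every element of `other` lies inside `self`."""
        if len(self.dims) != len(other.dims):
            return False
        for (lo, hi, st), (olo, ohi, ost) in zip(self.dims, other.dims):
            # other's bounds must be contained in self's bounds
            if olo < lo or ohi > hi:
                return False
            # other's elements must fall on self's stride grid
            if (olo - lo) % st != 0 or ost % st != 0:
                return False
        return True

# Section already communicated on some earlier path: A(1:100)
available = Section(((1, 100, 1),))
# Section requested by a later reference: A(10:50:2)
needed = Section(((10, 50, 2),))

# If covered, the later communication can be eliminated entirely;
# otherwise only the uncovered difference needs to be sent.
redundant = available.covers(needed)
```

In the actual framework, such coverage and difference computations feed the data-flow equations, so the same machinery also shrinks the data volume (point 3 of the abstract) when coverage is only partial.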

Index Terms:
Array section descriptors, communication optimizations, data availability, data-flow analysis, data-parallelism, High Performance Fortran, partial redundancy elimination.
Citation:
Manish Gupta, Edith Schonberg, Harini Srinivasan, "A Unified Framework for Optimizing Communication in Data-Parallel Programs," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 7, pp. 689-704, July 1996, doi:10.1109/71.508249