Compiling for Distributed Memory Architectures
March 1994 (vol. 5 no. 3)
pp. 281-298

The lack of high-level languages and good compilers for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
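As a rough illustration of run-time resolution and message vectorization as the abstract describes them, the following Python sketch simulates a shifted copy loop, a[i] = b[i - shift] + 1, over a block-distributed array. It is not the authors' compiler or its generated code: the block layout, the loop, and the names `owner`, `run_time_resolution`, and `vectorize` are all assumptions made for this example, and processors are simulated sequentially rather than run in parallel.

```python
from collections import defaultdict


def owner(i, n, p):
    """Hypothetical block layout: element i of an n-element array
    lives on processor i // ceil(n / p)."""
    block = -(-n // p)  # ceiling division
    return i // block


def run_time_resolution(b, p, shift):
    """Simulate a[i] = b[i - shift] + 1 for i in shift..n-1.

    Every simulated processor walks the whole iteration space and
    decides at run time, from data ownership alone, whether it must
    send an operand, compute a result, or skip the iteration.
    """
    n = len(b)
    a = [0] * n
    mailbox = {}    # (destination, index) -> value in transit
    messages = []   # one (source, destination) entry per element sent

    # Send phase: the owner of an operand ships it to the owner of the result.
    for me in range(p):
        for i in range(shift, n):
            src, dst = owner(i - shift, n, p), owner(i, n, p)
            if me == src and dst != me:
                mailbox[(dst, i - shift)] = b[i - shift]
                messages.append((src, dst))

    # Compute phase: only the owner of a[i] performs the assignment.
    for me in range(p):
        for i in range(shift, n):
            if owner(i, n, p) != me:
                continue
            if owner(i - shift, n, p) == me:
                operand = b[i - shift]              # local read
            else:
                operand = mailbox[(me, i - shift)]  # received value
            a[i] = operand + 1
    return a, messages


def vectorize(messages):
    """Combine messages that share a source and a destination.

    Returns a map from (source, destination) to the number of
    elements that would travel together in one combined message.
    """
    batches = defaultdict(int)
    for src, dst in messages:
        batches[(src, dst)] += 1
    return dict(batches)


b = [10, 20, 30, 40, 50, 60, 70, 80]
a, messages = run_time_resolution(b, p=4, shift=2)
print(a)
print(len(messages), "element messages ->", len(vectorize(messages)), "combined")
```

In this toy setting every operand crosses a processor boundary, so the unoptimized scheme sends one message per element, while grouping by source and destination leaves one message per neighbouring pair of processors. Per-message overhead is what makes that reduction worthwhile on a real distributed-memory machine.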

Index Terms:
program compilers; distributed memory systems; synchronisation; parallel programming; pipeline processing; distributed memory architectures; high-level languages; process decomposition; synchronization; load balancing; parallelizing compiler; locality of reference; run-time resolution; message traffic; pipelining; SIMPLE; hydrodynamics benchmark; Intel iPSC/2
Citation:
A. Rogers, K. Pingali, "Compiling for Distributed Memory Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 3, pp. 281-298, March 1994, doi:10.1109/71.277789