Minimizing Data and Synchronization Costs in One-Way Communication
December 2000 (vol. 11 no. 12)
pp. 1232-1251

Abstract—Minimizing communication and synchronization costs is crucial to realizing the performance potential of parallel computers. This paper presents a general technique that uses a global data-flow framework to optimize communication and synchronization in the context of the one-way communication model. In contrast to the conventional send/receive message-passing model, one-way communication is a new paradigm that decouples message transmission from synchronization. On parallel machines with appropriate low-level support, this may open up new opportunities not only to further optimize communication but also to reduce synchronization overhead. We present optimization techniques, built on our framework, for eliminating redundant data communication and synchronization operations. Our approach works with the most general data alignments and distributions in languages like High Performance Fortran (HPF) and uses a combination of traditional data-flow analysis and polyhedral algebra. Empirical results for several scientific benchmarks on a Cray T3E multiprocessor demonstrate that our approach reduces the number of data (communication) and synchronization messages, thereby reducing overall execution times.
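To make the decoupling concrete, here is a minimal toy sketch, not the authors' algorithm. It assumes a single straight-line block of operations, treats array sections as opaque labels (the paper instead reasons about general HPF alignments and distributions with polyhedral algebra and a global data-flow framework), and assumes synchronization exists only to complete earlier one-way transfers. The names `Put`, `Write`, `Sync` and `eliminate_redundant` are hypothetical. A put is dropped when the same section was already sent to the same processor and not written since; a sync is dropped when no put has been kept since the previous sync.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Put:
    """Hypothetical one-way transfer of array section `section` to processor `dest`."""
    section: str
    dest: int


@dataclass(frozen=True)
class Write:
    """Hypothetical local assignment that modifies array section `section`."""
    section: str


@dataclass(frozen=True)
class Sync:
    """Hypothetical synchronization point that completes all earlier puts."""


Op = Union[Put, Write, Sync]


def eliminate_redundant(ops: List[Op]) -> List[Op]:
    """Drop redundant puts and syncs from straight-line code (toy model only)."""
    # (section, dest) pairs already sent and not overwritten since.
    available: Set[Tuple[str, int]] = set()
    # True if some put has been kept since the last kept sync.
    pending = False
    out: List[Op] = []
    for op in ops:
        if isinstance(op, Put):
            key = (op.section, op.dest)
            if key in available:
                continue  # redundant data message: same values already sent
            available.add(key)
            pending = True
            out.append(op)
        elif isinstance(op, Write):
            # A local write invalidates every earlier copy of that section.
            available = {k for k in available if k[0] != op.section}
            out.append(op)
        elif isinstance(op, Sync):
            if not pending:
                continue  # redundant sync: no new put for it to complete
            pending = False
            out.append(op)
        else:
            raise TypeError(f"unknown operation: {op!r}")
    return out


ops = [
    Put("A[1:10]", 1), Sync(),
    Put("A[1:10]", 1), Sync(),   # same data, nothing written in between
    Write("A[1:10]"),
    Put("A[1:10]", 1), Sync(),   # must be re-sent after the write
]
print(eliminate_redundant(ops))
```

In this example the second put and the second sync are removed, while the put that follows the write is kept together with its sync. Because transfer and synchronization are separate operations in the one-way model, each kind of redundancy can be detected and removed independently, which is the opportunity the abstract describes.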

Index Terms:
One-way communication, message-passing, redundant synchronization, compiler optimizations, data-flow analysis, linear algebra techniques, data-parallel languages.
Mahmut Kandemir, Alok Choudhary, Prithviraj Banerjee, J. Ramanujam, Nagaraj Shenoy, "Minimizing Data and Synchronization Costs in One-Way Communication," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 12, pp. 1232-1251, Dec. 2000, doi:10.1109/71.895791