A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers
IEEE Transactions on Parallel and Distributed Systems, November 1997 (vol. 8, no. 11)
pp. 1098-1116

Abstract—Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, exploiting all of the available computational power in these machines demands a tremendous programming effort from users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is to use task parallelism to control the degree of data parallelism of individual tasks. This improves performance because data parallelism yields diminishing returns as the number of processors increases. By controlling the number of processors used for each data-parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster. A practical task- and data-parallel execution scheme on a distributed memory multicomputer also requires data redistribution, which incurs an overhead. As our experimental results show, however, this overhead is not prohibitive: a program that uses task and data parallelism together can run significantly faster than the same program using data parallelism alone. This makes our proposed optimization both practical and extremely useful.
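To see why this works, consider the diminishing-returns argument in miniature. The sketch below is a toy Python model, not taken from the paper: the cost function task_time, the overhead coefficient o, the work sizes, and the flat redistribution charge REDIST are all illustrative assumptions.

# A minimal sketch (not from the paper) of the diminishing-returns argument.
# The cost model t(p) = w/p + o*(p - 1) and every constant below are assumed:
# w/p is ideally divided work, o*(p - 1) models per-processor parallel
# overhead (communication, synchronization), so speedup flattens as p grows.

def task_time(w: float, p: int, o: float = 1.0) -> float:
    """Hypothetical execution time of one data-parallel task of work w on p processors."""
    return w / p + o * (p - 1)

P = 32                   # total processors in the machine
W1, W2 = 400.0, 400.0    # work of two independent data-parallel tasks

# Data parallelism alone: run the tasks one after the other, each on all P processors.
data_only = task_time(W1, P) + task_time(W2, P)                  # 43.5 + 43.5 = 87.0

# Task and data parallelism: run the tasks concurrently, each on P/2 processors.
concurrent = max(task_time(W1, P // 2), task_time(W2, P // 2))   # max(40.0, 40.0) = 40.0

# A real mixed schedule also pays to redistribute arrays between processor
# groups; charge the concurrent schedule a flat, hypothetical redistribution cost.
REDIST = 10.0
mixed = concurrent + REDIST                                      # 50.0

print(f"data parallelism only  : {data_only:.1f} time units")
print(f"task + data parallelism: {mixed:.1f} time units")

Under these assumed numbers, the two tasks finish in 50 time units when run side by side on half the machine each, versus 87 when run back to back on the full machine, mirroring the abstract's claim that redistribution overhead need not erase the gain from mixing task and data parallelism.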

References:
[1] High Performance Fortran Forum, "High Performance Fortran Language Specification, version 1.1," technical report, Center for Research on Parallel Computation, Rice Univ., Houston, Texas, Nov. 1994.
[2] C. Koelbel, D. Loveman, R. Schreiber, G. Steele Jr., and M. Zosel, The High Performance Fortran Handbook. MIT Press, 1994.
[3] P. Banerjee et al., "The Paradigm Compiler for Distributed-Memory Multicomputers," Computer, vol. 28, no. 10, pp. 37-47, Oct. 1995.
[4] S. Ramaswamy and P. Banerjee, "Automatic Generation of Efficient Array Redistribution Routines for Distributed Memory Multicomputers," Proc. Frontiers '95: The Fifth Symposium on the Frontiers of Massively Parallel Computation, pp. 342-349, McLean, Va., Feb. 1995.
[5] S. Ramaswamy, "Simultaneous Exploitation of Task and Data Parallelism in Regular Scientific Applications," PhD thesis CRHC-96-03/UILU-ENG-96-2203, Dept. of Electrical and Computer Eng., Univ. of Illinois, Urbana, Jan. 1996.
[6] B. Avalani, I. Foster, M. Xu, and A. Choudhary, "A Compilation System That Integrates HPF and Fortran M," Proc. Scalable High Performance Computing Conf. (SHPCC-94), pp. 293-300, May 1994.
[7] T. Gross, D. O'Hallaron, and J. Subhlok, "Task Parallelism in a High Performance Fortran Framework," IEEE Parallel and Distributed Technology, vol. 2, no. 3, pp. 16-26, Fall 1994.
[8] S. Hiranandani, K. Kennedy, and C.-W. Tseng, "Compiling Fortran D for MIMD Distributed-Memory Machines," Comm. ACM, vol. 35, no. 8, pp. 66-80, Aug. 1992.
[9] S.P. Amarasinghe, J.M. Anderson, M.S. Lam, and A.W. Lim, "An Overview of a Compiler for Scalable Parallel Machines," Proc. Sixth Workshop Languages and Compilers for Parallel Computing, pp. 253-272, Portland, Ore., Aug. 1993.
[10] M. Gupta et al., "An HPF Compiler for the IBM SP2," Proc. Supercomputing '95 (CD-ROM), IEEE Computer Society Press, Los Alamitos, Calif., 1995.
[11] Digital High Performance Fortran 90 HPF and PSE Manual. Maynard, Mass.: Digital Equipment Corp., 1995.
[12] PGHPF User's Guide. Wilsonville, Ore.: Portland Group Inc., 1995.
[13] XHPF User's Guide, Version 2.0. Placerville, Calif.: Applied Parallel Research, 1995.
[14] B. Chapman, P. Mehrotra, and H. Zima, "Programming in Vienna Fortran," Scientific Programming, vol. 1, no. 1, pp. 31-50, Aug. 1992.
[15] Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, S. Ranka, and M.-Y. Wu, "Compiling Fortran 90D/HPF for Distributed Memory MIMD Computers," J. Parallel and Distributed Computing, vol. 21, no. 1, pp. 15-26, Apr. 1994.
[16] J. Subhlok, J.M. Stichnoth, D.R. O'Hallaron, and T. Gross, "Exploiting Task and Data Parallelism on a Multicomputer," Proc. Fourth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPOPP), pp. 13-22, May 1993.
[17] P. Dinda, T. Gross, D. O'Hallaron, E. Segall, J. Stichnoth, J. Subhlok, J. Webb, and B. Yang, "The CMU Task Parallel Program Suite," Technical Report CMU-CS-94-131, School of Computer Science, Carnegie Mellon Univ., Pittsburgh, Pa., Mar. 1994.
[18] I. Foster and K.M. Chandy, "Fortran M: A Language for Modular Parallel Programming," J. Parallel and Distributed Computing, vol. 26, no. 1, pp. 24-35, Apr. 1995.
[19] C.D. Polychronopoulos, M. Girkar, M.R. Haghighat, C.L. Lee, B. Leung, and D. Schouten, "Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing and Scheduling Programs on Multiprocessors," Proc. 18th Int'l Conf. Parallel Processing, pp. 39-48, St. Charles, Ill., Aug. 1989.
[20] M. Girkar and C.D. Polychronopoulos, "Automatic Extraction of Functional Parallelism from Ordinary Programs," IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 2, pp. 166-178, Mar. 1992.
[21] M. Girkar, "Functional Parallelism: Theoretical Foundations and Implementations," PhD thesis CSRD-1182, Center for Supercomputing Research and Development, Univ. of Illinois, Urbana, Dec. 1991.
[22] J.E. Moreira, "On the Implementation and Effectiveness of Autoscheduling for Shared-Memory Multiprocessors," PhD thesis, Center for Supercomputing Research and Development, Univ. of Illinois, Urbana, Jan. 1995.
[23] M. Dhagat, R. Bagrodia, and M. Chandy, "Integrating Task and Data Parallelism in UC," Proc. Int'l Conf. Parallel Processing, pp. 29-36, Oconomowoc, Wis., Aug. 1995.
[24] J.K. Lenstra and A.H.G. Rinnooy Kan, "Complexity of Scheduling under Precedence Constraints," Operations Research, vol. 26, no. 1, pp. 22-35, Jan. 1978.
[25] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W.H. Freeman, 1979.
[26] V. Sarkar, Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors. Cambridge, Mass.: MIT Press, 1989.
[27] T. Yang and A. Gerasoulis, "A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors," Proc. Supercomputing, pp. 633-642, Albuquerque, N.M., Nov. 1991.
[28] T. Yang and A. Gerasoulis, "A Parallel Programming Tool for Scheduling on Distributed Memory Multiprocessors," Proc. Scalable High Performance Computing Conf., pp. 350-357, Williamsburg, Va., Apr. 1992.
[29] G.N.S. Prasanna and A. Agarwal, "Compile-Time Techniques for Processor Allocation in Macro Dataflow Graphs for Multiprocessors," Proc. Int'l Conf. Parallel Processing, pp. 279-283, St. Charles, Ill., Aug. 1992.
[30] G.N.S. Prasanna, A. Agarwal, and B.R. Musicus, "Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory," IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 7, pp. 720-736, July 1994.
[31] K.P. Belkhale and P. Banerjee, "Approximate Algorithms for the Partitionable Independent Task Scheduling Problem," Proc. 19th Int'l Conf. Parallel Processing, pp. 72-75, St. Charles, Ill., Aug. 1990.
[32] K.P. Belkhale and P. Banerjee, "A Scheduling Algorithm for Parallelizable Dependent Tasks," Proc. Int'l Parallel Processing Symp., pp. 500-506, Anaheim, Calif., Apr. 1991.
[33] J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb, "Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs," Proc. Supercomputing '94, pp. 330-339, Washington, D.C., Nov. 1994.
[34] J. Subhlok and G. Vondran, "Optimal Mapping of Sequences of Data Parallel Tasks," Proc. Fifth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 134-143, Santa Barbara, Calif., July 1995.
[35] C.D. Polychronopoulos, M. Girkar, M.R. Haghighat, C.L. Lee, B. Leung, and D. Schouten, "Parafrase-2 Manual," technical report, Center for Supercomputing Research and Development, Univ. of Illinois, Urbana, Aug. 1990.
[36] E.W. Hodges IV, "High Performance Fortran Support for the PARADIGM Compiler," MS thesis CRHC-95-23/UILU-ENG-95-2237, Dept. of Electrical and Computer Eng., Univ. of Illinois, Urbana, Oct. 1995.
[37] J.G. Ecker, "Geometric Programming: Methods, Computations and Applications," SIAM Rev., vol. 22, no. 3, pp. 338-362, July 1980.
[38] P.M. Vaidya, "A New Algorithm for Minimizing Convex Functions Over Convex Sets," Proc. Symp. Foundations of Computer Science, pp. 332-337, Research Triangle Park, N.C., Oct. 1989.
[39] M. Gupta, "Automatic Data Partitioning on Distributed Memory Multicomputers," PhD thesis CRHC-92-19/UILU-ENG-92-2237, Dept. of Computer Science, Univ. of Illinois, Urbana, Sept. 1992.
[40] M. Gupta and P. Banerjee, "Compile-Time Estimation of Communication Costs on Multicomputers," Proc. Sixth Int'l Parallel Processing Symp., pp. 470-475, Beverly Hills, Calif., Mar. 1992.
[41] C.L. Liu, Elements of Discrete Mathematics. New York: McGraw-Hill, 1985.
[42] M.R. Garey, R.L. Graham, and D.S. Johnson, "Performance Guarantees for Scheduling Algorithms," Operations Research, vol. 26, no. 1, pp. 3-21, Jan. 1978.
[43] Q. Wang and H. Cheng, "A Heuristic of Scheduling Parallel Tasks and Its Analysis," SIAM J. Comput., vol. 21, no. 2, pp. 281-294, Apr. 1992.
[44] S. Ramaswamy, S. Sapatnekar, and P. Banerjee, "A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers," Proc. 23rd Int'l Conf. Parallel Processing, vol. II, pp. 116-125, St. Charles, Ill., Aug. 1994.
[45] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes in C. Cambridge, England: Cambridge Univ. Press, 1988.
[46] S.L. Lyons, T.J. Hanratty, and J.B. McLaughlin, "Large-Scale Computer Simulation of Fully Developed Channel Flow with Heat Transfer," Int'l J. Numerical Methods in Fluids, vol. 13, no. 8, pp. 999-1028, Nov. 1991.
[47] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1994.
[48] A. Lain, "Compiler and Run-Time Support for Irregular Computations," PhD thesis CRHC-92-22, Dept. of Computer Science, Univ. of Illinois, Urbana, Oct. 1995.

Index Terms:
Task parallel, data parallel, allocation, scheduling, HPF, distributed memory, convex programming.
Citation:
Shankar Ramaswamy, Sachin Sapatnekar, Prithviraj Banerjee, "A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 11, pp. 1098-1116, Nov. 1997, doi:10.1109/71.642945