Issue No. 08 - August (2009 vol. 20)

pp: 1158-1172

Naga Vydyanathan , Siemens Corporate Technology, India, Bangalore

Sriram Krishnamoorthy , Pacific Northwest Laboratory, Richland

Gerald M. Sabin , RNET Technologies, Inc., Dayton

Umit V. Catalyurek , Ohio State University, Columbus

Tahsin Kurc , Ohio State University, Columbus

Ponnuswamy Sadayappan , Ohio State University, Columbus

Joel H. Saltz , Ohio State University, Columbus

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2008.219

ABSTRACT

Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the intertask data communication volumes. A locality-conscious scheduling strategy is used to improve intertask data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications and synthetic graphs shows that our algorithm consistently generates schedules with a lower makespan as compared to Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules that have a lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
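To make the trade-off described in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes an Amdahl-style runtime model and a simple greedy list scheduler, and it ignores intertask communication and locality. The names `exec_time` and `makespan`, the three-task graph, and all numbers are hypothetical and chosen only for illustration. It compares a pure task-parallel, a pure data-parallel, and a mixed allocation on four processors.

```python
from typing import Dict, List, Tuple


def exec_time(seq_time: float, serial_frac: float, procs: int) -> float:
    """Runtime of one task on `procs` processors (assumed Amdahl-style model)."""
    return seq_time * (serial_frac + (1.0 - serial_frac) / procs)


def makespan(
    tasks: Dict[str, Tuple[float, float]],  # name -> (sequential time, serial fraction)
    deps: Dict[str, List[str]],             # name -> list of predecessor names
    alloc: Dict[str, int],                  # name -> processors given to the task
    total_procs: int,
) -> float:
    """Greedy list schedule; `tasks` must be listed in topological order."""
    free = [0.0] * total_procs  # time at which each processor becomes idle
    finish: Dict[str, float] = {}
    for name, (seq_time, serial_frac) in tasks.items():
        k = alloc[name]
        # A task is ready once all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(name, [])), default=0.0)
        # Take the k processors that become idle earliest.
        free.sort()
        start = max(ready, free[k - 1])
        finish[name] = start + exec_time(seq_time, serial_frac, k)
        free[:k] = [finish[name]] * k
    return max(finish.values())


# Hypothetical graph: A and B are independent and scale poorly; C needs both
# and scales perfectly.
tasks = {"A": (8.0, 0.5), "B": (8.0, 0.5), "C": (8.0, 0.0)}
deps = {"C": ["A", "B"]}

task_parallel = makespan(tasks, deps, {"A": 1, "B": 1, "C": 1}, 4)
data_parallel = makespan(tasks, deps, {"A": 4, "B": 4, "C": 4}, 4)
mixed = makespan(tasks, deps, {"A": 2, "B": 2, "C": 4}, 4)

print(task_parallel, data_parallel, mixed)  # 16.0 12.0 8.0
```

In this toy case the mixed allocation runs A and B side by side on two processors each and then gives all four processors to C, which beats both pure strategies. The paper's algorithm additionally accounts for communication volumes and data reuse, which this sketch omits.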

INDEX TERMS

Processor allocation, scheduling, mixed parallelism, data-flow graphs, locality-conscious scheduling.

CITATION

Naga Vydyanathan, Sriram Krishnamoorthy, Gerald M. Sabin, Umit V. Catalyurek, Tahsin Kurc, Ponnuswamy Sadayappan, Joel H. Saltz, "An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 20, no. 8, pp. 1158-1172, August 2009, doi:10.1109/TPDS.2008.219