Issue No. 08 - August 2009 (vol. 20)
pp. 1158-1172
Naga Vydyanathan , Siemens Corporate Technology, India, Bangalore
Sriram Krishnamoorthy , Pacific Northwest National Laboratory, Richland
Gerald M. Sabin , RNET Technologies, Inc., Dayton
Umit V. Catalyurek , Ohio State University, Columbus
Tahsin Kurc , Ohio State University, Columbus
Ponnuswamy Sadayappan , Ohio State University, Columbus
Joel H. Saltz , Ohio State University, Columbus
ABSTRACT
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining the two (also called mixed parallelism) has been shown to be an effective execution model for them. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the intertask data communication volumes. A locality-conscious scheduling strategy is used to improve intertask data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications and synthetic graphs shows that our algorithm consistently generates schedules with a lower makespan than Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules that have a lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
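The core idea described above (run sets of independent tasks concurrently, giving each a processor share capped by its scalability) can be sketched in a few lines. The following is an illustrative moldable-task list scheduler under a simplified linear-speedup assumption, not the paper's algorithm; the function names (`exec_time`, `schedule`) and the speedup model are hypothetical stand-ins for the profiling-based estimates the paper uses.

```python
from collections import deque

def exec_time(work, procs, max_useful):
    # Simplified speedup model (an assumption, not the paper's model):
    # linear speedup up to max_useful processors, flat beyond that.
    return work / min(procs, max_useful)

def schedule(tasks, deps, total_procs):
    """Greedy list scheduler for moldable tasks.

    tasks: {name: (work, max_useful_procs)}
    deps:  {name: [predecessor names]}
    Returns ({name: (start, finish, procs)}, makespan).
    """
    indeg = {t: len(deps.get(t, [])) for t in tasks}
    succs = {t: [] for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    finish, sched = {}, {}
    ready = deque(t for t in tasks if indeg[t] == 0)
    while ready:
        # Run all currently ready tasks side by side, splitting the
        # processors evenly among them (the batch starts when every
        # predecessor of every batch member has finished -- a
        # simplification of the paper's integrated allocation step).
        batch = list(ready)
        ready.clear()
        share = max(1, total_procs // len(batch))
        start = max((finish[p] for t in batch for p in deps.get(t, [])),
                    default=0.0)
        for t in batch:
            work, cap = tasks[t]
            procs = min(share, cap)
            sched[t] = (start, start + exec_time(work, procs, cap), procs)
            finish[t] = sched[t][1]
        for t in batch:
            for s in succs[t]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
    return sched, max(finish.values())
```

For a diamond-shaped graph (one source, two independent middle tasks, one sink) on four processors, the scheduler runs the source on its two useful processors, the two middle tasks concurrently on two processors each, and then the sink, illustrating how the task/data-parallel mix adapts per level.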
INDEX TERMS
Processor allocation, scheduling, mixed parallelism, data-flow graphs, locality-conscious scheduling.
CITATION
Naga Vydyanathan, Sriram Krishnamoorthy, Gerald M. Sabin, Umit V. Catalyurek, Tahsin Kurc, Ponnuswamy Sadayappan, Joel H. Saltz, "An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 8, pp. 1158-1172, August 2009, doi:10.1109/TPDS.2008.219
REFERENCES
[1] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings, 1994.
[2] M.J. Quinn, Parallel Computing: Theory and Practice, second ed. McGraw-Hill, 1994.
[3] Y.-K. Kwok and I. Ahmad, “Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors,” ACM Computing Surveys, vol. 31, no. 4, pp. 406-471, 1999.
[4] S. Ramaswamy, S. Sapatnekar, and P. Banerjee, “A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers,” IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 11, pp. 1098-1116, Nov. 1997.
[5] S. Chakrabarti, J. Demmel, and K. Yelick, “Modeling the Benefits of Mixed Data and Task Parallelism,” Proc. Seventh Ann. ACM Symp. Parallel Algorithms and Architectures (SPAA '95), pp. 74-83, 1995.
[6] S.B. Hassen, H.E. Bal, and C.J.H. Jacobs, “A Task and Data-Parallel Programming Language Based on Shared Objects,” ACM Trans. Programming Languages and Systems, vol. 20, no. 6, pp. 1131-1170, 1998.
[7] N. Vydyanathan, S. Krishnamoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayappan, and J. Saltz, “An Integrated Approach for Processor Allocation and Scheduling of Mixed-Parallel Applications,” Proc. Int'l Conf. Parallel Processing (ICPP '06), pp. 443-450, 2006.
[8] N. Vydyanathan, S. Krishnamoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayappan, and J. Saltz, “Locality Conscious Processor Allocation and Scheduling for Mixed-Parallel Applications,” Proc. IEEE Int'l Conf. Cluster Computing (Cluster '06), pp. 1-10, 2006.
[9] A. Radulescu, C. Nicolescu, A.J.C. van Gemund, and P. Jonker, “CPR: Mixed Task and Data Parallel Scheduling for Distributed Systems,” Proc. 15th Int'l Parallel and Distributed Processing Symp. (IPDPS '01), p. 39, 2001.
[10] A. Radulescu and A. van Gemund, “A Low-Cost Approach towards Mixed Task and Data Parallel Scheduling,” Proc. Int'l Conf. Parallel Processing (ICPP '01), pp. 69-76, Sept. 2001.
[11] T. Rauber and G. Rünger, “Compiler Support for Task Scheduling in Hierarchical Execution Models,” J. System Architecture, vol. 45, nos. 6-7, pp. 483-503, 1999.
[12] Standard Task Graph Set, Kasahara Laboratory, Waseda Univ., http://www.kasahara.elec.waseda.ac.jp/schedule/, 2008.
[13] G. Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, S. Hirata, C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, and P. Sadayappan, “A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry,” Proc. ACM/IEEE Supercomputing Conf. (SC '02), pp. 1-10, 2002.
[14] G.H. Golub and C.F.V. Loan, Matrix Computations, third ed. Johns Hopkins Univ. Press, 1996.
[15] C.H. Papadimitriou and M. Yannakakis, “Towards an Architecture-Independent Analysis of Parallel Algorithms,” SIAM J. Computing, vol. 19, no. 2, pp. 322-328, 1990.
[16] J. Du and J.Y.-T. Leung, “Complexity of Scheduling Parallel Task Systems,” SIAM J. Discrete Math., vol. 2, no. 4, pp. 473-487, 1989.
[17] J. Turek, J.L. Wolf, and P.S. Yu, “Approximate Algorithms for Scheduling Parallelizable Tasks,” Proc. Fourth Ann. ACM Symp. Parallel Algorithms and Architectures (SPAA '92), pp. 323-332, 1992.
[18] K. Jansen and L. Porkolab, “Linear-Time Approximation Schemes for Scheduling Malleable Parallel Tasks,” Proc. 10th Ann. ACM-SIAM Symp. Discrete Algorithms (SODA '99), pp. 490-498, 1999.
[19] J. Blazewicz, M. Machowiak, J. Weglarz, M. Kovalyov, and D. Trystram, “Scheduling Malleable Tasks on Parallel Processors to Minimize the Makespan,” Annals of Operations Research, vol. 129, nos. 1-4, pp. 65-80, 2004.
[20] R. Lepere, D. Trystram, and G.J. Woeginger, “Approximation Algorithms for Scheduling Malleable Tasks under Precedence Constraints,” Int'l J. Foundations of Computer Science, vol. 13, no. 4, pp. 613-627, 2002.
[21] K. Jansen and H. Zhang, “An Approximation Algorithm for Scheduling Malleable Tasks under General Precedence Constraints,” ACM Trans. Algorithms, vol. 2, no. 3, pp. 416-434, 2006.
[22] V. Boudet, F. Desprez, and F. Suter, “One-Step Algorithm for Mixed Data and Task Parallel Scheduling without Data Replication,” Proc. 17th Int'l Parallel and Distributed Processing Symp. (IPDPS), 2003.
[23] K. Li, “Scheduling Precedence Constrained Parallel Tasks on Multiprocessors Using the Harmonic System Partitioning Scheme,” J. Information Science and Eng., vol. 21, no. 2, pp. 309-326, 2005.
[24] J. Barbosa, C. Morais, R. Nobrega, and A. Monteiro, “Static Scheduling of Dependent Parallel Tasks on Heterogeneous Clusters,” Proc. Fourth Int'l Workshop Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, pp. 1-8, 2005.
[25] J. Subhlok and G. Vondran, “Optimal Latency-Throughput Tradeoffs for Data Parallel Pipelines,” Proc. Eighth Ann. ACM Symp. Parallel Algorithms and Architectures (SPAA '96), pp. 62-71, 1996.
[26] A.N. Choudhary, B. Narahari, D.M. Nicol, and R. Simha, “Optimal Processor Assignment for a Class of Pipelined Computations,” IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 4, pp. 439-445, Apr. 1994.
[27] G.N.S. Prasanna and B.R. Musicus, “Generalised Multiprocessor Scheduling Using Optimal Control,” Proc. Third Ann. ACM Symp. Parallel Algorithms and Architectures (SPAA '91), pp. 216-228, 1991.
[28] M. Iverson, F. Özgüner, and L. Potter, “Statistical Prediction of Task Execution Times through Analytical Benchmarking for Scheduling in a Heterogeneous Environment,” IEEE Trans. Computers, vol. 48, no. 12, pp. 1374-1379, Dec. 1999.
[29] M. Cosnard and M. Loi, “Automatic Task Graph Generation Techniques,” Parallel Processing Letters, vol. 5, no. 4, pp. 527-538, 1995.
[30] P.B. Bhat, C.S. Raghavendra, and V.K. Prasanna, “Efficient Collective Communication in Distributed Heterogeneous Systems,” Proc. 19th Int'l Conf. Distributed Computing Systems (ICDCS'99), pp. 15-24, 1999.
[31] L. Prylli and B. Tourancheau, “Fast Runtime Block Cyclic Data Redistribution on Multiprocessors,” J. Parallel and Distributed Computing, vol. 45, no. 1, pp. 63-72, 1997.
[32] S. Srinivasan, R. Kettimuthu, V. Subramani, and P. Sadayappan, “Characterization of Backfilling Strategies for Parallel Job Scheduling,” Proc. Int'l Conf. Parallel Processing Workshops, pp. 514-519, 2002.
[33] N. Vydyanathan, S. Krishnamoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayappan, and J. Saltz, “An Integrated Approach to Locality Conscious Processor Allocation and Scheduling of Mixed Parallel Applications,” Technical Report OSU-CISRC-2/08-TR04, Ohio State Univ., ftp://ftp.cse.ohio-state.edu/pub/tech-report/2008TR04.pdf, 2008.
[34] H. Kasahara and S. Narita, “Parallel Processing of Robot-Arm Control Computation on a Multiprocessor System,” IEEE J. Robotics and Automation, vol. RA-1, no. 2, pp. 104-113, 1985.
[35] A.B. Downey, “A Model for Speedup of Parallel Programs,” Technical Report CSD-97-933, http://allendowney.com/researchmodel/, 1997.
[36] A.B. Downey, “A Parallel Workload Model and Its Implications for Processor Allocation,” Proc. Sixth Int'l Symp. High Performance Distributed Computing (HPDC '97), p. 112, 1997.
[37] Task Graphs for Free, http://ziyang.ece.northwestern.edu/tgff/, 2008.