Task Clustering and Scheduling for Distributed Memory Parallel Architectures
January 1996 (vol. 7, no. 1), pp. 46-55

Abstract—This paper addresses the problem of scheduling parallel programs, represented as directed acyclic task graphs, for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine-grain tasks into coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in $O(n(n \lg n + e))$ time, which is n times faster than the previously best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.
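The abstract does not spell out the authors' algorithm, but the general idea of task clustering can be illustrated with a classic Sarkar-style "edge zeroing" heuristic: repeatedly merge the clusters at the ends of a heavy communication edge whenever doing so does not increase the estimated makespan. The sketch below is a generic illustration under simplified assumptions (tasks in one cluster run serially on one processor; intra-cluster communication is free); it is not the paper's 2-optimal algorithm, and all names (`schedule_length`, `greedy_cluster`) are invented for this example.

```python
from collections import defaultdict

def schedule_length(tasks, edges, cluster):
    """Estimate makespan of a clustered DAG.

    tasks:   {task: execution time}
    edges:   {(u, v): communication cost}  (the graph is assumed acyclic)
    cluster: {task: cluster id}; communication inside a cluster is free,
             and tasks sharing a cluster run serially on one processor.
    """
    # Topological order via Kahn's algorithm.
    indeg = {t: 0 for t in tasks}
    succ = {t: [] for t in tasks}
    for (u, v) in edges:
        succ[u].append(v)
        indeg[v] += 1
    order = [t for t in tasks if indeg[t] == 0]
    for u in order:
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)

    finish = {}
    cluster_free = defaultdict(int)  # next free time on each cluster's processor
    for t in order:
        data_ready = 0
        for (u, v), c in edges.items():
            if v == t:
                comm = 0 if cluster[u] == cluster[v] else c
                data_ready = max(data_ready, finish[u] + comm)
        start = max(data_ready, cluster_free[cluster[t]])
        finish[t] = start + tasks[t]
        cluster_free[cluster[t]] = finish[t]
    return max(finish.values())

def greedy_cluster(tasks, edges):
    """Edge-zeroing heuristic: scan edges by decreasing communication cost
    and merge the endpoint clusters when the makespan does not get worse."""
    cluster = {t: t for t in tasks}  # start with each task in its own cluster
    for (u, v), c in sorted(edges.items(), key=lambda kv: -kv[1]):
        if cluster[u] == cluster[v]:
            continue
        before = schedule_length(tasks, edges, cluster)
        old = cluster[v]
        merged = {t: (cluster[u] if cl == old else cl)
                  for t, cl in cluster.items()}
        if schedule_length(tasks, edges, merged) <= before:
            cluster = merged
    return cluster
```

On a small fork-join graph (four unit-time tasks with communication cost 10 on every edge), the heuristic collapses everything into one cluster, cutting the makespan from 23 (one task per processor, paying every communication) to 4 (purely sequential, no communication). This also shows why granularity matters: when communication dominates computation, aggressive clustering wins.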

Index Terms:
Program task graph, task granularity, task scheduling, distributed memory architectures, approximation algorithms.
Citation:
Michael A. Palis, Jing-Chiou Liou, David S.L. Wei, "Task Clustering and Scheduling for Distributed Memory Parallel Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 1, pp. 46-55, Jan. 1996, doi:10.1109/71.481597