Generalized Multiprocessor Scheduling and Applications to Matrix Computations
June 1996 (vol. 7 no. 6)
pp. 650-664

Abstract: This paper considerably extends the multiprocessor scheduling techniques in [1], [2] and applies them to matrix arithmetic compilation. In [1], [2] we presented several new results in the theory of homogeneous multiprocessor scheduling. A directed acyclic graph (DAG) of tasks is to be scheduled. Tasks are assumed to be parallelizable: as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. The optimal scheduling problem is to determine the number of processors assigned to each task, and the task sequencing, so as to minimize the finishing time.

In the special case where the speedup function of each task is p^α, where p is the amount of processing power applied to the task, a closed form solution for task graphs formed from parallel and series connections was derived in [1], [2] using optimal control theory. This paper extends these results to arbitrary DAGs. The optimality conditions impose nonlinear constraints on the flow of processing power from predecessors to successors, and on the finishing times of siblings. This paper presents a fast algorithm for determining and solving these nonlinear equations. The algorithm exploits the structure of the finishing time equations to run a conjugate gradient minimization efficiently, leading to the optimal solution.
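
The closed form itself appears in [1], [2] and is not reproduced in this abstract. As a rough illustration of what a series-parallel reduction looks like, the following Python sketch assumes that a task of length L run with constant processing power p finishes in time L / p^α with 0 < α ≤ 1, that tasks in series each receive all of the available power, and that parallel branches receive constant power shares chosen so that they finish at the same time. Under those assumptions, a series connection behaves like one task of length L1 + L2, and a parallel connection behaves like one task of length (L1^(1/α) + L2^(1/α))^α. The class and function names (Task, Series, Parallel, equivalent_length, parallel_shares) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Task:
    length: float  # assumed model: runs in length / p**alpha on power p


@dataclass
class Series:
    parts: List["Node"]  # sub-graphs executed one after another


@dataclass
class Parallel:
    parts: List["Node"]  # sub-graphs executed side by side


Node = Union[Task, Series, Parallel]


def equivalent_length(node: Node, alpha: float) -> float:
    """Length of the single task that finishes at the same time as `node`."""
    if isinstance(node, Task):
        return node.length
    lengths = [equivalent_length(child, alpha) for child in node.parts]
    if isinstance(node, Series):
        # Each stage uses all the power, so the times (and lengths) add.
        return sum(lengths)
    # Parallel: equal finishing times give shares proportional to L**(1/alpha).
    return sum(length ** (1.0 / alpha) for length in lengths) ** alpha


def finishing_time(node: Node, total_power: float, alpha: float) -> float:
    """Finishing time of `node` when it is given `total_power` throughout."""
    return equivalent_length(node, alpha) / total_power ** alpha


def parallel_shares(node: Parallel, total_power: float, alpha: float) -> List[float]:
    """Power given to each branch so that all branches finish together."""
    weights = [equivalent_length(child, alpha) ** (1.0 / alpha) for child in node.parts]
    total_weight = sum(weights)
    return [total_power * weight / total_weight for weight in weights]
```

For example, with α = 0.5 and total power 5, two parallel tasks of lengths 1 and 2 receive shares 1 and 4, and each finishes at time 1 (1 / 1^0.5 and 2 / 4^0.5). Arbitrary DAGs cannot in general be reduced this way, which is why the paper turns to a numerical solution of the optimality conditions.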

The algorithm has been tested on a variety of DAGs commonly encountered in matrix arithmetic. The results show that if the p^α speedup assumption holds, the schedules produced are superior to those of heuristic approaches. The algorithm has been applied to compiling matrix arithmetic [9] for the MIT Alewife machine, a distributed-shared memory multiprocessor. While matrix arithmetic tasks do not exactly satisfy the p^α speedup assumption, the algorithm can be applied as a good heuristic. The results show that the schedules produced by our algorithm are faster than those of alternative heuristic techniques.

[1] G.N.S. Prasanna and B.R. Musicus, "The Optimal Control Approach to Generalized Multiprocessor Scheduling," Algorithmica, 1995.
[2] G.N.S. Prasanna and B.R. Musicus, "Generalised Multiprocessor Scheduling Using Optimal Control," Third Ann. ACM Symp. Parallel Algorithms and Architectures, pp. 216-228, July 1991.
[3] G.N.S. Prasanna, A. Agarwal, and B.R. Musicus, "Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory," IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 7, pp. 720-736, July 1994.
[4] E.F. Coffman, Jr., ed., Computer and Job Shop Scheduling Theory. New York: John Wiley and Sons, 1976.
[5] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes, the Art of Scientific Computing. Cambridge, Mass.: Cambridge Univ. Press, 1986.
[6] A. Agarwal et al., "The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor," Workshop on Scalable Shared Memory Multiprocessors, Kluwer Academic Publishers, 1991. Also MIT/LCS Memo TM-454, 1991.
[7] V. Sarkar, "Partitioning and Scheduling Programs for Multiprocessors," Technical Report CSL-TR-87-328, PhD Thesis, Computer Systems Lab., Stanford University, April 1987.
[8] T. Yang and A. Gerasoulis, "A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors," Proc. Supercomputing, pp. 633-642, Albuquerque, N.M., Nov. 1991.
[9] K.P. Belkhale and P. Banerjee, "Scheduling Algorithms for Parallelizable Tasks," Int'l Parallel Processing Symp., June 1993.
[10] S. Ramaswamy and P. Banerjee, "Processor Allocation and Scheduling of Macro Dataflow Graphs on Distributed Memory Multicomputers by The Paradigm Compiler," Int'l Conf. Parallel Processing, pp. 134-138, Aug. 1993.
[11] S. Ramaswamy, S. Sapatnekar, and P. Banerjee, "A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers," Int'l Conf. Parallel Processing, Aug. 1994.
[12] J. Blazewicz, M. Drabowski, and J. Weglarz, "Scheduling Multiprocessor Tasks to Minimize Schedule Length," IEEE Trans. Computers, vol. 35, no. 5, May 1986.
[13] J. Du and J. Leung, "Complexity of Scheduling Parallel Task Systems," SIAM J. Discrete Math., vol. 2, no. 4, pp. 473-487, Nov. 1989.
[14] C.C. Han and K.J. Lin, "Scheduling Parallelizable Jobs on Multiprocessors," IEEE Conf. Real-Time Systems, pp. 59-67, 1989.
[15] C. McCreary, A.A. Khan, J.J. Thompson, and M.E. McArdle, "A Comparison of Heuristics for Scheduling DAGs on Multiprocessors," Proc. Eighth Int'l Parallel Processing Symp., pp. 446-451, 1994.
[16] J. Baxter and J.H. Patel, "The Last Algorithm: A Heuristic-Based Static Task Allocation Algorithm," Proc. Int'l Conf. Parallel Processing, vol. 2, pp. 217-222, 1989.
[17] T.C. Hu, "Parallel Sequencing and Assembly Line Problems," Operations Research, vol. 9, no. 6, pp. 841-848, 1961.
[18] D. Chaiken and A. Agarwal, "Software-Extended Coherent Shared Memory—Performance and Cost," Twenty-First Annual Int'l Symp. Computer Arch., (ISCA 21), ACM, April 1994.

Index Terms:
DAG scheduling, multiprocessor scheduling, multiprocessor compilation, parallel processing, communication locality, task scheduling, distributed-memory multiprocessors.
G. N. Srinivasa Prasanna, B. R. Musicus, "Generalized Multiprocessor Scheduling and Applications to Matrix Computations," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 6, pp. 650-664, June 1996, doi:10.1109/71.506703