Issue No. 06 - June (1996 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.506703
<p><b>Abstract</b>: This paper considerably extends the multiprocessor scheduling techniques in [<ref rid="bibl06501" type="bib">1</ref>], [<ref rid="bibl06502" type="bib">2</ref>], and applies them to matrix arithmetic compilation. In [<ref rid="bibl06501" type="bib">1</ref>], [<ref rid="bibl06502" type="bib">2</ref>] we presented several new results in the theory of homogeneous multiprocessor scheduling. A directed acyclic graph (DAG) of tasks is to be scheduled. Tasks are assumed to be parallelizable: as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. The optimal scheduling problem is to determine the number of processors assigned to each task, and the task sequencing, so as to minimize the finishing time.</p><p>Using optimal control theory, a closed-form solution was derived in [<ref rid="bibl06501" type="bib">1</ref>], [<ref rid="bibl06502" type="bib">2</ref>] for task graphs formed from parallel and series connections, in the special case where the speedup function of each task is <it>p</it><super>α</super>, where <it>p</it> is the amount of processing power applied to the task. This paper extends these results to arbitrary DAGs. The optimality conditions impose nonlinear constraints on the flow of processing power from predecessors to successors, and on the finishing times of siblings. This paper presents a fast algorithm for determining and solving these nonlinear equations. The algorithm exploits the structure of the finishing time equations to run a conjugate gradient minimization efficiently, leading to the optimal solution.</p><p>The algorithm has been tested on a variety of DAGs commonly encountered in matrix arithmetic. The results show that if the <it>p</it><super>α</super> speedup assumption holds, the schedules produced are superior to those of heuristic approaches. 
The algorithm has been applied to compiling matrix arithmetic [<ref rid="bibl06509" type="bib">9</ref>] for the MIT Alewife machine, a distributed shared-memory multiprocessor. While matrix arithmetic tasks do not exactly satisfy the <it>p</it><super>α</super> speedup assumptions, the algorithm can be applied as a good heuristic. The results show that the schedules produced by our algorithm are faster than those produced by alternative heuristic techniques.</p>
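As an illustration of the p^α speedup model for series and parallel connections, the following minimal sketch may help. It is not taken from the paper. It assumes each task i has a hypothetical "length" L_i, so that running it on processing power p takes L_i / p^α, and it assumes processing power is continuously divisible and held constant over each task. Under those assumptions, equalizing the finishing times of parallel siblings gives each task a share proportional to L_i^(1/α). All function names are illustrative.

```python
def finish_time(length, power, alpha):
    """Time to run a task of the given length on `power` processors,
    assuming a speedup of power**alpha (0 < alpha <= 1)."""
    return length / power ** alpha


def series_length(lengths):
    """Equivalent length of tasks run one after another, each using
    all of the available processing power (assumed model)."""
    return sum(lengths)


def parallel_length(lengths, alpha):
    """Equivalent length of tasks run side by side when power is split
    so that all of them finish at the same time (assumed model)."""
    return sum(L ** (1.0 / alpha) for L in lengths) ** alpha


def parallel_allocation(lengths, total_power, alpha):
    """Split total_power among parallel tasks in proportion to
    L_i**(1/alpha), which equalizes their finishing times."""
    weights = [L ** (1.0 / alpha) for L in lengths]
    total = sum(weights)
    return [total_power * w / total for w in weights]


if __name__ == "__main__":
    lengths, alpha, total_power = [1.0, 4.0], 0.5, 17.0
    shares = parallel_allocation(lengths, total_power, alpha)
    for L, p in zip(lengths, shares):
        print(f"length={L} power={p} time={finish_time(L, p, alpha)}")
    combined = parallel_length(lengths, alpha)
    print("combined time:", finish_time(combined, total_power, alpha))
```

With these definitions a series-parallel task graph collapses recursively into a single equivalent task. For arbitrary DAGs no such collapse is available, which is the case the abstract says is handled by solving the nonlinear optimality equations numerically.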
DAG scheduling, multiprocessor scheduling, multiprocessor compilation, parallel processing, communication locality, task scheduling, distributed-memory multiprocessors.
B. R. Musicus and G. N. Prasanna, "Generalized Multiprocessor Scheduling and Applications to Matrix Computations," in IEEE Transactions on Parallel & Distributed Systems, vol. 7, no. 6, pp. 650-664, 1996.