Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines
June 1997 (vol. 8 no. 6)
pp. 608-622

Abstract: Many partitioned scientific programs can be modeled as iterative executions of computational tasks and represented by iterative task graphs (ITGs). An ITG may or may not contain dependence cycles. In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead and propose heuristic algorithms for scheduling both cyclic and acyclic ITGs without searching the entire iteration space. Our approach incorporates techniques of software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling, and load balancing. We analyze the asymptotic optimality of the algorithms to show that the derived schedules are competitive with optimal solutions. We also study the sensitivity of scheduling performance to inaccurate weights. Finally, we present experimental results demonstrating the effectiveness of the optimization techniques.
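For readers new to ITGs, here is a minimal sketch of two standard ideas the abstract names. It does not reproduce the paper's heuristics, and the encoding is an assumption: integer task weights in a dict, and edges as (source, destination, distance) tuples, where distance counts how many iterations a dependence crosses (0 means within one iteration). The hypothetical `iteration_bound` computes the classic lower bound on the steady-state period of a cyclic ITG: the maximum, over dependence cycles, of total work divided by total distance. It ignores communication costs and processor limits, so it only bounds what any schedule can achieve. The hypothetical `unfold` applies the textbook f-fold unfolding from the iterative data-flow literature, replicating each task f times so that one unfolded iteration covers f original ones. Cycle enumeration is brute force and only suited to small graphs.

```python
from fractions import Fraction

# ITG encoding (assumed for illustration): weights = {task: int},
# edges = [(src, dst, distance)], distance = iterations the dependence spans.

def simple_cycles(nodes, edges):
    """Yield every simple cycle once, as a list of edges (naive DFS)."""
    order = {v: i for i, v in enumerate(nodes)}
    adj = {v: [] for v in nodes}
    for e in edges:
        adj[e[0]].append(e)

    def dfs(start, v, path, on_path):
        for e in adj[v]:
            w = e[1]
            if w == start:
                yield path + [e]
            # Root each cycle at its lowest-ordered node so it is found once.
            elif order[w] > order[start] and w not in on_path:
                on_path.add(w)
                yield from dfs(start, w, path + [e], on_path)
                on_path.remove(w)

    for s in nodes:
        yield from dfs(s, s, [], {s})

def iteration_bound(weights, edges):
    """Max over cycles of work/distance; 0 for an acyclic ITG."""
    best = Fraction(0)
    for cycle in simple_cycles(list(weights), edges):
        work = sum(weights[u] for u, _, _ in cycle)
        dist = sum(d for _, _, d in cycle)
        if dist == 0:
            raise ValueError("zero-distance dependence cycle: not executable")
        best = max(best, Fraction(work, dist))
    return best

def unfold(weights, edges, f):
    """f-unfold: copy (v, i) executes iteration i mod f of each group of f."""
    new_w = {(v, i): w for v, w in weights.items() for i in range(f)}
    # Edge u->v with distance d becomes (u,i)->(v,(i+d) mod f)
    # with distance floor((i+d)/f), for each copy i.
    new_e = [((u, i), (v, (i + d) % f), (i + d) // f)
             for u, v, d in edges for i in range(f)]
    return new_w, new_e

# Hypothetical 3-task ITG: A -> B -> C within an iteration, C feeds A
# two iterations later, and B depends on its own previous iteration.
w = {"A": 2, "B": 3, "C": 2}
e = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2), ("B", "B", 1)]
print(iteration_bound(w, e))                # prints 7/2
print(iteration_bound(*unfold(w, e, 2)))    # prints 7
```

Here the cycle A-B-C gives 7/2 and the self-loop on B gives 3, so the bound is 7/2 time units per iteration. After 2-fold unfolding, the bound per unfolded iteration is 7, which is still 7/2 per original iteration. Unfolding does not lower the bound, but it turns a fractional period into an integer one over a larger block of tasks that a DAG scheduler can then place.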

Index Terms:
Scheduling, communication optimization, granularity, software pipelining, iterative task graphs, directed acyclic graphs.
Tao Yang, Cong Fu, "Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 6, pp. 608-622, June 1997, doi:10.1109/71.595579