Dynamic Task Scheduling Using Online Optimization
November 2000 (vol. 11, no. 11)
pp. 1151-1163

Abstract—Algorithms for scheduling independent tasks onto the processors of a multiprocessor system must trade off processor load balance, memory locality, and scheduling overhead. Most existing algorithms, however, do not adequately balance these conflicting factors. This paper introduces the Self-Adjusting Dynamic Scheduling (SADS) class of algorithms, which use a unified cost model to account explicitly for these factors at runtime. A dedicated processor performs scheduling in phases by maintaining a tree of partial schedules and incrementally assigning tasks to the least-cost schedule. A scheduling phase terminates whenever any processor becomes idle, at which time the partial schedules are distributed to the processors. An extension of the basic SADS algorithm, called DBSADS, controls the scheduling overhead by giving higher priority to partial schedules with more task-to-processor assignments. These algorithms are compared to two distributed scheduling algorithms within a database application on an Intel Paragon distributed-memory multiprocessor system.
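The best-first search over a tree of partial schedules described in the abstract can be sketched roughly as follows. This is an illustrative sketch only: the function name `sads_phase`, the example tasks, and the cost model (makespan of the partial schedule plus a fixed penalty for a task assigned away from the processor holding its data) are assumptions made for this example, not the paper's actual formulation, and the sketch assigns all tasks in a single phase rather than terminating when a processor becomes idle.

```python
import heapq
import itertools

def sads_phase(tasks, num_procs, task_home, locality_penalty=2.0):
    """Best-first (least-cost-first) search over a tree of partial schedules.

    tasks:      list of (task_id, exec_time)
    task_home:  dict mapping task_id -> processor holding that task's data
                (illustrative stand-in for a memory-locality model)
    Returns a complete assignment task_id -> processor.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    # A node in the schedule tree:
    # (cost, tie, next_task_index, per-processor loads, partial assignment)
    heap = [(0.0, next(counter), 0, (0.0,) * num_procs, {})]
    while heap:
        cost, _, i, loads, assign = heapq.heappop(heap)
        if i == len(tasks):
            # First complete schedule popped is least-cost, since the
            # makespan cost never decreases as tasks are added.
            return assign
        tid, t = tasks[i]
        for p in range(num_procs):  # branch: assign task i to each processor
            new_loads = list(loads)
            penalty = locality_penalty if task_home.get(tid) != p else 0.0
            new_loads[p] += t + penalty
            new_cost = max(new_loads)  # makespan of the partial schedule
            heapq.heappush(heap, (new_cost, next(counter), i + 1,
                                  tuple(new_loads), {**assign, tid: p}))
    return {}

# Hypothetical workload: two tasks whose data lives on each processor.
tasks = [("t0", 3.0), ("t1", 1.0), ("t2", 2.0), ("t3", 2.0)]
schedule = sads_phase(tasks, num_procs=2,
                      task_home={"t0": 0, "t1": 0, "t2": 1, "t3": 1})
```

A DBSADS-like variant would bias the priority key toward partial schedules with more assignments, for example by subtracting a depth-proportional bonus from `new_cost`, trading optimality of the search for lower scheduling overhead.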

[1] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness.New York: W.H. Freeman, 1979.
[2] T.L. Casavant and J.G. Kuhl, “A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems,” IEEE Trans. Software Eng., vol. 14, no. 2, Feb. 1988.
[3] M.G. Norman and P. Thanisch, “Models of Machines and Computation for Mapping in Multicomputers,” ACM Computing Surveys, vol. 25, no. 3, pp. 263-302, 1993.
[4] Butterfly Parallel Processor Overview. Cambridge, Mass.: BBN, June 1985.
[5] Inside the Butterfly GP1000. Cambridge, Mass.: BBN, Oct. 1988.
[6] D. Lenoski et al., “The Stanford DASH Multiprocessor,” Computer, pp. 63-79, Mar. 1992.
[7] G.J. Lipovski and M. Malek, Parallel Computing: Theory and Comparisons. New York: Wiley, 1987.
[8] Z.G. Vranesic, M. Stumm, D.M. Lewis, and R. White, “Hector: A Hierarchically Structured Shared-Memory Multiprocessor,” Computer, vol. 24, no. 1, pp. 72-79, Jan. 1991.
[9] D.J. Lilja, “Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons,” ACM Computing Surveys, vol. 25, no. 3, pp. 303-338, Sept. 1993.
[10] N.G. Shivaratri, P. Krueger, and M. Singhal, “Load Distributing for Locally Distributed Systems,” Computer, vol. 25, no. 12, pp. 33-44, Dec. 1992.
[11] M. Willebeek-LeMair and A. Reeves, “Strategies for Dynamic Load Balancing on Highly Parallel Computers,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 9, pp. 979-993, Sept. 1993.
[12] D.L. Eager, E.D. Lazowska, and J. Zahorjan, “Adaptive Load Sharing in Homogeneous Distributed Systems,” IEEE Trans. Software Eng., vol. 12, no. 5, pp. 662-675, May 1986.
[13] S. Zhou, “A Trace-Driven Simulation Study of Dynamic Load Balancing,” IEEE Trans. Software Eng., vol. 14, no. 9, pp. 1327-1341, Sept. 1988.
[14] R.M. Bryant and R.A. Finkel, “A Stable Distributed Scheduling Algorithm,” Proc. Second Int'l Conf. Distributed Computing Systems, pp. 314–323, 1981.
[15] K. Ramamritham and J.A. Stankovic, “Dynamic Task Scheduling in Distributed Hard Real-Time Systems,” IEEE Software, vol. 1, no. 3, pp. 65-75, July 1984.
[16] H.G. Rotithor and S.S. Pyo, “Decentralized Decision Making in Adaptive Task Sharing,” Proc. Second IEEE Symp. Parallel and Distributed Processing, pp. 34–41, 1990.
[17] D.L. Eager, E.D. Lazowska, and J. Zahorjan, “A Comparison of Receiver-Initiated and Sender-Initiated Adaptive Load Sharing,” Performance Evaluation, vol. 6, pp. 53-68, Mar. 1986.
[18] M. Livny and M. Melman, “Load Balancing in Homogeneous Broadcast Distributed Systems,” Proc. ACM Computer Network Performance Symp., pp. 47-55, 1982.
[19] H.C. Lin and C.S. Raghavendra, “A Dynamic Load-Balancing Policy with a Central Job Dispatcher (LBC),” IEEE Trans. Software Eng., vol. 18, no. 2, pp. 148-158, Feb. 1992.
[20] E.P. Markatos and T.J. LeBlanc, “Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 4, pp. 379-400, Apr. 1994.
[21] P. Krueger and M. Livny, “The Diverse Objectives of Distributed Scheduling Policies,” Proc. Seventh Int'l Conf. Distributed Computing Systems, pp. 242–249, 1987.
[22] P. Krueger, “Distributed Scheduling for a Changing Environment,” PhD thesis (Technical Report 780), Computer Science Dept., Univ. of Wisconsin at Madison, June 1988.
[23] P. Krueger and M. Livny, “A Comparison of Preemptive and Non-Preemptive Load Distributing,” Proc. IEEE Int'l Conf. Distributed Computing Systems, pp. 123-130, 1988.
[24] H. Berryman, J. Saltz, and J. Scroggs, “Execution Time Support for Adaptive Scientific Algorithms on Distributed Memory Architectures,” Concurrency: Practice and Experience, vol. 3, pp. 159-178, 1991.
[25] G. Cybenko, “Dynamic Load Balancing for Distributed Memory Multiprocessors,” J. Parallel and Distributed Computing, vol. 7, pp. 279-301, 1989.
[26] K.M. Dragon and J.L. Gustafson, “A Low-Cost Hypercube Load Balance Algorithm,” Proc. Fourth Conf. Hypercube Concurrent Computers and Applications, pp. 583–590, 1989.
[27] J.K. Salmon, “Parallel Hierarchical N-Body Methods,” Technical Report CRPC9014, Center for Research in Parallel Computing, Caltech, 1990.
[28] M.S. Squillante and E.D. Lazowska, “Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 2, pp. 131-143, Feb. 1993.
[29] D.G. Feitelson and L. Rudolph, “Coscheduling Based on Runtime Identification of Activity Working Sets,” Int'l J. Parallel Programming, vol. 23, no. 2, pp. 135–160, Apr. 1995.
[30] H. Li, S. Tandri, M. Stumm, and K.C. Sevcik, “Locality and Loop Scheduling on NUMA Multiprocessors,” Proc. Int'l Conf. Parallel Processing, pp. II-140–II-147, 1993.
[31] P. Tang, P.C. Yew, and C. Zhu, “Impact of Self-Scheduling on Performance of Multiprocessor Systems,” Proc. ACM Int'l Conf. Supercomputing, pp. 593–603, July 1988.
[32] S.F. Hummel, E. Schonberg, and L.E. Flynn, “Factoring: A Practical and Robust Method for Scheduling Parallel Loops,” Proc. Supercomputing Conf., pp. 610-619, Nov. 1991.
[33] S.F. Hummel, E. Schonberg, and L.E. Flynn, “Factoring: A Method for Scheduling Parallel Loops,” Comm. ACM, vol. 35, no. 8, pp. 90-101, Aug. 1992.
[34] T.H. Tzen and L.M. Ni, “Dynamic Loop Scheduling for Shared Memory Multiprocessors,” Int'l Conf. Parallel Processing, vol. 2, pp. 247–250, 1991.
[35] T.H. Tzen and L.M. Ni, “Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers,” IEEE Trans. Parallel and Distributed Systems, vol. 4, pp. 87-98, Jan. 1993.
[36] M. Devarakonda and A. Mukherjee, “Issues in Implementation of Cache-Affinity Scheduling,” Proc. Winter USENIX Conf., pp. 345–357, 1992.
[37] J. Torrellas, A. Tucker, and A. Gupta, “Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors,” J. Parallel and Distributed Computing, vol. 24, no. 2, pp. 139-151, Feb. 1995.
[38] J. Weissman, “Scheduling Parallel Computations in a Heterogeneous Environment,” PhD thesis, Dept. Computer Science, Univ. of Virginia, Aug. 1995.
[39] I. Foster, “Automatic Generation of Self-Scheduling Programs,” IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 1, pp. 68–78, 1991.
[40] M. Al-Mouhamed, “Analysis of Macro-Dataflow Dynamic Scheduling on Nonuniform Memory Access Architectures,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 8, pp. 875-887, Aug. 1993.
[41] B.J. Smith, “Architecture and Applications of the HEP Multiprocessor Computer System,” Real-Time Signal Processing IV, Soc. Photo-Optical Instrumentation Eng., vol. 298, pp. 241–248, Aug. 1981.
[42] P. Krueger and M. Livny, “Load Balancing, Load Sharing and Performance in Distributed Systems,” Technical Report TR 700, Computer Science Dept., Univ. of Wisconsin at Madison, 1987.
[43] Y.T. Wang and R.J.T. Morris, “Load Sharing in Distributed Systems,” IEEE Trans. Computers, vol. 34, pp. 204–217, 1985.
[44] F. Douglis and J. Ousterhout, “Transparent Process Migration: Design Alternatives and the Sprite Implementation,” Software: Practice and Experience, vol. 21, pp. 757-785, Aug. 1991.
[45] M. Litzkow, M. Livny, and M.W. Mutka, “Condor—A Hunter of Idle Workstations,” Proc. Eighth Int'l Conf. Distributed Computing Systems, June 1988.
[46] J. Liu and V.A. Saletore, “Self-Scheduling on Distributed-Memory Machines,” ACM Int'l Conf. Supercomputing, pp. 814–823, 1993.
[47] V.A. Saletore, J. Liu, and B.Y. Lam, “Scheduling Nonuniform Parallel Loops on Distributed Memory Machines,” Proc. Hawaii Int'l Conf. System Sciences, vol. 2, pp. 516–525, 1993.
[48] D.G. Feitelson and L. Rudolph, “Distributed Hierarchical Control for Parallel Processing,” Computer, vol. 23, no. 5, pp. 65-77, May 1990.
[49] S. Dandamudi and P. Cheng, “A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 1, pp. 1-16, Jan. 1995.
[50] S. Subramaniam and D.L. Eager, “Affinity Scheduling of Unbalanced Workloads,” Proc. Supercomputing '94, pp. 214-226, 1994.
[51] H. Li and K.C. Sevcik, “Exploiting Cache Affinity in Software Cache Coherence,” Proc. ACM Int'l Conf. Supercomputing, pp. 264–273, 1994.
[52] S. Lucco, “A Dynamic Scheduling Method for Irregular Parallel Programs,” Proc. ACM SIGPLAN '92 Conf. Programming Language Design and Implementation, pp. 200-211, 1992.
[53] O. Plata and F.F. Rivera, “Combining Static and Dynamic Scheduling on Distributed-Memory Multiprocessors,” Proc. ACM Int'l Conf. Supercomputing, pp. 186–195,
[54] J.E. Bahr, S.B. Levenstein, L.A. McMahon, T.J. Mullins, and A.H. Wottreng, “Architecture, Design, and Performance of Application System/400 (AS/400) Multiprocessors,” IBM J. Research and Development, vol. 36, no. 6, pp. 1001-1014, Nov. 1992.
[55] R.M. Karp and J. Pearl, “Searching for an Optimal Path in a Tree with Random Costs,” Artificial Intelligence, vol. 21, pp. 99–116, 1983.
[56] B. Hamidzadeh and D.J. Lilja, “Self-Adjusting Scheduling: An Online Optimization Technique for Locality Management and Load Balancing,” Proc. Int'l Conf. Parallel Processing, 1994.
[57] B. Hamidzadeh and D.J. Lilja, “Dynamic Scheduling Strategies for Shared-Memory Multiprocessors,” Proc. 16th Int'l Conf. Distributed Computing Systems, May 1996.
[58] N.J. Nilsson, Principles of Artificial Intelligence. Morgan Kaufmann, 1980.
[59] C.C. Shen and W.H. Tsai, “A Graph Matching Approach to Optimal Task Assignment in Distributed Computing Systems Using a Minimax Criterion,” IEEE Trans. Computers, Mar. 1985.
[60] H. Garcia-Molina and K. Salem, guest eds., special issue on Main-Memory Database Management Systems, IEEE Trans. Knowledge and Data Eng., vol. 4, no. 6, Dec. 1992.
[61] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin Cummings, 1994.

Index Terms:
Dynamic scheduling, scheduling costs, load balancing, locality management.
Babak Hamidzadeh, Lau Ying Kit, David J. Lilja, "Dynamic Task Scheduling Using Online Optimization," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 11, pp. 1151-1163, Nov. 2000, doi:10.1109/71.888636