Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling
June 2001 (vol. 12 no. 6)
pp. 529-543

Abstract: Scheduling jobs on the IBM SP2 system and many other distributed-memory MPPs is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order in which the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This situation led to the development of the EASY scheduler, which uses aggressive backfilling: small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We compare this approach with a more conservative approach, in which small jobs move ahead only if they do not delay any job in the queue, and show that the relative performance of the two schemes depends on the workload: for workloads typical of SP2 systems, the aggressive approach is indeed better, but for other workloads the two algorithms perform similarly. In addition, we study the sensitivity of backfilling to the accuracy of the runtime estimates provided by the users and find a surprising result: backfilling actually works better when users overestimate the runtime by a substantial factor.
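The difference between the two backfilling policies can be made concrete with a tiny simulation. The following is a minimal Python sketch, not the authors' simulator: it assumes every job is queued at time 0, that each user runtime estimate is exact, and that no job requests more processors than the machine has. The names `Job`, `easy`, `conservative`, and `JOBS` are hypothetical. `easy` gives a reservation only to the first queued job, while `conservative` gives every job a reservation at the earliest time that does not delay any earlier reservation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    procs: int    # processors requested (exclusive partition)
    runtime: int  # user estimate; ASSUMED exact in this sketch


def _fits(reserved, total, job, t):
    """True if `job` can hold its processors over [t, t + runtime)."""
    end = t + job.runtime
    # Usage only rises where a reservation starts, so checking t and every
    # reservation start inside the window finds the peak usage.
    points = {t} | {s for s, _, _ in reserved if t < s < end}
    for p in points:
        used = sum(n for s, e, n in reserved if s <= p < e)
        if used + job.procs > total:
            return False
    return True


def conservative(jobs, total):
    """Conservative backfilling: every job, in FCFS order, is reserved at the
    earliest time it fits without delaying any earlier reservation."""
    reserved = []  # (start, end, procs)
    starts = {}
    for job in jobs:
        # The earliest feasible start is 0 or the end of some reservation.
        candidates = sorted({0} | {e for _, e, _ in reserved})
        t = next(c for c in candidates if _fits(reserved, total, job, c))
        reserved.append((t, t + job.runtime, job.procs))
        starts[job.name] = t
    return starts


def easy(jobs, total):
    """EASY (aggressive) backfilling: only the first queued job gets a
    reservation; later jobs may start now if they do not delay it."""
    queue = list(jobs)
    running = []  # (end, procs)
    starts = {}
    now = 0
    while queue:
        free = total - sum(n for _, n in running)
        # 1. Start jobs from the head of the queue while they fit.
        while queue and queue[0].procs <= free:
            job = queue.pop(0)
            starts[job.name] = now
            running.append((now + job.runtime, job.procs))
            free -= job.procs
        if not queue:
            break
        # 2. Shadow time: when enough processors free up for the head job.
        #    extra: processors the head will not need at that time.
        head, avail = queue[0], free
        for end, n in sorted(running):
            avail += n
            if avail >= head.procs:
                shadow, extra = end, avail - head.procs
                break
        # 3. Backfill: a later job may start now if it fits and either
        #    finishes by the shadow time or only uses the extra processors.
        for job in queue[1:]:
            if job.procs > free:
                continue
            if now + job.runtime <= shadow or job.procs <= extra:
                if now + job.runtime > shadow:
                    extra -= job.procs
                queue.remove(job)
                starts[job.name] = now
                running.append((now + job.runtime, job.procs))
                free -= job.procs
        # 4. Advance to the next completion and release its processors.
        now = min(end for end, _ in running)
        running = [(e, n) for e, n in running if e > now]
    return starts


# Hypothetical example: a 4-processor machine, all jobs queued at time 0.
JOBS = [Job("J1", 2, 2), Job("J2", 3, 2), Job("J3", 4, 2), Job("J4", 1, 10)]

if __name__ == "__main__":
    print(sorted(conservative(JOBS, 4).items()))
    # [('J1', 0), ('J2', 2), ('J3', 4), ('J4', 6)]
    print(sorted(easy(JOBS, 4).items()))
    # [('J1', 0), ('J2', 2), ('J3', 10), ('J4', 0)]
```

In this example, EASY lets J4 backfill at time 0 because it uses only the one processor that the head job J2 will not need; J4 then holds that processor until time 10 and pushes J3, which was not at the head of the queue when J4 jumped ahead, from time 4 to time 10. Conservative backfilling keeps J3's reservation at time 4 and makes J4 wait until time 6. Handling job arrivals and jobs that finish before their estimate would require re-running these decisions at every scheduling event.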

Index Terms:
Parallel job scheduling, backfilling, runtime estimates, workload modeling, performance metrics.
Citation:
Ahuva W. Mu'alem, Dror G. Feitelson, "Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 6, pp. 529-543, June 2001, doi:10.1109/71.932708