On Simulation and Design of Parallel-Systems Schedulers: Are We Doing the Right Thing?
July 2009 (vol. 20 no. 7)
pp. 983-996
Edi Shmueli, Haifa University Campus, Haifa, and The Hebrew University of Jerusalem, Jerusalem
Dror G. Feitelson, The Hebrew University of Jerusalem, Jerusalem
It is customary to use open-system, trace-driven simulations to evaluate the performance of parallel-system schedulers. As a consequence, all schedulers have evolved to optimize the packing of jobs in the schedule as a means of improving a number of performance metrics that are conjectured to correlate with user satisfaction, on the premise that this will result in higher productivity in reality. We argue that these simulations suffer from severe limitations that lead to suboptimal scheduler designs and even to the dismissal of potentially good design alternatives. We propose an alternative simulation methodology called site-level simulation, in which the workload for the evaluation is generated dynamically by user models that interact with the system. We present a novel scheduler called CREASY that exploits knowledge of user behavior to directly improve user satisfaction, and we compare its performance to that of the original packing-based EASY scheduler. We show that user productivity improves by up to 50 percent under the user-aware design, while according to the conventional metrics, performance may actually degrade.
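
To make the contrast with open-system simulation concrete, the following is a minimal sketch of a feedback-driven (closed-loop) workload, in which each simulated user submits the next job only after the previous one completes and a think time elapses. It is an illustration under strong assumptions and not the authors' simulator: the function name `simulate_site`, the exponential runtime and think-time distributions, the single-processor jobs, and the plain FCFS policy (neither EASY nor CREASY) are all hypothetical choices made for brevity.

```python
import heapq
import random


def simulate_site(num_users, num_procs, horizon,
                  mean_runtime=10.0, mean_think=5.0, seed=0):
    """Closed-loop sketch: users generate the workload in response to the system.

    Assumptions (not from the paper): every job needs one processor,
    jobs are served FCFS, and runtimes and think times are exponential.
    Returns (jobs completed by the horizon, mean wait of those jobs).
    """
    rng = random.Random(seed)
    # Min-heap of the times at which each processor becomes free.
    free_at = [0.0] * num_procs
    # Min-heap of pending submissions as (submit_time, user_id).
    submissions = [(rng.expovariate(1.0 / mean_think), user)
                   for user in range(num_users)]
    heapq.heapify(submissions)

    completed = 0
    total_wait = 0.0
    while submissions:
        submit, user = heapq.heappop(submissions)
        if submit >= horizon:
            break  # every remaining submission is at least this late
        # FCFS: the job takes the processor that frees up earliest.
        start = max(submit, heapq.heappop(free_at))
        finish = start + rng.expovariate(1.0 / mean_runtime)
        heapq.heappush(free_at, finish)
        if finish <= horizon:
            completed += 1
            total_wait += start - submit
        # Feedback: the user's next submission depends on when this job ends.
        next_submit = finish + rng.expovariate(1.0 / mean_think)
        heapq.heappush(submissions, (next_submit, user))

    mean_wait = total_wait / completed if completed else 0.0
    return completed, mean_wait


if __name__ == "__main__":
    for procs in (1, 4, 8):
        print(procs, simulate_site(num_users=8, num_procs=procs, horizon=1000.0))
```

In this sketch the number of completed jobs stands in for user productivity: a slower system delays each user's subsequent submissions, so fewer jobs are submitted at all. In an open-system, trace-driven simulation the arrival times are fixed in advance, so this effect cannot appear.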


Index Terms:
Parallel job scheduling, trace-driven simulations, open-system model, user behavior, feedback.
Citation:
Edi Shmueli, Dror G. Feitelson, "On Simulation and Design of Parallel-Systems Schedulers: Are We Doing the Right Thing?," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 7, pp. 983-996, July 2009, doi:10.1109/TPDS.2008.152