Design and Potential Performance of Goal-Oriented Job Scheduling Policies for Parallel Computer Workloads
Issue No. 12, December 2008 (vol. 19)
Su-Hui Chiang, Portland State University, Portland
Sangsuree Vasupongayya, Portland State University, Portland
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2008.48
To balance multiple scheduling performance requirements on parallel computer systems, traditional job schedulers use many parameters that can be configured to define job or queue priorities. Offering many parameters seems flexible, but in practice tuning their values is highly challenging. To simplify the task of resource management, we propose goal-oriented policies, which allow system administrators to specify high-level performance objectives rather than tune low-level scheduling parameters. We study the design of goal-oriented policies, including (1) appropriate multi-objective models for specifying trade-offs between objectives, (2) efficient search algorithms for finding the best schedule at each scheduling decision point, and (3) appropriate performance measures to be optimized in the objectives with respect to two common performance requirements: preventing starvation and favoring shorter jobs. We compare goal-oriented policies with widely used backfill policies. Policies are evaluated by simulation using ten monthly workloads that ran on a Linux cluster (IA-64) from NCSA. Our results show that by automatically optimizing performance according to the given objectives through search, goal-oriented policies can simultaneously outperform FCFS-backfill and LXF-backfill, which are designed to favor the maximum wait and the average slowdown, respectively.
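The idea of choosing a schedule by searching against stated objectives can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Job` fields, the in-order (no backfill) start rule in `simulate`, the unbounded slowdown definition, the lexicographic objective (minimize maximum wait first, then average slowdown), and the exhaustive search over permutations, which only stands in for an efficient search algorithm and is feasible for a handful of jobs at most.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Job:  # hypothetical job record, not the paper's model
    name: str
    submit: float   # submission time
    runtime: float  # runtime (assumed known and > 0)
    nodes: int      # nodes requested


def simulate(order, total_nodes, now):
    """Start jobs strictly in the given order (no backfill).

    Each job starts at the earliest time, no earlier than the previous
    job's start, at which enough nodes are free. Returns {name: start}.
    """
    running = []  # (end_time, nodes) of jobs already placed
    starts = {}
    t = now
    for job in order:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than exist")
        while True:
            busy = sum(n for end, n in running if end > t)
            if total_nodes - busy >= job.nodes:
                break
            # Not enough free nodes: advance to the next job completion.
            t = min(end for end, _ in running if end > t)
        starts[job.name] = t
        running.append((t + job.runtime, job.nodes))
    return starts


def metrics(jobs, starts):
    """Return (maximum wait, average slowdown) for a set of start times."""
    waits = [starts[j.name] - j.submit for j in jobs]
    slowdowns = [(w + j.runtime) / j.runtime for w, j in zip(waits, jobs)]
    return max(waits), sum(slowdowns) / len(jobs)


def best_order(jobs, total_nodes, now):
    """Exhaustive search; tuples compare lexicographically, so the
    maximum wait is minimized first and average slowdown breaks ties."""
    return min(
        permutations(jobs),
        key=lambda order: metrics(jobs, simulate(order, total_nodes, now)),
    )


# Three waiting jobs on an idle 4-node machine at time 4.
jobs = [
    Job("A", submit=0, runtime=8, nodes=4),
    Job("B", submit=1, runtime=2, nodes=2),
    Job("C", submit=2, runtime=2, nodes=2),
]
fcfs = metrics(jobs, simulate(jobs, total_nodes=4, now=4))
best = best_order(jobs, total_nodes=4, now=4)
searched = metrics(jobs, simulate(best, total_nodes=4, now=4))
print("FCFS order:", fcfs)
print("Searched order:", [j.name for j in best], searched)
```

In this toy case, running the two short jobs side by side before the wide job lowers both the maximum wait and the average slowdown relative to strict submission order, which is the kind of simultaneous improvement the abstract describes.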
Scheduling, Parallel systems, Batch processing systems, Goal-oriented policies, Multi-objective models, Backfill scheduling policies, Search algorithms
Su-Hui Chiang, Sangsuree Vasupongayya, "Design and Potential Performance of Goal-Oriented Job Scheduling Policies for Parallel Computer Workloads", IEEE Transactions on Parallel & Distributed Systems, vol. 19, no. 12, pp. 1642-1656, December 2008, doi:10.1109/TPDS.2008.48