2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS) (2016)
June 27, 2016 to June 30, 2016
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDCS.2016.95
We address the problem of scheduling jobs whose utilities depend solely on their completion times in a shared cloud, which introduces considerable uncertainty into job runtimes. Estimating runtimes accurately is hard in such an environment, because jobs are often delayed by factors such as slow I/O performance and variations in memory availability. Unlike prior work, we acknowledge that runtime estimates are often erroneous and instead shift the burden of robustness to the job scheduler. Specifically, we formulate a scheduling problem that jointly accounts for (i) job utilities specified as functions of completion time and (ii) uncertainty in job runtimes. Our solution to this problem achieves lexicographic max-min fairness among the job utilities. We implement it as a robust scheduler, named RUSH, for YARN in Hadoop. Experiments using real-world data sets illustrate RUSH's efficacy compared with other commonly used schedulers.
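The abstract does not describe RUSH's actual algorithm, so the following Python is only an illustrative sketch of what "lexicographic max-min fairness among completion-time utilities under runtime uncertainty" means on a toy instance. Everything here is an assumption, not the paper's method: jobs run one at a time on a single machine, runtime uncertainty is handled by substituting a pessimistic empirical quantile of observed runtimes, utilities decay linearly to zero at a deadline, and the best order is found by brute force. The names `pessimistic_runtime`, `linear_decay`, `sorted_utilities`, and `lex_max_min_order` are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Utility = Callable[[float], float]


def pessimistic_runtime(samples: Sequence[float], q: float = 0.9) -> float:
    """Hypothetical robustness step: take the q-th empirical quantile of observed runtimes."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(q * len(s)))
    return s[idx]


def linear_decay(deadline: float) -> Utility:
    """Hypothetical utility: 1 at t=0, falling linearly to 0 at the deadline, 0 afterwards."""
    return lambda t: max(0.0, 1.0 - t / deadline)


def sorted_utilities(order: Sequence[str], runtimes: Dict[str, float],
                     utilities: Dict[str, Utility]) -> List[float]:
    """Run jobs back-to-back in `order`; return their completion-time utilities, ascending."""
    t = 0.0
    values = []
    for job in order:
        t += runtimes[job]  # completion time of this job
        values.append(utilities[job](t))
    return sorted(values)


def lex_max_min_order(runtimes: Dict[str, float],
                      utilities: Dict[str, Utility]) -> Tuple[str, ...]:
    # Python compares lists lexicographically, so maximizing the ascending-sorted
    # utility vector first maximizes the worst utility, then the second worst, etc.
    # Brute force over all n! orders: for illustration on tiny inputs only.
    return max(permutations(runtimes),
               key=lambda order: sorted_utilities(order, runtimes, utilities))


# Toy example with made-up runtime samples and deadlines.
samples = {"a": [2, 3, 4], "b": [1, 1, 2], "c": [3, 5, 6]}
deadlines = {"a": 10.0, "b": 20.0, "c": 12.0}
robust = {job: pessimistic_runtime(s) for job, s in samples.items()}
utilities = {job: linear_decay(d) for job, d in deadlines.items()}
best = lex_max_min_order(robust, utilities)
print(best)
```

With the pessimistic runtimes (a: 4, b: 2, c: 6), every order except a, c, b drives some job's utility to zero, so the lexicographic max-min criterion selects that order even though it does not maximize the best-off job's utility. A real scheduler over many jobs and cluster containers would need something far more scalable than enumeration.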
Keywords: Runtime, Robustness, Uncertainty, Containers, YARN, Optimization, Estimation
Z. Huang, B. Balasubramanian, M. Wang, T. Lan, M. Chiang and D. H. Tsang, "RUSH: A RobUst ScHeduler to Manage Uncertain Completion-Times in Shared Clouds," 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Nara, Japan, 2016, pp. 242-251.