2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS) (2016)
Nara, Japan
June 27, 2016 to June 30, 2016
ISSN: 1063-6927
ISBN: 978-1-5090-1484-2
pp: 242-251
ABSTRACT
We address the problem of scheduling jobs whose utilities depend solely on their completion times in a shared cloud that imposes considerable uncertainty on the jobs' runtime. Estimating the jobs' runtime in such a cloud is very hard, because jobs are often delayed for reasons such as slow I/O performance and variations in memory availability. Unlike prior work, we acknowledge that runtime estimates are often erroneous and instead shift the burden of robustness to the job scheduler. Specifically, we present a scheduling problem that jointly accounts for: (i) job utilities specified as functions of their completion time, and (ii) uncertainty in the jobs' runtime. Our proposed solution to this problem achieves lexicographic max-min fairness among the job utilities. We implement this solution as a robust scheduler, named RUSH, for YARN in Hadoop. Our experiments, using real-world data sets, illustrate RUSH's efficacy when compared with other commonly used schedulers.
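The lexicographic max-min fairness criterion named in the abstract can be illustrated with a small sketch. This is not RUSH's algorithm: it assumes a single machine, exactly known runtimes (so it ignores the runtime uncertainty the paper targets), and a brute-force search over job orders. The function names, the job data, and the slack-style utility functions are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

def completion_times(order, runtimes):
    """Completion time of each job when jobs run back to back in `order`."""
    elapsed, done = 0.0, {}
    for job in order:
        elapsed += runtimes[job]
        done[job] = elapsed
    return done

def leximin_key(utilities):
    """Sort ascending so list comparison looks at the worst-off job first."""
    return sorted(utilities)

def best_order(runtimes, utility_fns):
    """Brute-force the job order whose utility vector is lexicographically max-min."""
    def key(order):
        done = completion_times(order, runtimes)
        return leximin_key(utility_fns[job](done[job]) for job in order)
    return max(permutations(runtimes), key=key)

# Hypothetical jobs: runtimes in arbitrary time units (assumed known exactly).
runtimes = {"a": 2.0, "b": 1.0, "c": 3.0}

# Hypothetical utilities: slack against an assumed deadline, as a function
# of completion time only.
utility_fns = {
    "a": lambda t: 3.0 - t,
    "b": lambda t: 6.0 - t,
    "c": lambda t: 6.0 - t,
}

chosen = best_order(runtimes, utility_fns)
print(chosen)
```

In this toy instance three different orders share the same worst-case utility of 0, so plain max-min cannot separate them. The lexicographic rule goes on to compare the second-worst utility, then the third, which is what singles out one order.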
INDEX TERMS
Runtime, Robustness, Uncertainty, Containers, Yarn, Optimization, Estimation
CITATION

Z. Huang, B. Balasubramanian, M. Wang, T. Lan, M. Chiang and D. H. Tsang, "RUSH: A RobUst ScHeduler to Manage Uncertain Completion-Times in Shared Clouds," 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Nara, Japan, 2016, pp. 242-251.
doi:10.1109/ICDCS.2016.95