High-Performance Distributed Computing, International Symposium on (2006)
June 19, 2006 to June 23, 2006
H. Casanova, Dept. of Inf. & Comput. Sci., Hawaii Univ., Manoa, HI
Most parallel computing resources are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick, among multiple batch-scheduled resources, the one that produces the shortest turnaround time. As a result, an increasing number of users resort to "redundant requests": several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler, we investigate whether redundant requests are harmful in terms of (i) schedule performance and fairness, (ii) system load, and (iii) system predictability. We find that two main issues with redundant requests are load on the middleware and unfairness towards users who do not use redundant requests, both of which depend on the number of users who use redundant requests and on the amount of request redundancy these users employ.
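The redundant-request mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's simulator. The function name, the cluster names, and the wait times are all hypothetical, and the sketch assumes the queue wait at each resource is known, which in practice it is not (that unpredictability is what motivates redundant requests in the first place).

```python
def resolve_redundant_requests(queue_waits):
    """Pick the request that is granted first and list the ones to cancel.

    queue_waits maps a (hypothetical) cluster name to the queue wait, in
    seconds, that this job's request would experience on that cluster.
    """
    if not queue_waits:
        raise ValueError("at least one request is required")
    # The request with the shortest queue wait is granted nodes first.
    winner = min(queue_waits, key=queue_waits.get)
    # Every other copy of the request is then canceled.
    canceled = sorted(name for name in queue_waits if name != winner)
    return winner, queue_waits[winner], canceled


# Illustrative values only; not data from the paper.
waits = {"cluster_a": 5400.0, "cluster_b": 1200.0, "cluster_c": 3600.0}
winner, wait, canceled = resolve_redundant_requests(waits)
print(winner, wait, canceled)
```

Each canceled request still had to be submitted, queued, and removed, which suggests why the load that redundant requests place on the system grows with the number of requests submitted per job.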
middleware, redundant batch request, parallel computing, queue waiting time, batch-scheduled resource, shortest turnaround time, schedule performance, system load, system predictability
H. Casanova, "On the Harmfulness of Redundant Batch Requests," High-Performance Distributed Computing, International Symposium on (HPDC), Paris, 2006, pp. 255-266.