18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Papers
Fault-Aware Job Scheduling for BlueGene/L Systems
Santa Fe, New Mexico
April 26-April 30
ISBN: 0-7695-2132-0
A. J. Oliner, Massachusetts Institute of Technology
R. K. Sahoo, IBM T.J. Watson Research Center
J. E. Moreira, IBM T.J. Watson Research Center
M. Gupta, IBM T.J. Watson Research Center
A. Sivasubramaniam, Pennsylvania State University
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. In this paper, we evaluate the effectiveness of a previously developed job scheduling algorithm for BlueGene/L in the presence of faults. We have developed two new job-scheduling algorithms that take failures into account when scheduling jobs. We have also evaluated the impact of these algorithms on average bounded slowdown, average response time, and system utilization, considering different levels of proactive failure prediction and prevention techniques reported in the literature. Our simulation studies show that using these new algorithms with even trivial fault prediction confidence or accuracy levels (as low as 10%) can significantly improve the performance of the BlueGene/L system.
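The three metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the definitions common in the job-scheduling literature, which the abstract itself does not spell out, so treat them as assumptions: response time is finish minus arrival, bounded slowdown is response time divided by the runtime (floored at a threshold `gamma`, here assumed to be 10 seconds) and never less than 1, and utilization is the fraction of node-seconds spent running jobs. The `Job` record, its field names, and the sample values are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; all times are in seconds."""
    arrival: float  # time the job was submitted
    start: float    # time the job began running
    finish: float   # time the job completed
    nodes: int      # number of nodes the job occupied


def response_time(job: Job) -> float:
    # Wait time plus run time.
    return job.finish - job.arrival


def bounded_slowdown(job: Job, gamma: float = 10.0) -> float:
    # gamma keeps very short jobs from dominating the average (assumed value).
    run = job.finish - job.start
    return max(response_time(job) / max(run, gamma), 1.0)


def utilization(jobs: list[Job], total_nodes: int) -> float:
    # Node-seconds of useful work over node-seconds available in the trace span.
    span = max(j.finish for j in jobs) - min(j.arrival for j in jobs)
    busy = sum(j.nodes * (j.finish - j.start) for j in jobs)
    return busy / (total_nodes * span)


# Two illustrative jobs on an assumed 128-node machine.
jobs = [Job(arrival=0, start=0, finish=100, nodes=64),
        Job(arrival=0, start=100, finish=105, nodes=64)]

avg_response = sum(response_time(j) for j in jobs) / len(jobs)
avg_bsld = sum(bounded_slowdown(j) for j in jobs) / len(jobs)
util = utilization(jobs, total_nodes=128)
```

In this sketch the second job runs for only 5 seconds after waiting 100, so its slowdown is computed against the 10-second floor rather than its actual runtime. A job killed by a node failure and restarted would show up as a later `finish`, which is how failures would feed into all three metrics.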
Citation:
A. J. Oliner, R. K. Sahoo, J. E. Moreira, M. Gupta, A. Sivasubramaniam, "Fault-Aware Job Scheduling for BlueGene/L Systems," ipdps, vol. 1, pp.64a, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Papers, 2004