2007 International Conference on Parallel Processing (ICPP 2007)
A Meta-Learning Failure Predictor for Blue Gene/L Systems
Xi'an, China
September 10-September 14
ISBN: 0-7695-2933-X
Prashasta Gujrati, Illinois Institute of Technology, USA
Yawei Li, Illinois Institute of Technology, USA
Zhiling Lan, Illinois Institute of Technology, USA
Rajeev Thakur, Argonne National Laboratory, USA
John White, San Diego Supercomputer Center, USA
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components used in these systems are highly reliable, the presence of a large number of components inevitably increases the failure probability of such systems. Successful prediction of potential failures can greatly enhance the various fault tolerance mechanisms used in large clusters, thereby mitigating the adverse impact of failures on system productivity and total cost of ownership. In this paper, we present a three-phase failure predictor that automatically processes RAS events and discovers failure patterns for prediction in Blue Gene/L systems. In particular, this paper explores the use of meta-learning to adaptively integrate base methods with the goal of boosting prediction accuracy. Experiments with two RAS logs collected from Blue Gene/L systems at ANL and SDSC demonstrate the effectiveness of the proposed failure predictor.
Citation:
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev Thakur, John White, "A Meta-Learning Failure Predictor for Blue Gene/L Systems," 2007 International Conference on Parallel Processing (ICPP 2007), p. 40, 2007