2008 Seventh IEEE International Symposium on Network Computing and Applications
Identifying Failures in Grids through Monitoring and Ranking
July 10-July 12
ISBN: 978-0-7695-3192-2
In this paper we present FailRank, a novel framework for integrating and ranking information sources that characterize failures in a Grid system. Once the failing sites have been ranked, they can be eliminated from the job scheduling resource pool, yielding a more predictable, dependable and adaptive infrastructure. We also present the tools we developed to evaluate the FailRank framework. In particular, we present the FailBase Repository, a 38GB corpus of state information that characterizes the EGEE Grid for one month in 2007. Such a corpus paves the way for the community to systematically uncover previously unknown patterns and rules among the multitude of parameters that can contribute to failures in a Grid environment. Additionally, we present an experimental evaluation of the FailRank system over 30 days, which shows that our framework identifies failures in 93% of the cases. We believe that our work constitutes another important step towards realizing adaptive Grid computing systems.
Index Terms:
Grid Computing, Dependability, Top-k Ranking
Citation:
Demetrios Zeinalipour-Yazti, Kyriacos Neocleous, Chryssis Georgiou, Marios D. Dikaiakos, "Identifying Failures in Grids through Monitoring and Ranking," 2008 Seventh IEEE International Symposium on Network Computing and Applications (NCA), pp. 291-298, 2008