18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 11
A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations
Santa Fe, New Mexico
April 26-April 30
ISBN: 0-7695-2132-0
Christine Morin, IRISA/INRIA
Ramamurthy Badrinath, Hewlett-Packard ISO
Code coupling applications can be divided into communicating modules that may be executed on different clusters in a cluster federation. Because a cluster federation comprises a large number of nodes, there is a high probability of a node failure. We propose a hierarchical checkpointing protocol that combines a synchronized checkpointing technique inside clusters with a communication-induced technique between clusters. This protocol suits the characteristics of a cluster federation (a large number of nodes, and high-latency, low-bandwidth networking technologies between clusters). A preliminary performance evaluation carried out with a discrete event simulator shows that the protocol is suitable for code coupling applications.
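The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes a generic index-based communication-induced rule, in which each inter-cluster message piggybacks the sender's checkpoint sequence number and a receiver that is behind takes a forced checkpoint before delivery. The `Cluster` class, its methods, and the `sn` field are hypothetical names introduced only for illustration, and the synchronized intra-cluster checkpoint is reduced to a single bookkeeping step.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of one cluster in a federation (illustration only)."""
    name: str
    sn: int = 0  # sequence number of the latest cluster-wide checkpoint
    log: list = field(default_factory=list)

    def coordinated_checkpoint(self, sn=None, forced=False):
        # Stand-in for a synchronized checkpoint inside the cluster:
        # all nodes are assumed to save their state under one sequence number.
        self.sn = self.sn + 1 if sn is None else sn
        self.log.append(("forced" if forced else "local", self.sn))

    def send(self, payload):
        # Inter-cluster messages piggyback the sender's sequence number.
        return {"sn": self.sn, "payload": payload}

    def receive(self, message):
        # Assumed communication-induced rule: if the sender is ahead,
        # take a forced checkpoint before delivering the message.
        if message["sn"] > self.sn:
            self.coordinated_checkpoint(sn=message["sn"], forced=True)
        return message["payload"]


a, b = Cluster("A"), Cluster("B")
a.coordinated_checkpoint()         # cluster A checkpoints on its own schedule
delivered = b.receive(a.send("x")) # B is behind, so it checkpoints first
a.receive(b.send("y"))             # A is not behind, so no forced checkpoint
```

In this sketch, clusters never exchange synchronization messages solely for checkpointing; coordination between clusters rides on application messages, which is the property that makes a communication-induced scheme attractive over slow inter-cluster links.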
Index Terms:
Cluster Federation, Checkpointing and Recovery, Fault-tolerance, Parallel Application, Code Coupling
Citation:
Sébastien Monnet, Christine Morin, Ramamurthy Badrinath, "A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations," ipdps, vol. 12, pp.211a, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 11, 2004