2004 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'04)
Checkpointing in Hybrid Distributed Systems
Hong Kong, SAR, China
May 10-May 12
ISBN: 0-7695-2135-5
Jiannong Cao, Hong Kong Polytechnic University
Yifeng Chen, Hong Kong Polytechnic University; Wuhan University, China
Kang Zhang, University of Texas at Dallas
Yanxiang He, Wuhan University, China
Checkpointing and rollback recovery is a widely used technique for providing fault tolerance in computer systems subject to transient faults. Two primary checkpointing schemes have been proposed: independent and coordinated. However, most existing work considers applying only a single checkpointing and rollback recovery scheme to a target system. This paper discusses the issues involved in integrating independent and coordinated checkpointing schemes for applications running in a hybrid distributed environment containing multiple heterogeneous subsystems, and develops a new algorithm for that purpose. It presents the changes required to the original checkpointing scheme of each subsystem and the unnecessary rollbacks that the integrated system avoids. It also describes an algorithm for collecting garbage checkpoints in the combined hybrid system.
Citation:
Jiannong Cao, Yifeng Chen, Kang Zhang, Yanxiang He, "Checkpointing in Hybrid Distributed Systems," ispan, pp.136, 2004 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'04), 2004