8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05)
Cloning-Based Checkpoint for Localized Recovery
Las Vegas, Nevada, USA
December 07-December 09
ISBN: 0-7695-2509-1
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ISPAN.2005.26
This paper studies the use of process clones to localize recovery in large-scale distributed systems. A clone is a virtual recovery process with a limited lifetime, and is useful for decoupling recovery dependencies among checkpoints. A generic Checkpoint Dependency Graph (CDG) model is used to capture the dependency relations among checkpoints. A Non-atomic Group Checkpoint (NGC) protocol is presented. It is proved that the protocol can result in localized recovery involving a single group when clones are employed. To limit recovery spread, the size of a group should be limited. This paper presents a few interesting results in this respect: (i) there is no embedded protocol for atomic group formation with a bounded group size (k-bounded protocol); (ii) a k-bounded atomic group checkpoint protocol requires at least m-1 explicit messages for checkpoint synchronization in a system consisting of m processes. Lastly, a simple k-bounded atomic group checkpoint protocol is presented and proved.
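To make the idea of recovery spread concrete, the following is a minimal sketch of how rollback can propagate through a dependency graph among checkpoints. It is not the paper's CDG model or its NGC protocol: the edge encoding, the function name `rollback_set`, and the toy checkpoint names are all assumptions chosen for illustration. It only shows why removing a cross-group dependency, which is the effect the abstract attributes to clones, keeps recovery inside one group.

```python
from collections import deque


def rollback_set(dependencies, failed):
    """Return every checkpoint that must roll back when `failed` does.

    `dependencies` maps a checkpoint to the checkpoints that depend on it.
    This encoding is hypothetical; the paper's CDG may define edges differently.
    """
    affected = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependencies.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Toy graph: groups {a1, a2} and {b1, b2}, linked by one cross-group edge.
cdg = {"a1": ["a2"], "a2": ["b1"], "b1": ["b2"]}

# Same graph with the cross-group edge removed (a decoupled dependency).
decoupled = {"a1": ["a2"], "b1": ["b2"]}

spread = rollback_set(cdg, "a1")
localized = rollback_set(decoupled, "a1")
```

In the toy graph, a failure at `a1` reaches all four checkpoints, while in the decoupled graph it stays within the first group.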
Citation:
Zunce Wei, Hon F. Li, Dhrubajyoti Goswami, "Cloning-Based Checkpoint for Localized Recovery," ispan, pp.174-181, 8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05), 2005