8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05)
Cloning-Based Checkpoint for Localized Recovery
Las Vegas, Nevada, USA
December 07-December 09
ISBN: 0-7695-2509-1
DOI Bookmark:
http://doi.ieeecomputersociety.org/10.1109/ISPAN.2005.26
This paper studies the use of process clones to localize recovery in large-scale distributed systems. A clone is a virtual recovery process with a limited lifetime, and it is useful for decoupling recovery dependencies among checkpoints. A generic Checkpoint Dependency Graph (CDG) model is used to capture the dependency relations among checkpoints. A Non-atomic Group Checkpoint (NGC) protocol is presented, and it is proved that the protocol can confine recovery to a single group when clones are employed. To limit the spread of recovery, the size of a group should be bounded. The paper presents two results in this respect: (i) there is no embedded protocol for atomic group formation with a bounded group size (a k-bounded protocol); (ii) a k-bounded atomic group checkpoint protocol requires at least m-1 explicit messages for checkpoint synchronization in a system consisting of m processes. Lastly, a simple k-bounded atomic group checkpoint protocol is presented and proved correct.
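The notion of recovery spread mentioned in the abstract can be illustrated with a small sketch. This is not the paper's CDG model or its NGC protocol; it is a generic reachability computation over a hypothetical dependency graph, assuming that a rollback of one checkpoint forces a rollback of every checkpoint that transitively depends on it. The names `recovery_set`, `deps`, and the process labels are all illustrative assumptions, and removing an edge is only a stand-in for whatever decoupling a clone provides.

```python
from collections import deque

def recovery_set(dependencies, failed):
    """Return all nodes that must roll back when `failed` rolls back.

    `dependencies` maps a node to the nodes that depend on it
    (hypothetical representation, not the paper's CDG definition).
    """
    affected = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dependent in dependencies.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# Hypothetical system: p1 -> p2 -> p3 form a dependency chain,
# p4 -> p5 is unrelated to it.
deps = {"p1": ["p2"], "p2": ["p3"], "p3": [], "p4": ["p5"], "p5": []}
spread = recovery_set(deps, "p1")

# Stand-in for decoupling: drop the p2 -> p3 dependency and recompute.
decoupled = dict(deps, p2=[])
localized = recovery_set(decoupled, "p1")

print(sorted(spread))     # processes rolled back with the dependency
print(sorted(localized))  # processes rolled back once it is removed
```

In this toy graph, cutting a single dependency shrinks the rollback set, which is the intuition behind keeping groups small and decoupling dependencies across group boundaries.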
Citation:
Zunce Wei, Hon F. Li, Dhrubajyoti Goswami, "Cloning-Based Checkpoint for Localized Recovery," 8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05), pp. 174-181, 2005