Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06)
A Locality-Driven Atomic Group Checkpoint Protocol
Taipei, Taiwan
December 04-December 07
ISBN: 0-7695-2736-1
This paper explores how locality of dependencies in large-scale distributed systems can be used to develop efficient checkpoint strategies. Dependencies among processes evolve into message interactions, which often spread and affect recovery dependencies and logging requirements. On the other hand, message interactions are usually localized within small sub-regions formed in space and time. To both minimize message logging and localize the effect of recovery, we propose a strategy that forms group checkpoints around such regions while selectively logging inter-region messages. A simple and efficient Atomic Group Checkpoint (AGC) protocol is developed based on the locality information of a distributed computation, e.g., agent communication protocol sessions in multi-agent systems. Atomicity guarantees the consistency of group checkpoints and the uniformity of group logging, and hence minimizes logging overhead. The correctness of the AGC protocol is analyzed and proved through a generic Checkpoint Dependency Graph (CDG) model, which captures the recovery dependency relations among checkpoints.
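The idea of logging only inter-region messages can be illustrated with a minimal sketch. It is not the AGC protocol itself, whose details are not given in the abstract. The names `Message`, `GroupLogger`, `group_of`, and `rollback_set` are hypothetical, and the sketch assumes that group membership is known in advance and that replaying logged inter-group messages lets a failed group recover without rolling back other groups.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    receiver: str
    payload: str


@dataclass
class GroupLogger:
    # Hypothetical mapping from process id to the group (region) it belongs to.
    group_of: dict
    # Messages recorded so far; only inter-group messages end up here.
    log: list = field(default_factory=list)

    def is_inter_group(self, msg: Message) -> bool:
        """True when sender and receiver belong to different groups."""
        return self.group_of[msg.sender] != self.group_of[msg.receiver]

    def deliver(self, msg: Message) -> bool:
        """Log the message only if it crosses a group boundary.

        Returns True if the message was logged, False otherwise.
        """
        if self.is_inter_group(msg):
            self.log.append(msg)
            return True
        return False


def rollback_set(group_of: dict, failed: str) -> set:
    """Processes that roll back together with `failed`.

    Assumption for this sketch: recovery is confined to the failed
    process's own group, because inter-group messages can be replayed
    from the log.
    """
    failed_group = group_of[failed]
    return {p for p, g in group_of.items() if g == failed_group}


# Usage: two groups, A = {p1, p2} and B = {p3}.
group_of = {"p1": "A", "p2": "A", "p3": "B"}
logger = GroupLogger(group_of)
logger.deliver(Message("p1", "p2", "intra-group"))  # not logged
logger.deliver(Message("p2", "p3", "inter-group"))  # logged
```

In this sketch, traffic inside a group costs no logging at all, which is the source of the savings when most interactions are local to a region.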
Citation:
Zunce Wei, Hon F. Li, Dhrubajyoti Goswami, "A Locality-Driven Atomic Group Checkpoint Protocol," Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), pp. 558-564, 2006