Issue No. 09 - September (1999 vol. 10)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.798312
<p><b>Abstract</b>: A classical way to determine consistent snapshots is to use the Chandy-Lamport algorithm. This algorithm relies on specific control messages that allow processes to synchronize local checkpoint determination and message recording so that the resulting snapshot is consistent. This paper investigates a communication-induced approach to determining consistent snapshots. In such an approach, control information is carried by application messages. Two abstract necessary and sufficient conditions are stated: one associated with global checkpoint consistency, the other associated with message recording. A general protocol is derived from these abstract conditions. This general protocol can be instantiated in distinct ways, giving rise to a family of communication-induced snapshot protocols. It also shows that there is an intrinsic trade-off between the number of forced checkpoints and the number of recorded messages. Finally, a particular instantiation of the general protocol is provided.</p>
Keywords: Asynchronous distributed computation, checkpointing, communication-induced protocol, consistency, global checkpoint, message recording, snapshot.
Jean-Michel Hélary, Achour Mostefaoui, Michel Raynal, "Communication-Induced Determination of Consistent Snapshots", IEEE Transactions on Parallel & Distributed Systems, vol. 10, no. 9, pp. 865-877, September 1999, doi:10.1109/71.798312