Necessary and Sufficient Conditions for Consistent Global Snapshots
February 1995 (vol. 6 no. 2)
pp. 165-169

Abstract—Consistent global snapshots are important in many distributed applications. We prove the exact conditions for an arbitrary checkpoint, or a set of checkpoints, to belong to a consistent global snapshot, a previously open problem. To describe the conditions, we introduce a generalization of Lamport's happened-before relation called a zigzag path.

Index Terms—Causality, global checkpoints, distributed systems, consistent global states, Lamport's happened-before relation.

Robert H. B. Netzer, Jian Xu, "Necessary and Sufficient Conditions for Consistent Global Snapshots," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, pp. 165-169, Feb. 1995, doi:10.1109/71.342127