Necessary and Sufficient Conditions for Consistent Global Snapshots
February 1995 (vol. 6 no. 2)
pp. 165-169

Abstract—Consistent global snapshots are important in many distributed applications. We prove the exact conditions for an arbitrary checkpoint, or a set of checkpoints, to belong to a consistent global snapshot, a previously open problem. To describe the conditions, we introduce a generalization of Lamport's happened-before relation called a zigzag path.

Index Terms—Causality, global checkpoints, distributed systems, consistent global states, Lamport's happened-before relation.

Citation:
Robert H. B. Netzer, Jian Xu, "Necessary and Sufficient Conditions for Consistent Global Snapshots," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, pp. 165-169, Feb. 1995, doi:10.1109/71.342127