Yi-Min Wang, "Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints," IEEE Transactions on Computers, vol. 46, no. 4, pp. 456-468, April 1997.
DOI: 10.1109/12.588059
Keywords: algorithms, distributed systems, consistent global states, distributed debugging, deadlock recovery, fault tolerance, checkpointing, rollback recovery, message logging, vector timestamps.
Abstract—In this paper, we consider the problem of constructing consistent global checkpoints that contain a given set of checkpoints. We address three important issues related to this problem. First, we define the maximum and minimum consistent global checkpoints containing a set

