Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme
August 1997 (vol. 46 no. 8)
pp. 942-947

Abstract—Checkpointing reduces the loss of computation in the presence of failures. Two metrics characterize a checkpointing scheme: checkpoint overhead and checkpoint latency. This paper shows that a large increase in latency is acceptable if it is accompanied by a relatively small reduction in overhead. Also, for equidistant checkpoints, the optimal checkpoint interval is shown to be typically independent of checkpoint latency.
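Both claims in the abstract can be checked numerically under a simple, assumed failure model. The sketch below is illustrative and is not necessarily the paper's exact derivation. It assumes Poisson failures at rate `lam`, treats each checkpoint interval as `T` units of useful work plus `C` units of checkpoint overhead, and assumes that a failure costs a recovery time `R` plus `L - C` units of re-executed work. All function names and parameter values (`overhead_ratio`, `optimal_interval`, `break_even_latency`) are hypothetical. In this model, latency enters only through a multiplicative factor that does not depend on `T`, so the minimizing interval does not move when `L` changes. The overhead ratio is also far more sensitive to `C` than to `L`.

```python
import math


def overhead_ratio(T: float, C: float, L: float, R: float, lam: float) -> float:
    """Overhead ratio r = Gamma / T - 1 under an assumed model (illustrative).

    Assumptions: failures are Poisson with rate lam; each interval is T units
    of useful work plus C units of checkpoint overhead; after a failure the
    system spends R on recovery and re-executes L - C units of work before
    retrying the interval, and failures can also strike during that redo.
    """
    # Expected time to finish one interval: the first attempt is T + C long,
    # every retry is (R + L - C) + (T + C) long, and any failure restarts it.
    gamma = math.exp(lam * (R + L - C)) * math.expm1(lam * (T + C)) / lam
    return gamma / T - 1.0


def optimal_interval(C: float, L: float, R: float, lam: float, iters: int = 100) -> float:
    """Golden-section search for the T that minimizes overhead_ratio.

    For C > 0, r(T) has a single minimum in T, so golden-section search applies.
    The bracket assumes lam * C is small, so the optimum lies far below 10 / lam.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = 1e-9, 10.0 / lam
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(iters):
        # Keep the sub-interval that contains the smaller function value.
        if overhead_ratio(c, C, L, R, lam) < overhead_ratio(d, C, L, R, lam):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def break_even_latency(C_a: float, L_a: float, C_b: float, R: float, lam: float) -> float:
    """Latency at which a scheme with overhead C_b matches the minimum overhead
    ratio of a scheme with overhead C_a and latency L_a (hypothetical helper)."""
    T_a = optimal_interval(C_a, L_a, R, lam)
    T_b = optimal_interval(C_b, L_a, R, lam)  # the optimum does not depend on L
    target = overhead_ratio(T_a, C_a, L_a, R, lam)
    # In this model, r + 1 = exp(lam * L) * (r + 1 evaluated at L = 0).
    base = overhead_ratio(T_b, C_b, 0.0, R, lam) + 1.0
    return math.log((target + 1.0) / base) / lam


if __name__ == "__main__":
    lam, R = 1e-4, 10.0
    # Same optimal interval (up to floating-point noise) for very different latencies.
    print(optimal_interval(10.0, 10.0, R, lam), optimal_interval(10.0, 1000.0, R, lam))
    # Latency a scheme with overhead 8 can tolerate while matching C = 10, L = 10.
    print(break_even_latency(10.0, 10.0, 8.0, R, lam))
```

With these made-up parameters, cutting the overhead from 10 to 8 time units lets the latency grow severalfold before the minimum overhead ratio gets worse. This is the kind of trade-off the abstract describes. Golden-section search was chosen only because it needs no derivative and no external libraries.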


Index Terms:
Checkpointing and rollback, checkpoint latency, checkpoint overhead, performance analysis.
Citation:
Nitin H. Vaidya, "Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme," IEEE Transactions on Computers, vol. 46, no. 8, pp. 942-947, Aug. 1997, doi:10.1109/12.609281