Computing Optimal Checkpointing Strategies for Rollback and Recovery Systems
April 1988 (vol. 37 no. 4)
pp. 491-496
A numerical approach for computing optimal dynamic checkpointing strategies for general rollback and recovery systems is presented. The system is modeled as a Markov renewal decision process. General failure distributions, random checkpointing durations, and reprocessing-dependent recovery times are allowed. The aim is to find a dynamic decision rule to maximize the average system availability.
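
To make the availability criterion concrete, the following is a minimal sketch of a much simpler, static special case, not the dynamic Markov renewal method summarized above. It assumes exponentially distributed failures, a fixed checkpoint cost, a fixed recovery time, and no failures during recovery, and it finds the best fixed checkpoint interval by grid search. All function names, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
import math


def expected_cycle_time(tau, lam, ckpt, rec):
    """Expected wall-clock time to finish `tau` units of work plus one checkpoint.

    Illustrative assumptions (not taken from the paper): failures are
    exponential with rate `lam`, the checkpoint cost `ckpt` and recovery
    time `rec` are constants, and no failure occurs during recovery.
    After a failure the whole segment (work + checkpoint) is redone.
    """
    return (1.0 / lam + rec) * math.expm1(lam * (tau + ckpt))


def availability(tau, lam, ckpt, rec):
    """Long-run fraction of time spent on useful (eventually saved) work."""
    return tau / expected_cycle_time(tau, lam, ckpt, rec)


def best_interval(lam, ckpt, rec, step=0.01, max_tau=100.0):
    """Grid search for the fixed interval that maximizes availability."""
    n = int(round(max_tau / step))
    candidates = [k * step for k in range(1, n + 1)]
    return max(candidates, key=lambda t: availability(t, lam, ckpt, rec))


if __name__ == "__main__":
    lam, ckpt, rec = 0.01, 0.5, 1.0  # hypothetical rates and costs
    tau = best_interval(lam, ckpt, rec)
    print(f"interval={tau:.2f} availability={availability(tau, lam, ckpt, rec):.4f}")
```

With memoryless failures a fixed interval is a natural candidate. A dynamic rule of the kind the abstract describes becomes relevant once the failure distribution is general, because the decision to checkpoint can then depend on the time elapsed and the work accumulated since the last checkpoint.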

[1] F. Bacelli, "Analysis of a service facility with periodic checkpointing," Acta Informatica, vol. 15, pp. 67-81, Jan. 1981.
[2] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[3] X. Castillo, S. R. McConnel, and D. P. Siewiorek, "Derivation and calibration of a transient error reliability model," IEEE Trans. Comput., vol. C-31, pp. 658-671, July 1982.
[4] K. M. Chandy, "A survey of analytic models of rollback and recovery strategies," Computer, vol. 8, pp. 40-47, May 1975.
[5] L. H. Crow and N. D. Singpurwalla, "An empirically developed Fourier series model for describing software failures," IEEE Trans. Reliability, vol. R-33, pp. 176-183, June 1984.
[6] C. J. Date, An Introduction to Database Systems, vol. II. Reading, MA: Addison-Wesley, 1983.
[7] C. DeBoor, A Practical Guide to Splines. Berlin, Germany: Springer-Verlag, 1978.
[8] E. Gelenbe, "On the optimum checkpoint interval," J. ACM, vol. 26, no. 2, pp. 259-270, 1979.
[9] E. Gelenbe and D. Derochette, "Performance of rollback recovery systems under intermittent failures," Commun. ACM, vol. 21, no. 6, pp. 493-499, 1978.
[10] A. Haurie and P. L'Ecuyer, "Approximation and bounds in discrete event dynamic programming," IEEE Trans. Automat. Contr., vol. AC-31, pp. 227-235, Mar. 1986.
[11] R. Koo and S. Toueg, "Checkpointing and rollback-recovery for distributed systems," IEEE Trans. Software Eng., vol. SE-13, pp. 23-31, Jan. 1987.
[12] J. Koren, Z. Koren, and S. Su, "Analysis of a class of recovery procedures," IEEE Trans. Comput., vol. C-35, pp. 703-712, 1986.
[13] C. M. Krishna, K. G. Shin, and Y.-H. Lee, "Optimization criteria for checkpoint placements," Commun. ACM, vol. 27, no. 10, pp. 1008-1012, Oct. 1984.
[14] J. M. Magazine, "Optimality of intuitive checkpointing policies," Inform. Processing Lett., vol. 17, pp. 63-66, Aug. 1983.
[15] J. Malenfant, "Modélisation du rétablissement lors des pannes dans les bases de données par des processus de renouvellement markoviens commandés," Rep. DIUL-RR-8701, Dép. d'informatique, Univ. Laval, Jan. 1987.
[16] V. F. Nicola and F. J. Kylstra, "A model of checkpointing and recovery with a specified number of transactions between checkpoints," in Performance '83, A. K. Agrawala and S. K. Tripathi, Eds. Amsterdam, The Netherlands: North-Holland, 1983, pp. 83-100.
[17] E. L. Porteus and J. C. Totten, "Accelerated computation of the expected discounted return in a Markov chain," Oper. Res., vol. 26, pp. 350-358, 1978.
[18] P. J. Schweitzer, "Iterative solution of the functional equations of undiscounted Markov renewal programming," J. Math. Anal. Appl., vol. 34, pp. 495-501, 1971.
[19] K. G. Shin, T. Lin, and Y.-H. Lee, "Optimal checkpointing of real-time tasks," IEEE Trans. Comput., vol. C-36, no. 11, pp. 1328-1341, Nov. 1987.
[20] A. N. Tantawi and M. Ruschitzka, "Performance analysis of checkpointing strategies," ACM Trans. Comput. Syst., vol. 2, no. 2, pp. 123-144, 1984.
[21] S. Toueg and O. Babaoglu, "On the optimum checkpoint selection problem," SIAM J. Comput., vol. 13, no. 3, pp. 630-649, 1984.
[22] K. S. Trivedi, Probability and Statistics with Reliability, Queueing and Computer Science Applications. Englewood Cliffs, NJ: Prentice-Hall, 1982.

Index Terms:
general failure distributions; optimal checkpointing strategies; rollback and recovery systems; numerical approach; Markov renewal decision process; dynamic decision rule; value-iteration stochastic dynamic programming; finite-element approximation; decision theory; dynamic programming; Markov processes; performance evaluation.
P. L'Ecuyer, J. Malenfant, "Computing Optimal Checkpointing Strategies for Rollback and Recovery Systems," IEEE Transactions on Computers, vol. 37, no. 4, pp. 491-496, April 1988, doi:10.1109/12.2197