Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle
April-June 2006 (vol. 3 no. 2)
pp. 130-140
Tatsuya Ozaki
Tadashi Dohi, IEEE Computer Society
Hiroyuki Okamura, IEEE Computer Society
Naoto Kaio, IEEE
In this paper, we consider two kinds of sequential checkpoint placement problems, one with an infinite and one with a finite time horizon. For these problems, we apply approximation methods based on the variational principle and develop computation algorithms that derive the optimal checkpoint sequence approximately. Next, we focus on the situation where knowledge of system failure is incomplete, i.e., the system failure time distribution is unknown. We develop so-called min-max checkpoint placement methods that determine the optimal checkpoint sequence under this uncertainty about the system failure time distribution. In numerical examples, we quantitatively investigate the proposed distribution-free checkpoint placement methods and discuss their potential applicability in practice.
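As a rough illustration of what a variational (checkpoint-density) approximation looks like when the failure time distribution is known, the sketch below assumes the density n(t) = sqrt(h(t)/(2c)), where h(t) is the failure rate and c is the overhead of one checkpoint, and places the k-th checkpoint where the cumulative density reaches k. The function names `weibull_hazard` and `checkpoint_sequence`, the Weibull example, and this particular density are assumptions chosen for illustration. This is not the authors' algorithm, and it does not cover the min-max step, which targets the case where h(t) is unknown.

```python
import math
from typing import Callable, List


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def checkpoint_sequence(hazard: Callable[[float], float], c: float,
                        horizon: float, n_steps: int = 200_000) -> List[float]:
    """Approximate checkpoint times in (0, horizon].

    Assumed (hypothetical) density: n(t) = sqrt(h(t) / (2c)).
    The k-th checkpoint is placed where N(t) = integral_0^t n(s) ds reaches k.
    """
    dt = horizon / n_steps
    times: List[float] = []
    cum = 0.0  # value of N(t) at the left end of the current step
    k = 1      # index of the next checkpoint to place
    for i in range(n_steps):
        # Midpoint rule; it never evaluates h(0), which diverges for shape < 1.
        mid = (i + 0.5) * dt
        inc = math.sqrt(hazard(mid) / (2.0 * c)) * dt
        # One step may cross several integer levels when the density is large.
        while inc > 0.0 and cum + inc >= k:
            # Linear interpolation of N(t) inside the step.
            times.append(i * dt + (k - cum) / inc * dt)
            k += 1
        cum += inc
    return times
```

With a constant failure rate λ (Weibull shape 1), the density is constant and the sketch returns equally spaced checkpoints with interval sqrt(2c/λ), which is Young's first-order approximation. For example, `checkpoint_sequence(lambda t: weibull_hazard(t, 1.0, 100.0), c=0.5, horizon=95.0)` places checkpoints near 10, 20, ..., 90. With an increasing failure rate (shape greater than 1), the intervals shrink as the system ages.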

Index Terms:
Checkpoint/restart, fault-tolerance, high availability, modeling and prediction, performance evaluation, maintenance, incomplete failure information.
Tatsuya Ozaki, Tadashi Dohi, Hiroyuki Okamura, Naoto Kaio, "Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 2, pp. 130-140, April-June 2006, doi:10.1109/TDSC.2006.22