Yibei Ling, Jie Mi, Xiaola Lin, "A Variational Calculus Approach to Optimal Checkpoint Placement," IEEE Transactions on Computers, vol. 50, no. 7, pp. 699-708, July 2001. ISSN 0018-9340. DOI: http://doi.ieeecomputersociety.org/10.1109/12.936236. IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: aperiodic checkpointing, periodic checkpointing, system failure rate.
Abstract: Checkpointing is an effective fault-tolerance technique for improving system availability and reliability. However, blind checkpoint placement can result in either performance degradation or high recovery cost. By means of the calculus of variations, we derive an explicit formula that links the optimal checkpointing frequency with a general failure rate, with the objective of globally minimizing the total expected cost of checkpointing and recovery. The theoretical result shows that the optimal checkpointing frequency is proportional to the square root of the failure rate and can be uniquely determined by the failure rate (time-varying or constant) if the recovery function is strictly increasing and the failure rate is
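The square-root relationship stated in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's formula, which is not given in this excerpt. It uses Young's classical first-order approximation (1974) for a constant failure rate, where the interval between checkpoints is sqrt(2C/λ). The function names, the symbol C for the cost of one checkpoint, and the simplified overhead model C/T + λT/2 are assumptions chosen for illustration.

```python
import math


def young_interval(checkpoint_cost: float, failure_rate: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C / lambda).

    Assumes a constant failure rate and a checkpoint cost that is small
    relative to the mean time between failures.
    """
    if checkpoint_cost <= 0 or failure_rate <= 0:
        raise ValueError("checkpoint_cost and failure_rate must be positive")
    return math.sqrt(2.0 * checkpoint_cost / failure_rate)


def checkpoint_frequency(checkpoint_cost: float, failure_rate: float) -> float:
    """Checkpoints per unit time: the reciprocal of the optimal interval."""
    return 1.0 / young_interval(checkpoint_cost, failure_rate)


def overhead_rate(interval: float, checkpoint_cost: float, failure_rate: float) -> float:
    """Illustrative overhead per unit time (an assumed first-order model).

    C / T is the checkpointing cost per unit time; lambda * T / 2 is the
    expected rework per unit time, since a failure loses T / 2 on average.
    """
    return checkpoint_cost / interval + failure_rate * interval / 2.0


if __name__ == "__main__":
    cost = 2.0  # hypothetical cost of one checkpoint, in time units
    for rate in (0.01, 0.04):  # hypothetical failure rates
        t_opt = young_interval(cost, rate)
        print(f"rate={rate}: interval={t_opt:.2f}, "
              f"frequency={checkpoint_frequency(cost, rate):.3f}, "
              f"overhead={overhead_rate(t_opt, cost, rate):.3f}")
```

Under this model, quadrupling the failure rate doubles the checkpointing frequency, which is the square-root proportionality the abstract describes. The time-varying case treated in the paper needs the full variational derivation and is not covered by this sketch.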

