
Yibei Ling, Jie Mi, Xiaola Lin, "A Variational Calculus Approach to Optimal Checkpoint Placement," IEEE Transactions on Computers, vol. 50, no. 7, pp. 699-708, July 2001.
BibTeX:
@article{ 10.1109/12.936236,
  author = {Yibei Ling and Jie Mi and Xiaola Lin},
  title = {A Variational Calculus Approach to Optimal Checkpoint Placement},
  journal = {IEEE Transactions on Computers},
  volume = {50},
  number = {7},
  issn = {0018-9340},
  year = {2001},
  pages = {699--708},
  doi = {http://doi.ieeecomputersociety.org/10.1109/12.936236},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Keywords: Aperiodic checkpointing, periodic checkpointing, system failure rate.
Abstract: Checkpointing is an effective fault-tolerant technique for improving system availability and reliability. However, blind checkpoint placement can result in either performance degradation or expensive recovery cost. By means of the calculus of variations, we derive an explicit formula that links the optimal checkpointing frequency with a general failure rate, with the objective of globally minimizing the total expected cost of checkpointing and recovery. The theoretical result shows that the optimal checkpointing frequency is proportional to the square root of the failure rate and can be uniquely determined by the failure rate (time-varying or constant) if the recovery function is strictly increasing and the failure rate is
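
The square-root relationship stated in the abstract can be illustrated with a small numerical sketch. The abstract gives only the proportionality, so the constant factor used here is an assumption: the frequency is taken as n(t) = sqrt(lambda(t) / (2C)), where C is a hypothetical fixed cost per checkpoint. This choice reduces to Young's classical first-order interval sqrt(2C / lambda) when the failure rate is constant. It is not the paper's exact formula, and the function names are illustrative only.

```python
import math


def young_interval(checkpoint_cost, failure_rate):
    """Young's first-order approximation for a constant failure rate:
    interval T = sqrt(2 * C / lambda)."""
    return math.sqrt(2.0 * checkpoint_cost / failure_rate)


def checkpoint_frequency(failure_rate_at_t, checkpoint_cost):
    """Assumed frequency n(t) = sqrt(lambda(t) / (2 * C)).
    Only the square-root dependence comes from the abstract;
    the factor 1 / (2 * C) is an assumption made for this sketch."""
    return math.sqrt(failure_rate_at_t / (2.0 * checkpoint_cost))


def checkpoint_times(failure_rate, checkpoint_cost, horizon, step=0.01):
    """Place a checkpoint each time the running integral of n(t)
    crosses the next integer (midpoint-rule integration)."""
    times = []
    accumulated = 0.0
    t = 0.0
    while t < horizon:
        rate_mid = failure_rate(t + step / 2.0)
        accumulated += checkpoint_frequency(rate_mid, checkpoint_cost) * step
        t += step
        if accumulated >= len(times) + 1:
            times.append(t)
    return times


if __name__ == "__main__":
    cost = 1.0  # hypothetical cost of one checkpoint, in time units

    # Constant failure rate: checkpoints are evenly spaced (periodic).
    constant = checkpoint_times(lambda t: 0.02, cost, horizon=35.0)
    print("constant rate:", [round(x, 2) for x in constant])

    # Increasing failure rate: checkpoints become denser (aperiodic).
    increasing = checkpoint_times(lambda t: 0.02 * (1.0 + t / 10.0), cost, horizon=25.0)
    print("increasing rate:", [round(x, 2) for x in increasing])
```

Under these assumptions a constant failure rate yields periodic checkpoints, while a rising failure rate yields progressively shorter gaps between checkpoints, which is the periodic versus aperiodic distinction named in the keywords.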