Recovery Point Selection on a Reverse Binary Tree Task Model
August 1989 (vol. 15 no. 8)
pp. 963-976

The complexity of placing recovery points is analyzed for computations modeled as a reverse binary tree of tasks. The objective is to minimize the expected computation time of a program in the presence of faults; the method extends to an arbitrary reverse tree model. For uniprocessor systems, an optimal placement algorithm is proposed. For multiprocessor systems, a procedure for computing the performance of such systems is described. Because this performance has no closed-form solution, an alternative measure with a closed-form formula is proposed, and algorithms based on that formula are devised for the recovery point placement problem. The approximate formula can be extended to include communication delays, and the devised algorithms still apply.
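To make the optimization objective concrete, here is a minimal sketch of recovery point placement on a deliberately simplified model: a linear chain of tasks on one processor, not the reverse binary tree treated in the paper. The function name, the parameters, and the fault model are all assumptions for illustration: each task fails independently with a fixed probability, a fault is detected only at the end of a segment, a failed segment is rerun from its preceding recovery point, and saving a recovery point has a fixed cost. Under those assumptions a segment of length T with success probability q takes T / q on average, and dynamic programming finds the cheapest placement.

```python
from typing import List, Tuple


def place_recovery_points(
    times: List[float], fail_probs: List[float], save_cost: float
) -> Tuple[float, List[int]]:
    """Return (expected time, recovery point positions) for a task chain.

    Hypothetical simplified model, not the paper's algorithm. A position k
    means a recovery point is saved after the first k tasks.
    """
    n = len(times)
    best = [0.0] + [float("inf")] * n  # best[j]: min expected time for first j tasks
    prev = [0] * (n + 1)               # prev[j]: start of the last segment ending at j

    for j in range(1, n + 1):
        seg_time = 0.0
        seg_success = 1.0
        # Grow the last segment backwards: it covers tasks i .. j-1.
        for i in range(j - 1, -1, -1):
            seg_time += times[i]
            seg_success *= 1.0 - fail_probs[i]
            if seg_success <= 0.0:
                break  # a segment that can never succeed is not usable
            # Geometric number of attempts: expected segment time is T / q.
            cost = best[i] + seg_time / seg_success
            if j < n:
                cost += save_cost  # no recovery point is saved at program end
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Walk the segment boundaries back from the end of the chain.
    points: List[int] = []
    j = n
    while j > 0:
        i = prev[j]
        if i > 0:
            points.append(i)
        j = i
    points.reverse()
    return best[n], points
```

With two unit-time tasks that each fail half the time, a free recovery point between them halves the expected time, while an expensive one is not worth saving. The double loop makes the sketch quadratic in the number of tasks.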

Index Terms:
performance computation procedure; computation time minimization; recovery point selection; reverse binary tree task model; arbitrary reverse tree model; uniprocessor systems; optimal placement algorithm; multiprocessor systems; closed form solution; closed form formula; recovery point placement problem; communication delays; computational complexity; fault tolerant computing; multiprocessing systems; trees (mathematics)
S.-K. Chen, W.T. Tsai, M.B. Thuraisingham, "Recovery Point Selection on a Reverse Binary Tree Task Model," IEEE Transactions on Software Engineering, vol. 15, no. 8, pp. 963-976, Aug. 1989, doi:10.1109/32.31353