The Completion Time of Programs on Processors Subject to Failure and Repair
October 1993 (vol. 42 no. 10)
pp. 1184-1194

The authors describe a technique for computing the distribution of the completion time of a program on a server subject to failure and repair. The model includes several realistic aspects of the system. Server behavior is modeled by a semi-Markov process in order to accommodate nonexponential repair-time distributions. More importantly, the model captures the effect on job completion time of the work lost when a server failure occurs. The authors derive a closed-form expression for the Laplace-Stieltjes transform (LST) of the time-to-completion distribution of programs on such systems, and then describe an effective numerical procedure for computing that distribution. They show how these results apply to the analysis of different computer system structures and organizations of fault-tolerant systems. Finally, they use numerical solution methods to find the distribution of time to completion on several systems.
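As a rough illustration of the kind of system the abstract describes, the following is a minimal Monte Carlo sketch, not the authors' LST-based method. It assumes a single server with exponentially distributed times to failure, a nonexponential (here uniform) repair time, and two hypothetical policies: work done before a failure is either kept (resume) or lost (restart from scratch). The names `completion_time`, `mean_completion_time`, and `uniform_repair`, and all parameter values, are assumptions made for this example.

```python
import random


def completion_time(work, failure_rate, repair_sampler, rng, resume=True):
    """Simulate one run of a job that needs `work` units of service."""
    clock = 0.0
    remaining = work
    while True:
        # Time until the next failure; the server never fails if the rate is zero.
        up = rng.expovariate(failure_rate) if failure_rate > 0 else float("inf")
        if up >= remaining:
            # The job finishes before the next failure.
            return clock + remaining
        # A failure interrupts the job; the server is then repaired.
        clock += up + repair_sampler(rng)
        if resume:
            # Work done before the failure is kept.
            remaining -= up
        # Otherwise the work is lost and `remaining` stays at the full amount.


def mean_completion_time(work, failure_rate, repair_sampler, resume,
                         runs=20000, seed=1):
    """Estimate the mean completion time by averaging independent runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += completion_time(work, failure_rate, repair_sampler, rng, resume)
    return total / runs


def uniform_repair(rng):
    """Assumed nonexponential repair time: uniform on [0.5, 1.5], mean 1."""
    return rng.uniform(0.5, 1.5)


if __name__ == "__main__":
    for resume in (True, False):
        estimate = mean_completion_time(2.0, 0.5, uniform_repair, resume)
        print("resume" if resume else "restart", round(estimate, 3))
```

Under these assumptions the estimates can be checked against standard mean-value results: with work x, failure rate γ, and mean repair time r, the mean is x(1 + γr) when work is kept and (1/γ + r)(e^{γx} - 1) when it is lost. The gap between the two shows why modeling lost work matters for completion time.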

[1] X. Castillo and D. P. Siewiorek, "A performance-reliability model for computing systems," in Proc. FTCS-10. Silver Spring, MD: IEEE Computer Society, 1980, pp. 187-192.
[2] P. F. Chimento, "System performance in a failure prone environment," Ph.D. dissertation, Duke Univ., Durham, NC, 1988.
[3] P. J. Davis and P. Rabinowitz, Methods of Numerical Integration (Computer Science and Applied Mathematics). New York: Academic, 1975.
[4] A. Duda, "The effects of checkpointing on program execution time," Inform. Processing Lett., vol. 16, pp. 221-229, 1983.
[5] H. Garcia-Molina and J. Kent, "Evaluating response time in a faulty distributed computer," IEEE Trans. Comput., vol. C-34, no. 2, pp. 101-109, Feb. 1985.
[6] R. Geist, R. Reynolds, and T. Westal, "Selection of check-point interval in critical task environment," IEEE Trans. Rel., vol. 37, no. 4, pp. 395-400, 1988.
[7] E. Gelenbe and I. Mitrani, Analysis and Synthesis of Computer Systems. New York: Academic, 1980.
[8] M. Y. Hsiao, W. C. Carter, J. W. Thomas, and W. R. Stringfellow, "Reliability, availability, and serviceability of IBM computer systems: A quarter century of progress," IBM J. Res. Develop., vol. 25, no. 5, pp. 453-465, Sept. 1981.
[9] M. C. Hsueh, R. K. Iyer, and K. S. Trivedi, "Performability modeling based on real data: A case study," IEEE Trans. Comput., vol. 37, no. 4, pp. 478-484, Apr. 1988.
[10] D. L. Jagerman, "An inversion technique for the Laplace transform with application to approximation," Bell Syst. Tech. J., vol. 57, no. 3, pp. 669-710, Mar. 1978.
[11] D. L. Jagerman, "An inversion technique for the Laplace transform," Bell Syst. Tech. J., vol. 61, no. 8, pp. 1995-2002, Oct. 1982.
[12] V. G. Kulkarni, V. F. Nicola, and K. S. Trivedi, "The completion time of a job on multi-mode systems," Advances Applied Probability, vol. 19, no. 4, pp. 932-954, Dec. 1987.
[13] V. G. Kulkarni, V. F. Nicola, and K. S. Trivedi, "Effects of checkpointing and queueing on program performance," Stochastic Models, vol. 6, no. 4, pp. 615-648, 1990.
[14] J. C. Laprie, "Dependable computing and fault tolerance: basic concepts and terminology," in Proc. 15th Int. IEEE Symp. on Fault Tolerant Computing (FTCS-15) (Ann Arbor, MI), June 1985, pp. 2-11.
[15] J. N. Lyness, "Algorithm 397, SQUANK," Collected Algorithms ACM, 1969.
[16] R. Marie and K. S. Trivedi, "A note on the effect of preemptive policies on the stability of a priority queue," Inform. Processing Lett., vol. 24, no. 6, pp. 397-401, Apr. 1987.
[17] J. F. Meyer, "On evaluating the performability of degradable computing systems," IEEE Trans. Comput., vol. C-29, no. 8, pp. 720-731, Aug. 1980.
[18] J. F. Meyer, "Closed-form solutions of performability," IEEE Trans. Comput., vol. C-31, no. 7, pp. 648-657, July 1982.
[19] V. F. Nicola, V. G. Kulkarni, and K. S. Trivedi, "A queueing analysis of fault-tolerant computer systems," IEEE Trans. Software Eng., vol. SE-13, no. 3, pp. 363-375, Mar. 1987.
[20] W. C. Obi, "Error analysis of a Laplace transform inversion procedure," SIAM J. Numerical Analysis, vol. 27, no. 2, pp. 457-469, Apr. 1990.
[21] R. Pyke, "Markov renewal processes: Definitions and preliminary properties," Annals Math. Statistics, vol. 32, pp. 1231-1242, 1961.
[22] H. L. Royden, Real Analysis, 2nd ed. New York: Macmillan, 1968.

Index Terms:
server behavior; semi-Markov process; nonexponential repair-time distributions; job completion time; closed-form expression; Laplace-Stieltjes transform; fault-tolerant system; computer performance; failure-repair models; multistate computer systems; preemptions; fault tolerant computing; Laplace transforms; Markov processes; performance evaluation.
P.F. Chimento, K.S. Trivedi, "The Completion Time of Programs on Processors Subject to Failure and Repair," IEEE Transactions on Computers, vol. 42, no. 10, pp. 1184-1194, Oct. 1993, doi:10.1109/12.257705