Fast Simulation of Highly Dependable Systems with General Failure and Repair Processes
December 1993 (vol. 42 no. 12)
pp. 1440-1452

An approach for simulating models of highly dependable systems with general failure and repair time distributions is described. The approach combines importance sampling with event rescheduling to obtain variance reduction in such rare-event simulations. It is general in nature and allows a variety of features commonly arising in dependability modeling to be simulated effectively. It is shown how the technique can be applied to systems with redundant components and/or periodic maintenance. For different failure time distributions, the effect of the maintenance period on the steady-state availability is explored, and the amount of component redundancy needed to achieve a given reliability level is determined.
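
To make the role of importance sampling concrete, the following is a minimal sketch and not the method of the article: it omits event rescheduling and any system model. It estimates the small probability that a single component with a Weibull failure time fails before a short horizon, by sampling failure times from a biased exponential distribution and weighting each sample by the likelihood ratio. The function names, the Weibull parameters, and the biased rate are all illustrative assumptions.

```python
import math
import random


def weibull_pdf(x, shape, scale):
    """Density of a Weibull(shape, scale) failure time at x >= 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def estimate_failure_prob(horizon, shape, scale, biased_rate, n, seed=0):
    """Importance-sampling estimate of P(T <= horizon), T ~ Weibull.

    Failure times are drawn from an exponential density with rate
    `biased_rate` (an assumed, hand-picked biasing distribution), so that
    early failures are common. Each hit is weighted by the likelihood
    ratio f(x) / g(x) to keep the estimator unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(biased_rate)  # sample under the biased density g
        if x <= horizon:
            g = biased_rate * math.exp(-biased_rate * x)
            total += weibull_pdf(x, shape, scale) / g
    return total / n


def exact_failure_prob(horizon, shape, scale):
    """Closed-form Weibull CDF, used only to check the estimate."""
    return -math.expm1(-((horizon / scale) ** shape))


if __name__ == "__main__":
    est = estimate_failure_prob(1.0, 2.0, 1000.0, 1.0, 20000, seed=1)
    print(est, exact_failure_prob(1.0, 2.0, 1000.0))
```

With these assumed parameters the target probability is about one in a million, so naive sampling from the Weibull distribution itself would rarely observe a single failure in 20000 runs, whereas under the biased density most runs contribute a weighted observation.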

Index Terms:
discrete event simulation; discrete time systems; probability; redundancy; reliability theory; highly dependable systems; failure and repair; dependable systems; event rescheduling; variance reductions; rare event simulations; component redundancy; reliability; discrete-event systems; highly reliable systems; importance sampling.
V.F. Nicola, M.K. Nakayama, P. Heidelberger, A. Goyal, "Fast Simulation of Highly Dependable Systems with General Failure and Repair Processes," IEEE Transactions on Computers, vol. 42, no. 12, pp. 1440-1452, Dec. 1993, doi:10.1109/12.260634