Coverage Modeling for Dependability Analysis of Fault-Tolerant Systems
June 1989 (vol. 38 no. 6)
pp. 775-787
Several different models for predicting coverage in a fault-tolerant system, including models for permanent, intermittent, and transient errors, are discussed. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing coverage are developed. Two types of events that interfere with recovery are examined, and methods for modeling such events, whether they
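
To illustrate why the coverage parameter matters for system reliability, the following is a minimal sketch of a textbook-style Markov model. It is not a model taken from the paper. It assumes a two-unit system without repair, a constant per-unit failure rate, and a single constant coverage probability (the chance that the system recovers from the first failure). The function name `duplex_reliability` and the parameters `lam`, `c`, and `t` are hypothetical, chosen only for this example.

```python
import math


def duplex_reliability(lam: float, c: float, t: float) -> float:
    """Reliability at time t of a two-unit system with imperfect coverage.

    Assumed (illustrative) Markov model with three states:
      2 units up -> 1 unit up : rate 2*lam*c        (first fault is covered)
      2 units up -> failed    : rate 2*lam*(1 - c)  (first fault is not covered)
      1 unit up  -> failed    : rate lam
    Solving the state equations gives
      P2(t) = exp(-2*lam*t)
      P1(t) = 2*c*(exp(-lam*t) - exp(-2*lam*t))
    and the reliability is P2(t) + P1(t).
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("coverage c must lie in [0, 1]")
    p2 = math.exp(-2.0 * lam * t)
    p1 = 2.0 * c * (math.exp(-lam * t) - math.exp(-2.0 * lam * t))
    return p2 + p1


if __name__ == "__main__":
    lam = 1e-4   # assumed failure rate per hour, for illustration only
    t = 10.0     # assumed mission time in hours
    for c in (0.9, 0.99, 0.999, 1.0):
        unreliability = 1.0 - duplex_reliability(lam, c, t)
        print(f"c = {c:<6} unreliability = {unreliability:.3e}")
```

Under these assumptions, perfect coverage (`c = 1`) reduces the expression to the familiar parallel-system reliability, and zero coverage (`c = 0`) makes the pair behave like a series system, so even a small shortfall in coverage can dominate the unreliability of short missions.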


Index Terms:
dependability analysis; fault-tolerant system; Petri net models; computing coverage; recovery; sensitivity; system reliability; transient recovery; fault tolerant computing; Markov processes; Petri nets; system recovery.
Citation:
J.B. Dugan, K.S. Trivedi, "Coverage Modeling for Dependability Analysis of Fault-Tolerant Systems," IEEE Transactions on Computers, vol. 38, no. 6, pp. 775-787, June 1989, doi:10.1109/12.24286