Dependability Modeling and Evaluation of Software Fault-Tolerant Systems
April 1990 (vol. 39 no. 4)
pp. 504-513

Dependability modeling and evaluation (encompassing reliability and safety issues) of the two major software fault-tolerance approaches, recovery blocks (RBs) and N-version programming (NVP), are presented. The study is based on a detailed analysis of software fault-tolerance architectures able to tolerate a single fault (RB: two alternates and an acceptance test; NVP: three versions and a decider). For each approach, a detailed model based on the software production process is established and then simplified by assuming that only a single fault type may manifest during execution of the fault-tolerant software and that no error compensation may take place within the software. The analytical results make it possible to identify the improvement over non-fault-tolerant software that could result from the use of RB (the acceptance test has to be more reliable than the alternates) and NVP (related faults among the versions and the decider have to be minimized), and to determine the most critical types of related faults. Nested RBs are also studied, showing that the proposed analysis approach can be applied to such realistic software structures and that, when an alternate is itself an RB, the results are analogous to those obtained by adding a third alternate. The reliability analysis shows that only a small improvement can be expected.
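
The two architectures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`recovery_block`, `n_version`), the toy absolute-value task, and the simple majority decider are all assumptions chosen for illustration, and the state restoration that a real recovery block performs before trying the next alternate is omitted.

```python
from collections import Counter

def recovery_block(alternates, acceptance_test, x):
    """Try each alternate in order; return the first result that passes
    the acceptance test, or None if every alternate is rejected.
    (Hypothetical helper; checkpoint/rollback of state is not modeled.)"""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # a crash of one alternate is treated as a rejection
        if acceptance_test(x, result):
            return result
    return None

def n_version(versions, x):
    """Run all versions and return the strict-majority result, or None if
    no majority exists. (Hypothetical decider; results must be hashable.)"""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version simply contributes no vote
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(versions) // 2 else None

# Toy task (an assumption for this sketch): compute |x|.
def faulty_abs(x):
    return x  # wrong for negative inputs

def good_abs(x):
    return -x if x < 0 else x

def accept_non_negative(x, result):
    return result >= 0  # a deliberately simple acceptance test

# RB with two alternates and an acceptance test: the faulty primary is
# rejected and the second alternate's result is delivered.
rb_result = recovery_block([faulty_abs, good_abs], accept_non_negative, -5)

# NVP with three versions and a decider: one independent fault is masked.
nvp_masked = n_version([faulty_abs, good_abs, abs], -5)

# A related fault shared by two versions outvotes the correct version.
nvp_related = n_version([faulty_abs, faulty_abs, good_abs], -5)
```

In this sketch `rb_result` and `nvp_masked` are both 5, while `nvp_related` is -5: the same fault in two versions defeats the decider, which is the kind of related fault the abstract says has to be minimized. Likewise, the RB result is only as trustworthy as `accept_non_negative`, echoing the requirement that the acceptance test be more reliable than the alternates.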

Index Terms:
dependability modelling; software fault-tolerant systems; reliability; safety issues; recovery blocks; N version programming; fault tolerant computing; software engineering.
J. Arlat, K. Kanoun, J.-C. Laprie, "Dependability Modeling and Evaluation of Software Fault-Tolerant Systems," IEEE Transactions on Computers, vol. 39, no. 4, pp. 504-513, April 1990, doi:10.1109/12.54843