Estimating Bounds on the Reliability of Diverse Systems
April 2003 (vol. 29 no. 4)
pp. 345-359

Abstract: We address the difficult problem of estimating the reliability of multiple-version software. The central issue is the degree of statistical dependence between failures of diverse versions. Previously published models of failure dependence described what behavior could be expected “on average” from a pair of “independently generated” versions. We focus instead on predictions using specific information about a given pair of versions. The concept of “variation of difficulty” between situations to which software may be subject is central to the previous models cited, and it turns out to be central for our question as well. We provide new understanding of various alternative imprecise estimates of system reliability and some results of practical use, especially for diverse systems assembled from pre-existing (e.g., “off-the-shelf”) subsystems. System designers, users, and regulators need useful bounds on the probability of system failure. We discuss how to use reliability data about the individual diverse versions to obtain upper bounds and other useful information for decision making. These bounds are greatly affected by how the versions' probabilities of failure vary between subdomains of the demand space or between operating regimes, and by the level of detail with which these variations are documented in the data. In some cases it is even possible to demonstrate, before operation, upper bounds that are very close to the true probability of failure of the system.
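
To make the role of subdomain data concrete, the following is a minimal sketch and not the paper's own procedure. It assumes a 1-out-of-2 system (the system fails only when both versions fail on the same demand) and uses only the elementary fact that, within any subdomain, the probability that both versions fail cannot exceed the smaller of the two versions' failure probabilities there. Weighting these per-subdomain minima by the operational profile gives an upper bound that is never looser than the minimum of the two overall failure probabilities. The function names, the two-subdomain profile, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def marginal_pfd(profile: Sequence[float], pfd: Sequence[float]) -> float:
    """Overall probability of failure on demand of one version:
    sum over subdomains of P(subdomain) * P(failure | subdomain)."""
    return sum(p * q for p, q in zip(profile, pfd))


def upper_bound_1oo2(profile: Sequence[float],
                     pfd_a: Sequence[float],
                     pfd_b: Sequence[float]) -> float:
    """Upper bound on P(both versions fail on a random demand).

    Within each subdomain, P(A and B fail) <= min(P(A fails), P(B fails)),
    so the profile-weighted sum of those minima bounds the system pfd.
    """
    if not (len(profile) == len(pfd_a) == len(pfd_b)):
        raise ValueError("all inputs must cover the same subdomains")
    if abs(sum(profile) - 1.0) > 1e-9:
        raise ValueError("subdomain probabilities must sum to 1")
    return sum(p * min(a, b) for p, a, b in zip(profile, pfd_a, pfd_b))


# Hypothetical data: two subdomains, each version weak in a different one.
profile = [0.7, 0.3]        # assumed probability of a demand falling in each subdomain
pfd_a = [1e-4, 1e-2]        # assumed per-subdomain failure probabilities, version A
pfd_b = [1e-3, 1e-5]        # assumed per-subdomain failure probabilities, version B

# Bound available when only each version's overall pfd is known.
coarse_bound = min(marginal_pfd(profile, pfd_a), marginal_pfd(profile, pfd_b))

# Bound available when the per-subdomain figures are documented.
fine_bound = upper_bound_1oo2(profile, pfd_a, pfd_b)

print(f"bound from overall pfds only: {coarse_bound:.3e}")
print(f"bound using subdomain data:   {fine_bound:.3e}")
```

With these made-up figures the subdomain-level bound is roughly ten times tighter than the bound from overall figures alone, because the two versions are weak in different subdomains. If both versions had their highest failure probabilities in the same subdomain, the two bounds would be much closer.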

Index Terms:
Software fault-tolerance, design diversity, demand space partitioning, subdomain testing, common-mode failure.
Peter Popov, Lorenzo Strigini, John May, Silke Kuball, "Estimating Bounds on the Reliability of Diverse Systems," IEEE Transactions on Software Engineering, vol. 29, no. 4, pp. 345-359, April 2003, doi:10.1109/TSE.2003.1191798