Modeling of Correlated Failures and Community Error Recovery in Multiversion Software
March 1990 (vol. 16 no. 3)
pp. 350-359

Three aspects of the modeling of multiversion software are considered. First, the beta-binomial distribution is proposed for modeling correlated failures in multiversion software. Second, a combinatorial model for predicting the reliability of a multiversion software configuration is presented. This model can take as inputs failure distributions obtained either from measurements or from a selected distribution (e.g., beta-binomial), and various recovery methods can be incorporated in it. Third, the effectiveness of the community error recovery method based on checkpointing is investigated. This method appears to be effective only when the failure behaviors of program versions are lightly correlated. Two different types of checkpoint failure are also considered: an omission failure, in which the correct output is recognized at a checkpoint but the checkpoint fails to correct the wrong outputs, and a destructive failure, in which the good versions are corrupted at a checkpoint.
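
As a rough illustration of the first two points, the sketch below computes a beta-binomial distribution over the number of failing versions and feeds it into a simple majority-vote reliability calculation. It uses the textbook beta-binomial form, P(K = k) = C(n, k) B(k + α, n − k + β) / B(α, β); the abstract does not give the paper's own parameterization or its combinatorial model, so the function names, the parameters `alpha` and `beta`, and the voting rule (the system succeeds only when a strict majority of versions is correct) are all assumptions made for this example.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(exactly k of n versions fail) under a beta-binomial model.

    alpha and beta are the shape parameters of the underlying Beta
    distribution (hypothetical names, not taken from the paper).
    """
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def failure_distribution(n, alpha, beta):
    """Distribution over the number of failing versions, k = 0..n."""
    return [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]


def majority_vote_reliability(dist):
    """Probability that a strict majority of versions is correct.

    dist[k] is P(exactly k versions fail). This voting rule is an
    assumption for illustration, not the paper's recovery model; dist
    may come from measurements or from a fitted distribution.
    """
    n = len(dist) - 1
    max_tolerated = (n - 1) // 2  # largest failure count leaving a majority
    return sum(dist[: max_tolerated + 1])


if __name__ == "__main__":
    # Three versions, mean per-version failure probability alpha/(alpha+beta) = 0.01.
    dist = failure_distribution(3, alpha=0.1, beta=9.9)
    print("failure distribution:", dist)
    print("majority-vote reliability:", majority_vote_reliability(dist))
```

Because `majority_vote_reliability` takes a plain list of probabilities, a measured failure distribution can be passed in place of the fitted one, mirroring the two kinds of input the abstract mentions.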

Index Terms:
correlated failures; community error recovery; multiversion software; beta-binomial distribution; combinatorial model; software configuration; failure distributions; selected distribution; recovery methods; checkpointing; failure behaviors; lightly correlated; checkpoint failure; omission failure; destructive failure; fault tolerant computing; software reliability; system recovery.
V.F. Nicola, A. Goyal, "Modeling of Correlated Failures and Community Error Recovery in Multiversion Software," IEEE Transactions on Software Engineering, vol. 16, no. 3, pp. 350-359, March 1990, doi:10.1109/32.48942