Performance Analysis of Real-Time Software Supporting Fault-Tolerant Operation
July 1990 (vol. 39 no. 7)
pp. 906-918

This paper discusses how to analyze the performance of real-time control systems that feature mechanisms for online recovery from software faults. The application is assumed to consist of a number of interacting cyclic processes. The underlying hardware is assumed to be a multiprocessor, possibly with a separate control processor. The software structure is assumed to use design diversity along with forward and/or backward recovery. A detailed but efficiently solvable model is developed for predicting various performance and reliability characteristics. One of the key ideas used in the modeling is hierarchical decomposition, which enables efficient computation of level-oriented performance parameters. The model is general and adaptable to a number of useful special cases.

Index Terms:
performance analysis; real-time software supporting fault-tolerant operation; online recovery; interacting cyclic processes; multiprocessor; performance; reliability characteristics; modeling; hierarchical decomposition; level-oriented performance parameters; fault tolerant computing; multiprocessing systems; performance evaluation; real-time systems.
K. Kant, "Performance Analysis of Real-Time Software Supporting Fault-Tolerant Operation," IEEE Transactions on Computers, vol. 39, no. 7, pp. 906-918, July 1990, doi:10.1109/12.55692