Evaluation of Fault Tolerance Latency from Real-Time Application's Perspectives
January 2000 (vol. 49 no. 1)
pp. 55-64

Abstract: Fault Tolerance Latency (FTL), defined as the total time required by all sequential steps taken to recover from an error, is important to the design and evaluation of fault-tolerant computers used in safety-critical real-time control systems with deadline information. In this paper, we evaluate FTL in terms of several random and deterministic variables that account for fault behaviors and/or the capability and performance of error-handling mechanisms, considering various fault tolerance mechanisms based on the trade-off between temporal and spatial redundancy. We then use the evaluated FTL to check whether an error-handling policy can meet the Control System Deadline (CSD) for a given real-time application.
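
As a rough illustration of the idea in the abstract (not the authors' model), FTL can be treated as the sum of the times of the sequential recovery stages, some random and some deterministic, and the result compared against the CSD. The stage names, distributions, and millisecond values below are assumptions chosen only for the example; the function names `sample_ftl` and `prob_meets_deadline` are likewise hypothetical.

```python
import random


def sample_ftl(stage_samplers, rng):
    """Return one FTL sample: the sum of the sequential stage times."""
    return sum(sampler(rng) for sampler in stage_samplers)


def prob_meets_deadline(stage_samplers, csd, trials=10_000, seed=0):
    """Monte Carlo estimate of P(FTL <= CSD) for one error-handling policy."""
    rng = random.Random(seed)
    hits = sum(sample_ftl(stage_samplers, rng) <= csd for _ in range(trials))
    return hits / trials


# Hypothetical stages, in milliseconds (assumed values, not from the paper).
stages = [
    lambda rng: rng.expovariate(1 / 2.0),  # error detection: random, mean 2 ms
    lambda rng: 1.0,                       # fault location: deterministic
    lambda rng: 3.0,                       # reconfiguration and recovery: deterministic
]

if __name__ == "__main__":
    for csd in (5.0, 10.0, 20.0):
        p = prob_meets_deadline(stages, csd)
        print(f"CSD = {csd:5.1f} ms -> estimated P(FTL <= CSD) = {p:.3f}")
```

A policy with a different mix of temporal and spatial redundancy would be modeled, under the same assumptions, by swapping in a different list of stage samplers and comparing the resulting probabilities for the same CSD.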

Index Terms:
Fault tolerance latency (FTL), temporal/spatial and static/dynamic redundancy, error-handling, Control System Deadline (CSD), dynamic failure.
Hagbae Kim, Kang G. Shin, "Evaluation of Fault Tolerance Latency from Real-Time Application's Perspectives," IEEE Transactions on Computers, vol. 49, no. 1, pp. 55-64, Jan. 2000, doi:10.1109/12.822564