Measurement and Analysis of Workload Effects on Fault Latency in Real-Time Systems
February 1990 (vol. 16 no. 2)
pp. 212-216

The authors demonstrate the need to address fault latency in highly reliable real-time control computer systems. The effectiveness of all known recovery mechanisms is greatly reduced in the presence of multiple latent faults, because such faults increase the possibility of multiple errors, which could result in coverage failure. The authors present experimental evidence that the duration of fault latency depends on workload. A synthetic work generator is used to vary the workload, and a hardware fault injector is used to inject transient faults of varying durations. This method makes it possible to derive the distribution of fault latency duration. Experimental results obtained from the fault-tolerant multiprocessor (FTMP) at the NASA Airlab are presented and discussed.
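
The abstract does not spell out how injected fault durations lead to a latency distribution. One plausible reading, offered here as an assumption and not as the authors' stated procedure, is that a transient fault of duration d can only be detected if the latency is at most d, so the fraction of detected faults at each duration estimates the latency distribution function at that point. The sketch below groups hypothetical trial records by duration and then smooths the fractions into a non-decreasing curve with the pool-adjacent-violators algorithm. The function names, the trial format, and the choice of smoothing are all illustrative.

```python
from collections import defaultdict


def detection_fractions(trials):
    """Group (duration, detected) trials by injected fault duration.

    Returns a list of (duration, fraction_detected, trial_count),
    sorted by duration. The trial format is hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # duration -> [detected, total]
    for duration, detected in trials:
        counts[duration][0] += int(detected)
        counts[duration][1] += 1
    return [(d, hit / total, total) for d, (hit, total) in sorted(counts.items())]


def monotone_cdf(points):
    """Fit a non-decreasing curve to the detection fractions.

    Uses pool-adjacent-violators: whenever a fraction drops below the
    previous one, the two groups are merged into their weighted mean.
    This smoothing step is an assumption of this sketch.
    """
    blocks = []  # each block: [weighted_mean, weight, durations]
    for duration, fraction, count in points:
        blocks.append([fraction, count, [duration]])
        # Merge backwards while the monotonic order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, weight2, durs2 = blocks.pop()
            mean1, weight1, durs1 = blocks.pop()
            weight = weight1 + weight2
            merged = (mean1 * weight1 + mean2 * weight2) / weight
            blocks.append([merged, weight, durs1 + durs2])
    return [(d, mean) for mean, _, durs in blocks for d in durs]
```

Running one such estimate per workload level would give a family of curves that can be compared to see how workload shifts the latency distribution.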

Index Terms:
workload effects; fault latency; real-time systems; control computer systems; recovery mechanisms; multiple latent faults; coverage failure; synthetic work generator; hardware fault injector; fault-tolerant multiprocessor; NASA Airlab; control systems; fault tolerant computing; multiprocessing systems; program testing; software engineering; system recovery.
Citation:
M.H. Woodbury, K.G. Shin, "Measurement and Analysis of Workload Effects on Fault Latency in Real-Time Systems," IEEE Transactions on Software Engineering, vol. 16, no. 2, pp. 212-216, Feb. 1990, doi:10.1109/32.44383