Fault Injection and Dependability Evaluation of Fault-Tolerant Systems
August 1993 (vol. 42 no. 8)
pp. 913-923

The authors describe a dependability evaluation method based on fault injection that establishes the link between the experimental evaluation of the fault tolerance process and the fault occurrence process. The main characteristics of a fault injection test sequence aimed at evaluating the coverage of the fault tolerance process are presented. Emphasis is given to the derivation of experimental measures. The various steps by which the fault occurrence and fault tolerance processes are combined to evaluate dependability measures are identified and their interactions are analyzed. The method is illustrated by an application to the dependability evaluation of the distributed fault-tolerant architecture of the Esprit Delta-4 Project.
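
As a rough illustration of how an experimental coverage measure might be combined with a fault occurrence model, the sketch below estimates coverage from a batch of fault injection outcomes and feeds it into a deliberately simple dependability measure. It is not the authors' method: the function names, the boolean outcome encoding, the normal-approximation confidence interval, and the constant fault rate (exponential model) are all assumptions made for this example.

```python
import math


def coverage_estimate(outcomes, z=1.96):
    """Return (estimate, lower, upper) for the coverage of a fault tolerance mechanism.

    outcomes: iterable of booleans, True when an injected fault was properly
    handled (hypothetical encoding, not taken from the paper).
    z: normal quantile for the interval (1.96 gives roughly 95%).
    """
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one fault injection experiment is required")
    c_hat = sum(outcomes) / n
    # Normal approximation to the binomial; crude when c_hat is near 0 or 1.
    half_width = z * math.sqrt(c_hat * (1.0 - c_hat) / n)
    return c_hat, max(0.0, c_hat - half_width), min(1.0, c_hat + half_width)


def prob_no_uncovered_fault(fault_rate, coverage, t):
    """Probability that no uncovered fault occurs before time t.

    Assumes (for illustration only) faults arrive at a constant rate and each
    is tolerated independently with probability `coverage`, so uncovered
    faults arrive at rate fault_rate * (1 - coverage).
    """
    return math.exp(-fault_rate * (1.0 - coverage) * t)


if __name__ == "__main__":
    # 100 hypothetical experiments, 90 of which were tolerated.
    results = [True] * 90 + [False] * 10
    c_hat, low, high = coverage_estimate(results)
    print(f"coverage estimate: {c_hat:.3f} (approx. interval {low:.3f} to {high:.3f})")
    # Hypothetical fault rate of 1e-3 per hour over a 1000-hour mission.
    print(f"P(no uncovered fault): {prob_no_uncovered_fault(1e-3, c_hat, 1000.0):.4f}")
```

Evaluating the second function at the lower and upper bounds of the coverage interval gives a rough sense of how uncertainty in the experimental measure carries over to the dependability measure.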

Index Terms:
fault injection; dependability evaluation; fault-tolerant systems; fault tolerance process; fault occurrence process; test sequence; dependability measures; distributed fault-tolerant architecture; Esprit Delta-4 Project; distributed processing; fault tolerant computing.
Citation:
J. Arlat, A. Costes, Y. Crouzet, J.C. Laprie, D. Powell, "Fault Injection and Dependability Evaluation of Fault-Tolerant Systems," IEEE Transactions on Computers, vol. 42, no. 8, pp. 913-923, Aug. 1993, doi:10.1109/12.238482