Fault Injection for Dependability Validation: A Methodology and Some Applications
February 1990 (vol. 16 no. 2)
pp. 166-182

The authors address the problem of validating the dependability of fault-tolerant computing systems, in particular, the validation of the fault-tolerance mechanisms. The proposed approach is based on the use of fault injection at the physical level on a hardware/software prototype of the system considered. The place of this approach in a validation-directed design process and with respect to related work on fault injection is clearly identified. The major requirements and problems related to the development and application of a validation methodology based on fault injection are presented and discussed. Emphasis is put on the definition, analysis, and use of the experimental dependability measures that can be obtained. The proposed methodology has been implemented through the realization of a general pin-level fault injection tool (MESSALINE), and its usefulness is demonstrated by the application of MESSALINE to the experimental validation of two systems: a subsystem of a centralized computerized interlocking system for railway control applications and a distributed system corresponding to the current implementation of the dependable communication system of the ESPRIT Delta-4 Project.
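
As an illustration of the kind of experimental measure a fault-injection campaign can yield, the sketch below estimates the coverage of a fault-tolerance mechanism, taken here as the fraction of injected faults that the system handles correctly. It is not taken from the paper: the function name, the example counts, and the normal-approximation confidence interval are all assumptions chosen for illustration.

```python
import math


def coverage_estimate(n_injected, n_tolerated, z=1.96):
    """Estimate coverage from a fault-injection campaign (hypothetical helper).

    n_injected  -- number of faults injected (assumed independent trials)
    n_tolerated -- number of those faults correctly handled by the system
    z           -- normal quantile; 1.96 gives an approximate 95% interval

    Returns (point estimate, lower bound, upper bound).
    """
    if n_injected <= 0 or not 0 <= n_tolerated <= n_injected:
        raise ValueError("need 0 <= n_tolerated <= n_injected and n_injected > 0")
    c_hat = n_tolerated / n_injected
    # Normal approximation to the binomial; an assumption that becomes
    # inaccurate when coverage is very close to 1 or the sample is small.
    half_width = z * math.sqrt(c_hat * (1.0 - c_hat) / n_injected)
    return c_hat, max(0.0, c_hat - half_width), min(1.0, c_hat + half_width)


# Made-up counts, for illustration only.
c_hat, low, high = coverage_estimate(n_injected=1000, n_tolerated=950)
print(f"coverage ~ {c_hat:.3f}, approx. 95% interval [{low:.3f}, {high:.3f}]")
```

When the observed coverage is very high, as is typical for fault-tolerant systems, an exact binomial interval would be a safer choice than the normal approximation used here.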

Index Terms:
dependability validation; fault-tolerant computing systems; fault-tolerance mechanisms; hardware/software prototype; validation-directed design process; validation methodology; general pin-level fault injection tool; MESSALINE; centralized computerized interlocking system; railway control applications; distributed system; dependable communication system; ESPRIT Delta-4 Project; computer communications software; distributed processing; fault tolerant computing; program verification; railways; software tools.
J. Arlat, M. Aguera, L. Amat, Y. Crouzet, J.-C. Fabre, J.-C. Laprie, E. Martins, D. Powell, "Fault Injection for Dependability Validation: A Methodology and Some Applications," IEEE Transactions on Software Engineering, vol. 16, no. 2, pp. 166-182, Feb. 1990, doi:10.1109/32.44380