FERRARI: A Flexible Software-Based Fault and Error Injection System
February 1995 (vol. 44 no. 2)
pp. 248-260

Abstract—A major step toward the development of fault-tolerant computer systems is the validation of their dependability properties. Fault/error injection has been recognized as a powerful approach for validating the fault tolerance mechanisms of a system and for obtaining statistics on parameters such as coverage and latency. This paper describes the methodology and guidelines for the design of flexible software-based fault and error injection and presents a tool, FERRARI, that incorporates these techniques. The techniques used to emulate transient errors and permanent faults in software are described in detail. Experimental results are presented for several error detection techniques, and they demonstrate the effectiveness of the software-based error injection tool in evaluating the dependability properties of complex systems.

Index Terms—Fault injection, error injection, real time, coverage, latency.

Ghani A. Kanawati, Nasser A. Kanawati, Jacob A. Abraham, "FERRARI: A Flexible Software-Based Fault and Error Injection System," IEEE Transactions on Computers, vol. 44, no. 2, pp. 248-260, Feb. 1995, doi:10.1109/12.364536