Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers
February 1998 (vol. 24 no. 2)
pp. 125-136

Abstract: An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose; however, with the rapid increase in processor complexity, traditional techniques are becoming increasingly difficult to apply. This paper presents a new software-implemented fault injection and monitoring environment, called Xception, which is targeted at modern, complex processors. Xception uses the advanced debugging and performance monitoring features found in most modern processors to inject realistic faults by software, and to monitor in detail the activation of the faults and their impact on the behavior of the target system. Faults are injected with minimal interference with the target application: the application is not modified, no software traps are inserted, and the application does not need to run in a special trace mode (it executes at full speed). Xception provides a comprehensive set of fault triggers, including spatial and temporal fault triggers, and triggers related to the manipulation of data in memory. Faults injected by Xception can affect any process running on the target system (including the kernel), and faults can be injected into applications for which the source code is not available. Experimental results are presented to demonstrate the accuracy and potential of Xception in evaluating the dependability properties of today's complex computer systems.
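
The idea of a spatial fault trigger tied to memory accesses can be illustrated with a small toy model. The sketch below is not Xception's implementation: Xception relies on the processor's hardware debugging features, whereas this example simulates a word-addressed memory in plain Python. All names (`Fault`, `ToyMemory`, `arm`, `load`, `store`) are hypothetical and chosen only for illustration. It shows a transient bit-flip fault that is armed in advance and fires once, the first time the trigger address is accessed, without any change to the code that uses the memory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit memory words


@dataclass
class Fault:
    """A transient bit-flip fault (hypothetical model, for illustration)."""
    trigger_address: int  # spatial trigger: fires when this address is accessed
    target_address: int   # memory word that gets corrupted
    bit_mask: int         # bits to flip, applied with XOR
    fired: bool = False   # transient fault: injected at most once


@dataclass
class ToyMemory:
    """Simulated memory that checks armed faults on every access."""
    size: int
    words: List[int] = field(default_factory=list)
    faults: List[Fault] = field(default_factory=list)
    log: List[Tuple[int, int, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.words = [0] * self.size

    def arm(self, fault: Fault) -> None:
        """Register a fault; nothing is corrupted until its trigger fires."""
        self.faults.append(fault)

    def _check_triggers(self, address: int) -> None:
        for fault in self.faults:
            if not fault.fired and fault.trigger_address == address:
                # Flip the selected bits in the target word.
                corrupted = self.words[fault.target_address] ^ fault.bit_mask
                self.words[fault.target_address] = corrupted & WORD_MASK
                fault.fired = True
                # Record the activation so its impact can be analyzed later.
                self.log.append((address, fault.target_address, fault.bit_mask))

    def load(self, address: int) -> int:
        self._check_triggers(address)  # fault is visible to this read
        return self.words[address]

    def store(self, address: int, value: int) -> None:
        self.words[address] = value & WORD_MASK
        self._check_triggers(address)  # fault corrupts the stored value


# Usage: flip the lowest bit of word 3 the first time word 3 is accessed.
memory = ToyMemory(size=8)
memory.store(3, 0b1010)
memory.arm(Fault(trigger_address=3, target_address=3, bit_mask=0b0001))
first_read = memory.load(3)   # trigger fires here
second_read = memory.load(3)  # fault already fired; no further corruption
```

The workload (the `load` and `store` calls) is unaware of the injector, which loosely mirrors the abstract's point that the target application is not modified. A temporal trigger could be modeled the same way by counting accesses or elapsed time instead of matching an address.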

Index Terms:
Fault injection, RISC processors, dependability evaluation, real time.
Citation:
João Carreira, Henrique Madeira, João Gabriel Silva, "Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers," IEEE Transactions on Software Engineering, vol. 24, no. 2, pp. 125-136, Feb. 1998, doi:10.1109/32.666826