A New Hybrid Fault Detection Technique for Systems-on-a-Chip
February 2006 (vol. 55 no. 2)
pp. 185-198
Hardening SoCs against transient faults requires new techniques that combine high fault detection capability with the usual requirements of the SoC design flow, e.g., short design time, low area overhead, and limited (or no) access to the source descriptions of the cores. This paper proposes a new hybrid approach that combines hardening software transformations with the introduction of an Infrastructure IP with reduced memory and performance overheads. The approach targets faults affecting the memory elements that store both code and data, regardless of their location (inside or outside the processor). Extensive experimental results, including comparisons with previous approaches, are reported; they allow a practical evaluation of the method in terms of fault detection capability and of area, memory, and performance overheads.
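
The abstract does not spell out the software transformations it relies on. As a rough illustration only, the sketch below assumes a simple data-duplication rule (each variable is stored twice and the copies are compared on every read), which is one common form of software hardening and not necessarily the one used in the paper. The names DuplicatedVar, FaultDetected, inject_bit_flip, and checked_sum are hypothetical, and the Infrastructure IP (the hardware half of the hybrid approach) is not modeled.

```python
class FaultDetected(Exception):
    """Raised when the two copies of a hardened variable disagree."""


class DuplicatedVar:
    """Hypothetical helper: stores an integer twice and checks the copies on read."""

    def __init__(self, value: int) -> None:
        self.copy0 = value
        self.copy1 = value

    def write(self, value: int) -> None:
        # Every write updates both copies.
        self.copy0 = value
        self.copy1 = value

    def read(self) -> int:
        # Every read compares the copies; a mismatch signals a data fault.
        if self.copy0 != self.copy1:
            raise FaultDetected("copies differ")
        return self.copy0


def inject_bit_flip(var: DuplicatedVar, bit: int) -> None:
    """Simulate a transient fault by flipping one bit of the first copy only."""
    var.copy0 ^= 1 << bit


def checked_sum(values) -> int:
    """Accumulate a sum through a duplicated variable (illustrative workload)."""
    total = DuplicatedVar(0)
    for v in values:
        total.write(total.read() + v)
    return total.read()
```

A single bit flip in one copy is caught at the next read, at the cost of doubling the data memory and adding a comparison per access. Overheads of this kind are what the abstract says the hybrid approach aims to reduce.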

Index Terms:
SoC dependability, infrastructure IP, transient fault detection.
Citation:
Paolo Bernardi, Leticia Maria Veiras Bolzani, Maurizio Rebaudengo, Matteo Sonza Reorda, Fabian Luis Vargas, Massimo Violante, "A New Hybrid Fault Detection Technique for Systems-on-a-Chip," IEEE Transactions on Computers, vol. 55, no. 2, pp. 185-198, Feb. 2006, doi:10.1109/TC.2006.15