Issue No. 04 - July/August (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TDSC.2010.26
Ilia Polian, Albert-Ludwigs-University of Freiburg, Freiburg
John P. Hayes, University of Michigan, Ann Arbor
Sudhakar M. Reddy, University of Iowa, Iowa City
Bernd Becker, Albert-Ludwigs-University of Freiburg, Freiburg
Transient or soft errors caused by various environmental effects are a growing concern in micro and nanoelectronics. We present a general framework for modeling and mitigating the logical effects of such errors in digital circuits. We observe that some errors have time-bounded effects; the system's output is corrupted for a few clock cycles, after which it recovers automatically. Since such erroneous behavior can be tolerated by some applications, i.e., it is noncritical at the system level, we define the critical soft error rate (CSER) as a more realistic alternative to the conventional SER measure. A simplified technology-independent fault model, the single transient fault (STF), is proposed for efficiently estimating the error probabilities associated with individual nodes in both combinational and sequential logic. STFs can be used to compute various other useful metrics for the faults and errors of interest, and the required computations can leverage the large body of existing methods and tools designed for (permanent) stuck-at faults. As an application of the proposed methodology, we introduce a systematic strategy for hardening logic circuits against transient faults. The goal is to achieve a desired level of CSER at minimum cost by selecting a subset of nodes for hardening against STFs. Exact and approximate algorithms to solve the node selection problem are presented. The effectiveness of this approach is demonstrated by experiments with the ISCAS-85 and -89 benchmark suites, as well as some large (multimillion-gate) industrial circuits.
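The node selection problem described in the abstract (reach a desired CSER at minimum hardening cost) can be illustrated with a small greedy heuristic. This is only a sketch under simplifying assumptions and is not the exact or approximate algorithm from the paper: the function name `select_nodes_greedy`, the per-node CSER contributions, and the per-node costs are hypothetical, and hardening a node is assumed to remove its contribution entirely.

```python
def select_nodes_greedy(contrib, cost, target_cser):
    """Pick nodes to harden until the residual CSER meets the target.

    contrib: dict mapping node name -> estimated contribution to CSER
             (hypothetical values, arbitrary units).
    cost:    dict mapping node name -> hardening cost (assumed > 0).
    Assumption: hardening a node removes its whole contribution.
    """
    residual = sum(contrib.values())
    chosen = []
    # Consider nodes with the largest CSER reduction per unit cost first.
    order = sorted(contrib, key=lambda n: contrib[n] / cost[n], reverse=True)
    for node in order:
        if residual <= target_cser:
            break  # target reached, stop spending
        chosen.append(node)
        residual -= contrib[node]
    return chosen, residual


if __name__ == "__main__":
    contrib = {"a": 40, "b": 30, "c": 20, "d": 10}
    cost = {"a": 4, "b": 1, "c": 4, "d": 5}
    print(select_nodes_greedy(contrib, cost, target_cser=35))
```

A ratio-based greedy choice is cheap but not guaranteed to be cost-optimal, since the underlying problem resembles a knapsack-style covering problem; an exact method would have to search over subsets of nodes.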
Soft errors, error tolerance, selective hardening, transient faults.
I. Polian, J. P. Hayes, S. M. Reddy and B. Becker, "Modeling and Mitigating Transient Errors in Logic Circuits," in IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 4, pp. 537-547, 2011.