Self-Stabilizing Microprocessor: Analyzing and Overcoming Soft Errors
April 2006 (vol. 55 no. 4)
pp. 385-399
Soft errors are changes in memory values caused by external radiation or electrical noise. Decreases in feature size and power usage, together with the shortening of the microcycle period, increase the influence of soft errors. Self-stabilizing systems are designed to be started in an arbitrary, possibly corrupted, state (due to, say, soft errors) and to converge to a desired behavior. Self-stabilization is defined by the state space of the components and is essentially a well-founded, clearly defined form of the terms self-healing, automatic-recovery, automatic-repair, and autonomic-computing. To implement a self-stabilizing system, one needs to ensure that the microprocessor that executes the program is itself self-stabilizing. A self-stabilizing microprocessor copes with any combination of soft errors, converging to perform fetch-decode-execute in fault-free periods. Still, it is important that the microprocessor avoid convergence periods where possible by masking the effect of soft errors immediately. In this work, we present design schemes for a self-stabilizing microprocessor and a new technique for analyzing the effect of soft errors. Previous schemes for analyzing the effect of soft errors were based on simulations. In contrast, our scheme computes a lower bound on microprocessor reliability and enables the microprocessor designer to evaluate the reliability of the design and to identify reliability bottlenecks. When analyzing the resiliency of digital circuits to soft errors, we examine logical masking, i.e., errors in internal nodes of the circuit that are masked later by the computation. We show that computing the reliability of a circuit when logical masking is taken into account is an NP-hard problem.
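
As an illustration of logical masking, the following minimal sketch enumerates all inputs of a toy circuit and counts how often a bit flip on an internal node leaves the output unchanged. The circuit, the function names (`circuit`, `masking_fraction`), and the uniform-input assumption are hypothetical choices made for this example; it is not the analysis technique of the paper, which computes a lower bound rather than enumerating inputs.

```python
from itertools import product


def circuit(a, b, c, flip_internal=False):
    """Toy circuit (an assumption for illustration): out = (a AND b) OR c."""
    n = a & b              # internal node
    if flip_internal:
        n ^= 1             # model a soft error as a bit flip on the internal node
    return n | c


def masking_fraction():
    """Fraction of input vectors for which the internal flip is logically masked."""
    inputs = list(product((0, 1), repeat=3))
    masked = sum(
        circuit(a, b, c) == circuit(a, b, c, flip_internal=True)
        for a, b, c in inputs
    )
    return masked / len(inputs)


if __name__ == "__main__":
    print(masking_fraction())
```

In this toy circuit the flip is masked exactly when `c` is 1, because the OR gate then forces the output to 1 regardless of the internal node. Exhaustive enumeration grows exponentially with the number of inputs, which is in keeping with the hardness result stated in the abstract.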

Index Terms:
Self-stabilization, microprocessor, soft errors, single event upset.
Citation:
Shlomi Dolev, Yinnon A. Haviv, "Self-Stabilizing Microprocessor: Analyzing and Overcoming Soft Errors," IEEE Transactions on Computers, vol. 55, no. 4, pp. 385-399, April 2006, doi:10.1109/TC.2006.61