IEEE Transactions on Dependable and Secure Computing, vol. 6, no. 3 (July-September 2009)
Oguz Ergin, TOBB University of Economy and Technology, Ankara
Osman S. Unsal, Barcelona Supercomputing Center, Barcelona
Xavier Vera, Intel Labs - UPC, Barcelona
Antonio González, Intel Labs - UPC, Barcelona
DOI: http://doi.ieeecomputersociety.org/10.1109/TDSC.2008.18
Soft errors are an important challenge in contemporary microprocessors. Particle strikes on processor components are expected to cause an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce a processor's vulnerability to soft errors. Our designs are motivated by the observation that many of the values produced and consumed in a processor are narrow: their upper-order bits carry no useful information. Soft errors caused by particle strikes on these upper-order bits can be avoided simply by identifying such narrow values. Alternatively, soft errors in narrow values can be detected or corrected by replicating the vulnerable (lower-order) portion of each value in the storage space reserved for its upper-order bits. As a faster but less fault-tolerant alternative to ECC and parity, we offer a variety of schemes that exploit narrow values, and we analyze how effectively they reduce the soft-error vulnerability of different data-holding components of a processor. For single-bit upsets in the first-level data cache, averaged across all SPEC2K benchmarks, techniques that exploit value narrowness provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage. In other structures, such as the immediate field of the issue queue, the average error detection rate reaches 64 percent.
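The abstract names three ways to exploit narrow values: avoidance (ignore the upper-order bits), detection (copy the low-order portion into the upper-order bits and compare), and correction through replication. The Python sketch below models all three on a single stored word. It is illustrative only and is not the paper's hardware design: the 64-bit word size, the unsigned "upper bits all zero" definition of narrowness, the assumption that the reader knows how each word was encoded (for example via a hypothetical per-word tag), and every function name are assumptions. Keeping three copies and taking a bitwise majority vote is one plausible way to obtain correction from replication; the abstract does not specify the exact scheme.

```python
# Illustrative model of narrow-value soft-error schemes on one stored word.
# Assumptions (not from the paper): 64-bit words; "narrow" means the value is
# unsigned with all bits above `width` equal to zero; the reader knows how a
# word was encoded (e.g., via a hypothetical per-word tag that is not modeled).

WORD_BITS = 64
DUP_WIDTH = WORD_BITS // 2   # 32 bits: two copies fill the word
TRIP_WIDTH = WORD_BITS // 3  # 21 bits: three copies use 63 of the 64 bits


def mask(width: int) -> int:
    """All-ones mask covering the low `width` bits."""
    return (1 << width) - 1


def is_narrow(value: int, width: int) -> bool:
    """True if every bit above the low `width` bits is zero."""
    return value >> width == 0


def read_avoid(word: int, narrow_tag: bool) -> int:
    """Error avoidance: for a word tagged narrow, the upper half is known to be
    zero, so it is ignored on read and upsets there cannot corrupt the value."""
    return word & mask(DUP_WIDTH) if narrow_tag else word


def encode_dup(value: int) -> int:
    """Error detection: store a copy of the low half in the upper half."""
    if not is_narrow(value, DUP_WIDTH):
        raise ValueError("value too wide to duplicate")
    return (value << DUP_WIDTH) | value


def decode_dup(word: int) -> tuple[int, bool]:
    """Return (low copy, error_detected); any mismatch between halves flags an upset."""
    low = word & mask(DUP_WIDTH)
    high = (word >> DUP_WIDTH) & mask(DUP_WIDTH)
    return low, low != high


def encode_trip(value: int) -> int:
    """Error correction (assumed scheme): keep three copies of the value."""
    if not is_narrow(value, TRIP_WIDTH):
        raise ValueError("value too wide to triplicate")
    return (value << (2 * TRIP_WIDTH)) | (value << TRIP_WIDTH) | value


def decode_trip(word: int) -> int:
    """Bitwise majority vote across the three copies; a single-bit upset in
    any copy is outvoted, and the unused bit 63 is masked off."""
    m = mask(TRIP_WIDTH)
    a = word & m
    b = (word >> TRIP_WIDTH) & m
    c = (word >> (2 * TRIP_WIDTH)) & m
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    flipped = encode_dup(12345) ^ (1 << 40)  # simulate a strike on the upper copy
    print(decode_dup(flipped))               # (12345, True)
    fixed = decode_trip(encode_trip(1000) ^ (1 << 5))  # strike on the lowest copy
    print(fixed)                             # 1000
```

In this model, detection costs only a comparison and correction only a bitwise vote, which fits the abstract's framing of these schemes as faster but less fault tolerant than ECC: flips at the same bit position in two copies, or an upset in the tag itself, are not handled here.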
Index terms: memory structures (reliability, testing, and fault tolerance), soft errors, narrow values.
Oguz Ergin, Osman S. Unsal, Xavier Vera, Antonio González, "Reducing Soft Errors through Operand Width Aware Policies", IEEE Transactions on Dependable and Secure Computing, vol. 6, no. 3, pp. 217-230, July-September 2009, doi:10.1109/TDSC.2008.18