Issue No. 04 - October-December (2006 vol. 3)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TDSC.2006.55
Data caches are a fundamental component of most modern microprocessors. They provide efficient read/write access to data memory. Errors occurring in the data cache can corrupt data values or state, and can easily propagate throughout the memory hierarchy. One of the main threats to data cache reliability is soft (transient, nonreproducible) errors. These errors can occur more often than hard (permanent) errors, and most often arise from single event upsets (SEUs) caused by strikes from energetic particles such as neutrons and alpha particles. Many protection techniques exist for data caches; the most common are ECC (error correcting codes) and parity. Both techniques detect all single-bit errors and, in the case of ECC, correct them. To make proper design decisions about which protection technique to use, accurate design-time modeling of cache reliability is crucial. In addition, as caches increase in storage capacity, another important goal is to reduce the failure rate of a cache, to limit disruption to normal system operation. In this paper, we present our modeling approach for assessing the impact of soft errors using architectural simulators. We also describe a new technique for reducing the vulnerability of data caches: refetching. By selectively refetching cache lines from the ECC-protected L2 cache, we can significantly reduce the vulnerability of the L1 data cache. We discuss and present results for two different algorithms that perform selective refetch. Experimental results show that we can obtain an 85 percent decrease in vulnerability when running the SPEC2K benchmark suite while experiencing only a slight decrease in performance. Our results demonstrate that selective refetch can cost-effectively decrease the error rate of an L1 data cache.
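The difference between parity (detection only) and ECC (detection plus correction) mentioned in the abstract can be illustrated with a small sketch. This is not code from the paper: the function names are hypothetical, and the textbook Hamming(7,4) code stands in for whatever ECC a real cache uses, which is typically a wider code over a whole word or line.

```python
def parity_bit(bits):
    """Even parity: the extra bit that makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """True if the data still matches its stored parity bit."""
    return parity_bit(bits) == stored_parity


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based error position, or 0 if clean)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the flipped bit
    if position:
        c[position - 1] ^= 1
    return c, position


def hamming74_data(codeword):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


def flip(bits, index):
    """Model a single event upset: return a copy with one bit inverted."""
    out = list(bits)
    out[index] ^= 1
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity notices the upset but cannot say which bit changed.
    stored = parity_bit(data)
    print("parity detects upset:", not parity_ok(flip(data, 2), stored))

    # The Hamming code locates the flipped bit and restores the data.
    upset = flip(hamming74_encode(data), 4)
    fixed, position = hamming74_correct(upset)
    print("ECC corrected position", position, "->", hamming74_data(fixed))
```

With parity alone, a detected error in a clean L1 line has to be repaired by fetching a good copy from the next level of the hierarchy, which is one reason an ECC-protected L2 matters for the refetch technique the abstract describes.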
cache storage, error correction codes, fault tolerance, memory architecture, storage management, data cache susceptibility, soft error, microprocessor, data memory, data cache reliability, error correcting code, parity, architectural simulator, cache line refetching, protection, microprocessors, read-write memory, single event upset, single event transient, neutrons, alpha particles, error correction, reliability, soft errors, error modeling, cache memories, refresh, refetch.
"Reducing Data Cache Susceptibility to Soft Errors", IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, pp. 353-364, October-December 2006, doi:10.1109/TDSC.2006.55