Issue No. 01, January 2011 (vol. 60)
pp. 20-34
Somnath Paul , Case Western Reserve University, Cleveland
Fang Cai , Case Western Reserve University, Cleveland
Xinmiao Zhang , Case Western Reserve University, Cleveland
Swarup Bhunia , Case Western Reserve University, Cleveland
ABSTRACT
With increasing parameter variations in nanometer technologies, on-chip processor caches are becoming highly vulnerable to runtime failures induced by soft errors, voltage or thermal noise, and aging effects. Nondeterministic and unreliable memory operation due to these runtime failures can be addressed by 1) designing the memory for worst-case scenarios and/or 2) detecting and correcting errors at runtime. Worst-case guard-banding can lead to overly pessimistic cell footprint and power. On the other hand, the conventional error-correcting code (ECC) used in processor caches has very limited correction capability, which makes it insufficient to protect memory in scaled technologies (sub-45 nm), where a 64-bit word is vulnerable to multiple-bit failures. The need to tolerate multibit failures is accentuated by supply voltage scaling for low-power operation. We note that, due to inter- and intra-die parameter variations, different memory blocks move to different reliability corners. Uniform ECC protection for all memory blocks fails to account for this distribution of vulnerability across blocks; if it is instead sized for the worst-case vulnerability of any block, it leads to overly pessimistic overhead. In this paper, we propose a reliability-driven ECC allocation scheme that matches the relative vulnerability of each memory block, determined through postfabrication characterization, with appropriate ECC protection. We achieve variable postfabrication ECC allocation by storing the check bits in the “ways” of an associative cache. We use a shortened Bose-Chaudhuri-Hocquenghem (BCH) cyclic code with zero padding, which provides high random-error correction capability with a modest number of check bits. Moreover, we propose efficient circuit- and architecture-level optimizations of the ECC encoding/decoding logic to minimize the impact on area, performance, and energy. Simulation results for SPEC2000 benchmarks show that such a variable ECC scheme tolerates high failure rates with negligible performance (four percent) and area (0.2 percent) penalties.
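The idea of matching ECC strength to per-block vulnerability can be illustrated with a small model. The sketch below is not the authors' construction; it assumes independent cell failures with a per-block probability `p_cell` (standing in for the postfabrication characterization), narrow-sense binary BCH codes shortened by zero padding to protect a 64-bit word, and a hypothetical target probability of an uncorrectable word. For each block it picks the smallest correction capability t that meets the target and reports the check-bit cost r, computed as the degree of the BCH generator polynomial, i.e., the size of the union of the cyclotomic cosets of 1, 3, ..., 2t-1 modulo 2^m - 1. All function names and numbers are illustrative assumptions, and the storage of check bits in cache ways is not modeled.

```python
"""Sketch: per-block BCH strength selection for a variable-ECC cache.

Assumptions (hypothetical, not from the paper): independent cell failures
with a per-block probability p_cell, narrow-sense binary BCH codes, and a
target probability of an uncorrectable word.
"""


def bch_check_bits(m: int, t: int) -> int:
    """Check bits r of a t-error-correcting binary BCH code of length 2**m - 1.

    r is the degree of the generator polynomial: the number of distinct
    exponents in the cyclotomic cosets of 1, 3, ..., 2t-1 modulo 2**m - 1.
    """
    n = (1 << m) - 1
    if not 1 <= 2 * t - 1 < n:
        raise ValueError("need t >= 1 and 2t - 1 < 2**m - 1")
    exponents: set[int] = set()
    for i in range(1, 2 * t, 2):
        j = i
        while j not in exponents:  # walk the coset i, 2i, 4i, ... mod n
            exponents.add(j)
            j = (2 * j) % n
    return len(exponents)


def shortened_bch(k: int, t: int) -> tuple[int, int]:
    """Smallest field degree m whose BCH code, shortened by zero padding,
    holds k data bits plus its check bits. Returns (m, r)."""
    m = 3
    while True:
        n = (1 << m) - 1
        if 2 * t - 1 < n:
            r = bch_check_bits(m, t)
            if k + r <= n:  # the other n - k - r data bits are implicit zeros
                return m, r
        m += 1


def uncorrectable_prob(n_bits: int, p: float, t: int) -> float:
    """P(more than t of n_bits independent cells fail): the binomial tail
    that a t-error-correcting code cannot repair."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    term = (1.0 - p) ** n_bits  # P(exactly 0 failures)
    tail = 0.0
    for i in range(1, n_bits + 1):
        term *= (n_bits - i + 1) / i * p / (1.0 - p)  # P(exactly i failures)
        if i > t:
            tail += term  # sum the tail directly to avoid 1 - cdf cancellation
    return tail


def allocate_ecc(k: int, p_cell: float, target: float, t_max: int = 8) -> tuple[int, int]:
    """Weakest code (smallest t >= 1) that meets the target for one block.
    Returns (t, check_bits)."""
    for t in range(1, t_max + 1):
        _, r = shortened_bch(k, t)
        if uncorrectable_prob(k + r, p_cell, t) <= target:
            return t, r
    raise ValueError(f"no code with t <= {t_max} meets the target")


if __name__ == "__main__":
    # Hypothetical per-block cell failure probabilities from post-fabrication test.
    blocks = {"block0": 1e-6, "block1": 1e-4, "block2": 1e-3}
    for name, p in blocks.items():
        t, r = allocate_ecc(k=64, p_cell=p, target=1e-12)
        print(f"{name}: p_cell={p:g} -> t={t}, {r} check bits per 64-bit word")
```

Because zero padding only prepends implicit zero data bits, a shortened code keeps the r check bits of its length-(2^m - 1) parent code while storing just k + r bits; in this model, more vulnerable blocks simply receive a larger t and therefore more check bits, instead of every block paying for the worst case.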
INDEX TERMS
Cache, runtime failures, soft error, process variation, variable ECC allocation.
CITATION
Somnath Paul, Fang Cai, Xinmiao Zhang, Swarup Bhunia, "Reliability-Driven ECC Allocation for Multiple Bit Error Resilience in Processor Cache", IEEE Transactions on Computers, vol. 60, no. 1, pp. 20-34, January 2011, doi:10.1109/TC.2010.203