Reducing Area Overhead for Error-Protecting Large L2/L3 Caches
March 2009 (vol. 58 no. 3)
pp. 300-310
Soontae Kim, University of South Florida, Tampa
Due to increasing concern about various errors, current processors adopt error protection mechanisms for their on-chip components. In particular, protecting caches in current processors incurs as much as 12.5% area overhead due to error-correcting codes. Given the large L2/L3 caches employed in current high-performance processors, this area overhead is very high and consumes a large number of on-chip transistors. To reduce that overhead, this paper proposes an area-efficient error protection architecture for large L2/L3 caches. First, it applies ECC (Error Correcting Code) selectively, to dirty cache lines only; clean cache lines are protected by simple parity check codes. Second, dirty cache lines are periodically cleaned by exploiting the generational behavior of cache lines, so that traffic to off-chip main memory does not increase. Experimental results show that the cleaning technique effectively reduces the number of dirty cache lines per cycle. The ECCs of the remaining dirty cache lines can then be confined to a small ECC array or ECC cache. The proposed error-protection architecture is shown to reduce the error-protection area overhead of a 1MB L2 cache by 59% with less than 1% performance degradation, on average, using SPEC2000 benchmarks running on a typical four-issue superscalar processor.
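The two ideas in the abstract (ECC for dirty lines only, plus periodic cleaning of dirty lines) can be illustrated with a toy model. The sketch below is not the paper's design: the class name `SelectiveEccCache`, the per-line idle counter, the `idle_threshold` parameter, and the policy of cleaning the longest-idle dirty line when the ECC array is full are all assumptions made for illustration. It relies on the general observation that a clean line has a valid copy in the next memory level, so detecting an error with parity is enough to trigger a refetch, whereas a dirty line holds the only up-to-date copy and needs correction capability.

```python
from dataclasses import dataclass


@dataclass
class Line:
    dirty: bool = False
    idle: int = 0  # cleaning intervals elapsed since the last write (assumed metric)


class SelectiveEccCache:
    """Toy model: dirty lines hold an entry in a small ECC array, clean lines use parity."""

    def __init__(self, num_lines: int, ecc_entries: int, idle_threshold: int):
        # ecc_entries is assumed to be at least 1
        self.lines = [Line() for _ in range(num_lines)]
        self.ecc_entries = ecc_entries        # capacity of the hypothetical ECC array
        self.idle_threshold = idle_threshold  # hypothetical cleaning threshold
        self.writebacks = 0

    def dirty_count(self) -> int:
        return sum(line.dirty for line in self.lines)

    def protection(self, idx: int) -> str:
        return "ecc" if self.lines[idx].dirty else "parity"

    def _clean(self, line: Line) -> None:
        # Write the line back to memory; it becomes clean and releases its ECC entry.
        line.dirty = False
        line.idle = 0
        self.writebacks += 1

    def write(self, idx: int) -> None:
        line = self.lines[idx]
        if not line.dirty and self.dirty_count() >= self.ecc_entries:
            # ECC array full: clean the longest-idle dirty line (assumed policy).
            victim = max((l for l in self.lines if l.dirty), key=lambda l: l.idle)
            self._clean(victim)
        line.dirty = True
        line.idle = 0

    def tick(self) -> None:
        # One cleaning interval: dirty lines that have not been written for
        # idle_threshold intervals are written back and become clean.
        for line in self.lines:
            if line.dirty:
                line.idle += 1
                if line.idle >= self.idle_threshold:
                    self._clean(line)


cache = SelectiveEccCache(num_lines=8, ecc_entries=2, idle_threshold=3)
cache.write(0)
cache.write(1)
print(cache.protection(0), cache.protection(2))
```

In this model the number of ECC entries bounds the number of dirty lines, which is why a small ECC array suffices once cleaning keeps the dirty population low. How aggressively to clean is a trade-off between ECC array size and extra write-back traffic; the threshold used here is arbitrary.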


Index Terms:
SRAM, Cache memories, Error-checking
Citation:
Soontae Kim, "Reducing Area Overhead for Error-Protecting Large L2/L3 Caches," IEEE Transactions on Computers, vol. 58, no. 3, pp. 300-310, March 2009, doi:10.1109/TC.2008.174