Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability
December 2005 (vol. 54 no. 12)
pp. 1547-1555
Wei Zhang, IEEE
Soft error conscious cache design has become increasingly crucial for reliable computing. The widely used ECC or parity-based integrity checking techniques have only limited capability in error detection and correction, while incurring nontrivial penalty in area or performance. The N modular redundancy (NMR) scheme is too costly for processors with stringent cost constraints. This paper proposes a cost-effective solution to enhance data reliability significantly with minimum impact on performance. The idea is to add a small fully associative cache to store the replica of every write to the L1 data cache. Due to data locality and its full associativity, the replication cache can be kept small while providing replicas for a significant fraction of read hits in L1, which can be used to enhance data integrity against soft errors. Our experiments show that a replication cache with eight blocks can provide replicas for 97.3 percent of read hits in L1 on average. Moreover, compared with the recently proposed in-cache replication schemes, the replication cache is more energy efficient, while improving the data integrity against soft errors significantly.
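
The mechanism described in the abstract can be illustrated with a minimal behavioral sketch. It is not the paper's implementation. The class name `ReplicationCache`, the helper `checked_read`, the LRU replacement policy, and the use of a plain dictionary as the L1 data cache are all assumptions made for illustration. The abstract does not state how the design arbitrates between an L1 block and its replica when they differ, so the sketch only flags the mismatch.

```python
from collections import OrderedDict


class ReplicationCache:
    """Hypothetical model of a small fully associative replica buffer."""

    def __init__(self, num_blocks=8):
        self.num_blocks = num_blocks
        # Maps block address -> replica data, ordered oldest to newest.
        self.blocks = OrderedDict()

    def on_write(self, addr, data):
        """Store a replica of every write that goes to the L1 data cache."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
        elif len(self.blocks) >= self.num_blocks:
            # Assumed policy: evict the least recently used replica.
            self.blocks.popitem(last=False)
        self.blocks[addr] = data

    def lookup(self, addr):
        """Return the replica for addr, or None if no replica is held."""
        if addr not in self.blocks:
            return None
        self.blocks.move_to_end(addr)
        return self.blocks[addr]


def checked_read(l1, rcache, addr):
    """Read addr from L1 and compare it against its replica, if any.

    Returns (value_from_l1, status), where status is one of
    "verified", "mismatch", or "unprotected".
    """
    value = l1[addr]
    replica = rcache.lookup(addr)
    if replica is None:
        return value, "unprotected"
    if replica != value:
        return value, "mismatch"
    return value, "verified"
```

With a two-block replication cache and three writes to distinct addresses, the oldest replica is evicted, so a later read of that address is unprotected, while a bit flip in a block that still has a replica is reported as a mismatch.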


Index Terms:
Soft error, write-back cache, in-cache replication.
Citation:
Wei Zhang, "Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability," IEEE Transactions on Computers, vol. 54, no. 12, pp. 1547-1555, Dec. 2005, doi:10.1109/TC.2005.202