Issue No.09 - September (2009 vol.58)
pp: 1171-1184
Shuai Wang, New Jersey Institute of Technology, Newark
Jie Hu, New Jersey Institute of Technology, Newark
Sotirios G. Ziavras, New Jersey Institute of Technology, Newark
ABSTRACT
Soft errors induced by energetic particle strikes in on-chip cache memories have become an increasing challenge in designing new-generation reliable microprocessors. Previous efforts have exploited information redundancy via parity/ECC coding or cache-line duplication to preserve information integrity in on-chip cache memories. Because performance, area, and energy constraints differ across target systems, many existing unoptimized protection schemes may prove inadequate and ineffective. In this paper, we propose a new framework for comprehensively studying and characterizing the reliability behavior of cache memories, in order to provide insight into cache vulnerability to soft errors as well as guidance to architects for designing highly efficient, reliable on-chip caches. Our work is based on new lifetime models for the data and tag arrays of both the data and instruction caches. These models facilitate characterizing the vulnerability of stored items at various lifetime phases. We then exemplify this design methodology by proposing reliability schemes that target specific vulnerable phases. Benchmarking is carried out to demonstrate the effectiveness of our approach.
INDEX TERMS
Cache, reliability, soft error, temporal vulnerability factor.
CITATION
Shuai Wang, Jie Hu, Sotirios G. Ziavras, "On the Characterization and Optimization of On-Chip Cache Reliability against Soft Errors", IEEE Transactions on Computers, vol. 58, no. 9, pp. 1171-1184, September 2009, doi:10.1109/TC.2009.33