Issue No.03 - July-September (2009 vol.6)
pp: 217-230
Oguz Ergin, TOBB University of Economy and Technology, Ankara
Osman S. Unsal, Barcelona Supercomputing Center, Barcelona
Xavier Vera, Intel Labs - UPC, Barcelona
Antonio González, Intel Labs - UPC, Barcelona
ABSTRACT
Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce the vulnerability to soft errors in a processor. Our designs are motivated by the fact that many of the values produced and consumed in a processor are narrow, so their upper order bits carry no information. Soft errors caused by a particle strike to these upper order bits can be avoided simply by identifying the narrow values. Alternatively, soft errors in narrow values can be detected or corrected by replicating the vulnerable portion of the value inside the storage space provided for the upper order bits of these operands. As a faster but less fault-tolerant alternative to ECC and parity, we offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing the soft error vulnerability of different data-holding components of a processor. On average, techniques that exploit the narrowness of values provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single bit upsets in the first level data cache across all Spec2K benchmarks. In other structures, such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.
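The replication idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' hardware design: the 64-bit word size, the 32-bit and 21-bit narrowness thresholds, the treatment of values as unsigned, the separately stored narrow flag, and all function names are assumptions chosen only for illustration.

```python
# Illustrative sketch only; widths and names are assumptions, not from the paper.
HALF_BITS = 32                      # assumed: narrow means the value fits in 32 bits
HALF_MASK = (1 << HALF_BITS) - 1
THIRD_BITS = 21                     # assumed: three copies of 21 bits fit in 64 bits
THIRD_MASK = (1 << THIRD_BITS) - 1


def encode_detect(value: int) -> tuple[int, bool]:
    """Store a narrow value twice in one 64-bit word; return (word, narrow_flag)."""
    if value >> HALF_BITS == 0:
        return (value << HALF_BITS) | value, True
    return value, False             # wide values are stored unprotected


def decode_detect(word: int, narrow: bool) -> tuple[int, bool]:
    """Return (value, error_detected) by comparing the two copies."""
    if not narrow:
        return word, False
    low = word & HALF_MASK
    high = word >> HALF_BITS
    return low, low != high


def encode_correct(value: int) -> int:
    """Store three copies of a value that fits in 21 bits."""
    if value >> THIRD_BITS != 0:
        raise ValueError("value is too wide for triplication")
    return value | (value << THIRD_BITS) | (value << (2 * THIRD_BITS))


def decode_correct(word: int) -> int:
    """Recover the value with a bitwise majority vote over the three copies."""
    a = word & THIRD_MASK
    b = (word >> THIRD_BITS) & THIRD_MASK
    c = (word >> (2 * THIRD_BITS)) & THIRD_MASK
    return (a & b) | (a & c) | (b & c)
```

In this sketch, flipping any single bit of a duplicated word makes the two halves disagree, so `decode_detect` reports an error, while a single flipped bit in a triplicated word is outvoted by the two intact copies. The narrow flag itself would also need protection in a real design, which this sketch does not model.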
INDEX TERMS
Memory structures-reliability, testing and fault tolerance, soft errors, narrow values.
CITATION
Oguz Ergin, Osman S. Unsal, Xavier Vera, Antonio González, "Reducing Soft Errors through Operand Width Aware Policies", IEEE Transactions on Dependable and Secure Computing, vol. 6, no. 3, pp. 217-230, July-September 2009, doi:10.1109/TDSC.2008.18