Maximizing Spare Utilization by Virtually Reorganizing Faulty Cache Lines
January 2011 (vol. 60 no. 1)
pp. 35-49
Amin Ansari, University of Michigan, Ann Arbor
Shantanu Gupta, University of Michigan, Ann Arbor
Shuguang Feng, University of Michigan, Ann Arbor
Scott Mahlke, University of Michigan, Ann Arbor
Aggressive technology scaling to 45 nm and below introduces serious reliability challenges to the design of microprocessors. Since a large fraction of chip area is devoted to on-chip caches, it is important to protect these SRAM structures against lifetime and manufacture-time failures. Designers typically overprovision caches with additional resources to overcome hard faults. However, static allocation and binding of redundant spares result in low utilization of the extra resources and ultimately limit the number of defects that can be tolerated. This work re-examines the design of process-variation-tolerant on-chip caches with a focus on providing the flexibility and dynamic reconfigurability necessary to tolerate large numbers of defects with modest hardware overhead. Our approach, ZerehCache, virtually reorganizes the cache data array using a permutation network to provide more degrees of freedom for spare allocation. A graph coloring algorithm is used to configure the network and identify the proper mapping of replacement elements. We perform an extensive design space exploration of both L1 and L2 caches to identify several Pareto-optimal ZerehCaches. Given these optimal design points, we employ ZerehCache to extend the effective lifetime of the on-chip caches and prevent early lifetime failures. Finally, yield analysis studies performed on a population of 1,000 chips at the 45 nm technology node demonstrate that an L1 design with 16 percent overhead and an L2 design with eight percent area overhead achieve yields of 99 percent and 96 percent, respectively.
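
The abstract does not spell out how the graph coloring step is formulated, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes (hypothetically) that each faulty cache line is a graph node, that two lines conflict when they have a faulty bit at the same position, and that lines given the same color can share a single spare line. The names `build_conflict_graph`, `greedy_coloring`, and `repairable`, the fault-map layout, and the greedy heuristic itself are all illustrative choices.

```python
from typing import Dict, Set


def build_conflict_graph(faults: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Connect two faulty lines when they share a faulty bit position.

    `faults` maps a line index to the set of its faulty bit positions
    (an assumed representation, not taken from the paper).
    """
    lines = sorted(faults)
    graph: Dict[int, Set[int]] = {line: set() for line in lines}
    for i, a in enumerate(lines):
        for b in lines[i + 1:]:
            if faults[a] & faults[b]:  # overlapping faulty positions
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_coloring(graph: Dict[int, Set[int]]) -> Dict[int, int]:
    """Give each node the smallest color unused by its already-colored neighbors.

    Nodes are visited in order of decreasing degree. This is a heuristic:
    it may use more colors than the true chromatic number.
    """
    order = sorted(graph, key=lambda n: (-len(graph[n]), n))
    color: Dict[int, int] = {}
    for node in order:
        used = {color[nb] for nb in graph[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


def repairable(faults: Dict[int, Set[int]], num_spares: int) -> bool:
    """True if the greedy coloring fits within the available spare lines.

    A False result only means this heuristic found no fit.
    """
    coloring = greedy_coloring(build_conflict_graph(faults))
    groups_needed = max(coloring.values(), default=-1) + 1
    return groups_needed <= num_spares


# Hypothetical fault map: line index -> faulty bit positions.
faults = {3: {0, 5}, 7: {5}, 12: {9}, 20: {0}}
graph = build_conflict_graph(faults)
coloring = greedy_coloring(graph)
print(coloring)
print(repairable(faults, num_spares=2))
```

In this toy fault map, line 3 conflicts with lines 7 and 20, so it needs its own color, while lines 7 and 20 can share one. Four faulty lines therefore fit into two spare groups under the stated assumptions, which illustrates why a flexible mapping can tolerate more defects than one spare per faulty line.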


Index Terms:
Process variation, wearout, fault-tolerant cache memories, manufacturing yield.
Citation:
Amin Ansari, Shantanu Gupta, Shuguang Feng, Scott Mahlke, "Maximizing Spare Utilization by Virtually Reorganizing Faulty Cache Lines," IEEE Transactions on Computers, vol. 60, no. 1, pp. 35-49, Jan. 2011, doi:10.1109/TC.2010.204