Modeling Yield, Cost, and Quality of a Spare-Enhanced Multicore Chip
September 2011 (vol. 60 no. 9)
pp. 1246-1259
Saeed Shamshiri, UCSB, Santa Barbara
Kwang-Ting (Tim) Cheng, University of California, Santa Barbara
Achieving a high manufacturing yield for multicore chips is increasingly difficult because of larger chip sizes, higher device densities, and greater failure rates. By adding a limited number of spare cores and wires to replace defective cores and wires, either before shipment or in the field, the effective yield of the chip and its overall cost can be significantly improved. In this paper, we first model the yield of a multicore chip that incorporates both spare cores and spare wires. Then, we propose a quality metric for a network-on-chip (NoC) and model the system yield subject to a given quality constraint. We also model the manufacturing and service costs of a multicore chip and show that a spare scheme can significantly improve the quality, increase the yield, reduce the overall cost, and substitute for the burn-in process. We illustrate that, in a spare-enhanced system on a chip with high-quality in-field recovery capability, the reliance on high-quality manufacturing testing can be significantly reduced. We also demonstrate that the overall quality of a mesh-based NoC depends more on the reliability of the inner links than on that of the outer links; therefore, a nonuniform spare-wire distribution is sometimes more effective and more cost-efficient than a uniform one.
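
As a rough illustration of why a few spare cores can raise effective yield, the sketch below computes the probability that a chip has enough working cores when each core is good independently with the same probability. This is a simplifying assumption made here for illustration only; it is not the yield model developed in the paper, and it ignores spare wires, defect clustering, and in-field failures. The function name `chip_yield` and its parameters are hypothetical.

```python
from math import comb


def chip_yield(n_required: int, n_spares: int, core_yield: float) -> float:
    """Probability that at least n_required of (n_required + n_spares)
    cores are defect-free.

    Assumption (not from the paper): every core is good independently
    with probability core_yield, so the count of good cores is binomial.
    """
    total = n_required + n_spares
    return sum(
        comb(total, good)
        * core_yield ** good
        * (1.0 - core_yield) ** (total - good)
        for good in range(n_required, total + 1)
    )


if __name__ == "__main__":
    # Compare a 4-core chip without spares against one with a single spare.
    print(chip_yield(4, 0, 0.9))
    print(chip_yield(4, 1, 0.9))
```

Under this independence assumption, a chip that needs four good cores at a per-core yield of 0.9 passes only when all four are good, whereas with one spare it also passes when exactly one of the five cores is defective.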

Index Terms:
Fault tolerance, redundant design, reliability, system on a chip, yield and cost modeling.
Citation:
Saeed Shamshiri, Kwang-Ting (Tim) Cheng, "Modeling Yield, Cost, and Quality of a Spare-Enhanced Multicore Chip," IEEE Transactions on Computers, vol. 60, no. 9, pp. 1246-1259, Sept. 2011, doi:10.1109/TC.2011.32