Microarchitectural Online Testing for Failure Detection in Memory Order Buffers
May 2010 (vol. 59 no. 5)
pp. 623-637
Javier Carretero, Intel Barcelona Research Center, Intel Labs, Barcelona
Xavier Vera, Intel Barcelona Research Center, Intel Labs, Barcelona
Pedro Chaparro, Intel Barcelona Research Center, Intel Labs, Barcelona
Jaume Abella, Intel Barcelona Research Center, Intel Labs, Barcelona
Technology scaling is leading to the phase-out of burn-in and to higher postsilicon test complexity, which increase the in-the-field failure rate due to latent defects and actual errors, respectively. As a consequence, current reliability qualification methods will likely become infeasible. Microarchitectural knowledge of application runtime behavior makes it possible for low-cost, continuous online testing techniques to detect hard errors in the field. Whereas data can be protected with redundancy (such as parity or ECC), control logic lacks comparable mechanisms. This paper proposes a microarchitectural approach for validating that the memory order buffer logic works correctly. Our design relies on a small cache-like structure that keeps track of the last store to each cached address. Each load is checked to confirm that it obtained its data from the youngest older producing store. We present three different implementations of this idea, offering different trade-offs among error coverage, performance overhead, and design complexity.
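
The checking idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design or any of its three implementations: the names `LastStoreTable`, `record_store`, and `check_load` are hypothetical, the table is assumed to be direct-mapped, and stores and loads are assumed to reach the checker in program order (for example, at commit), so the last recorded store to an address is the youngest store older than the load being checked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    tag: int       # full address held in this slot (hypothetical layout)
    store_id: int  # program-order id of the last store to that address


class LastStoreTable:
    """Direct-mapped table remembering the last store to each tracked address."""

    def __init__(self, num_sets: int = 64) -> None:
        self.num_sets = num_sets
        self.sets: List[Optional[Entry]] = [None] * num_sets

    def record_store(self, addr: int, store_id: int) -> None:
        # Assumed to be called in program order; a conflicting address
        # simply evicts whatever entry occupied the slot.
        self.sets[addr % self.num_sets] = Entry(addr, store_id)

    def check_load(self, addr: int, producer_id: Optional[int]) -> Optional[bool]:
        """Compare the store a load claims as its producer with the table.

        Returns True on a match, False on a mismatch (a suspected ordering
        failure), and None when the address is not tracked.
        """
        entry = self.sets[addr % self.num_sets]
        if entry is None or entry.tag != addr:
            return None  # untracked: the load goes unchecked, no false alarm
        return entry.store_id == producer_id


table = LastStoreTable(num_sets=4)
table.record_store(0x10, store_id=1)
table.record_store(0x10, store_id=2)          # youngest store to 0x10
ok = table.check_load(0x10, producer_id=2)     # load used the youngest store
stale = table.check_load(0x10, producer_id=1)  # load used an older store
untracked = table.check_load(0x20, producer_id=None)
table.record_store(0x14, store_id=3)           # maps to the same slot as 0x10
evicted = table.check_load(0x10, producer_id=2)
```

In this model, a limited table size costs coverage rather than correctness: once an entry is evicted, later loads to that address are no longer checked. This is one simple way to picture the trade-off between error coverage and cost that the abstract mentions.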

Index Terms:
Online testing, memory order buffer, control logic, microarchitecture, error detection, soft errors, defects.
Javier Carretero, Xavier Vera, Pedro Chaparro, Jaume Abella, "Microarchitectural Online Testing for Failure Detection in Memory Order Buffers," IEEE Transactions on Computers, vol. 59, no. 5, pp. 623-637, May 2010, doi:10.1109/TC.2009.139