Issue No.03 - May-June (2008 vol.25)
Published by the IEEE Computer Society
Rob Aitken, ARM
Erik Jan Marinissen, NXP Semiconductors
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MDT.2008.67
The cost of silicon debug can be considerable and unpredictable. Problems can range from catastrophic to subtle. Once errors have been observed, debug takes over. When problems arise, the first challenge is to categorize them. Causes can range from incorrect specifications to silicon defects to measurement errors. This special issue focuses on all aspects of a successful debug process: how to prepare, what to do during debug, and how to use the results to improve things in the future. The seven articles in this issue cover a broad variety of topics in silicon debug and diagnosis, as well as the newly emerging middle ground between the two: at-speed timing failures.
Designing, implementing, fabricating, and testing an IC is a complex, expensive undertaking. Large design teams, months or even years of effort, and millions of dollars are involved. Ideally, everything goes perfectly, and the end result is working silicon. However, industry surveys show that more than 70% of all IC designs require one or more respins. These occur despite large amounts of resources devoted to validation and verification at every step: simulation, equivalence checking, emulation, and timing analysis for the design; rule checking and design for manufacturability for the circuits; and dedicated teams devoted to yield improvement in the fab. All of these activities are important, and all lead to improved quality of design and products, yet IC complexity still brings problems, especially in early silicon.
The cost of silicon debug can be considerable and unpredictable. Silicon failures can result in "fire drills," in which everyone with any potentially relevant information is drafted to help find out what is happening. Expensive equipment must be purchased or rented. People from multiple geographical locations and possibly multiple companies are brought together to solve the problem. At stake is the volume ramp of a product—the costs of being late to market are high, and companies are anxious to avoid them. The worst aspect of the debug fire drill is the uncertainty about when it will end. Will the bug be found and fixed quickly, or will the search drag on for weeks or months?
Addressing the debug challenge requires a multifaceted approach. Including debug and diagnosis features early in the design process can pay off later. Having the right equipment and data collection methodology in place and ready for first silicon is also helpful. Once silicon is available, following a disciplined strategy is key. Care needs to be taken to ensure that the universe of possible problems is methodically reduced to the true root cause.
Problems can range from catastrophic (zero yield) to subtle (erroneous results for very specific actions under very specific operating conditions). Once errors have been observed, debug takes over. When problems arise, the first challenge is to categorize them. Causes can range from incorrect specifications to silicon defects to measurement errors.
This special issue focuses on all aspects of a successful debug process: how to prepare, what to do during debug, and how to use the results to improve things in the future. The seven articles in this issue cover a broad variety of topics in silicon debug (finding and fixing design errors) and diagnosis (finding manufacturing defects), as well as the newly emerging middle ground between the two: at-speed timing failures, which are often the result of a combination of weak points in a design and silicon abnormalities.
The first article, "Functional Debug Techniques for Embedded Systems," by Bart Vermeulen (NXP Semiconductors), focuses on using a combination of run-stop debugging and real-time tracing to provide observability in systems where there is a problem. Specialized hardware to improve accessibility to a design is the focus of "In-System Silicon Validation and Debug," by Miron Abramovici (DAFCA). Adding such design-for-debug hardware up front can simplify the challenging task of determining whether a system is working as intended when silicon comes back from the fab.
Next, in "Case Study on Speed Failure Causes in a Microprocessor," Kip Killpack et al. (Intel and the University of California, Santa Barbara) discuss the causes of at-speed failures in microprocessors. The article addresses the relative importance of design issues such as crosstalk and IR drop compared with defect issues in observed speed-path failures, and it provides a bridge from debug to diagnosis. The next article, "Linking Statistical Learning to Diagnosis," by Pouria Bastani, Li-C. Wang, and Magdy Abadir (University of California, Santa Barbara and Freescale Semiconductor), continues the transition and looks at methods for correctly categorizing the results of diagnosis as random defects or systematic problems.
A question that quickly arises in practical applications of diagnosis is, "What happens if the DFT infrastructure is broken?" This topic is a fertile area for research, and this special issue includes an article that addresses it: "Survey of Scan Chain Diagnosis," by Yu Huang et al. (Mentor Graphics and National Taiwan University). Diagnosis techniques are not all electrical or based on test infrastructure; physical observation of failures is also important. Just as printing subwavelength features is a challenging part of manufacturing, so is observing defects of similar dimensions. Christian Boit et al. (Technical University of Berlin and DCG Systems) present "Physical Techniques for Chip-Backside IC Debug in Nanotechnologies," which is a survey of these challenges along with current and proposed solutions.
In today's disaggregated IC industry, every IC design touches many companies, from EDA and IP to manufacturing, packaging, and test. With so many participants, standardization is becoming increasingly important. The final article in the special issue, "Overview of Debug Standardization Activities," is an update on debug-related standardization activities by Bart Vermeulen and other key members of the Nexus 5001, MIPI Test and Debug, IEEE P1149.7, IEEE P1687, and OCP-IP Debug working groups.
We hope you enjoy this special issue. In addition to informing you about the latest progress in this area, this special issue hopefully will encourage you to address these challenges in your own organization and to become a part of the growing community of debug professionals. There are many ways to participate, including at debug-specific events such as the annual Silicon Debug and Diagnosis Workshop. Together, we can face these challenges so that debug can continue to become as structured and disciplined in hardware design as it is in software design.
Rob Aitken is an R&D Fellow at ARM. His research interests include DFT, fault diagnosis, low-power design, and design for manufacturability. He has a BSc and an MSc in computer science from the University of Victoria, Canada, and a PhD in electrical engineering from McGill University, Canada. He is a senior member of the IEEE.
Erik Jan Marinissen is a senior principal scientist at NXP Semiconductors. His research interests include all aspects of VLSI test and DFT. He has an MSc and a PDEng in computing science from Eindhoven University of Technology. He is a senior member of the IEEE.