2006 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2006)
Seattle, WA, USA
Sept. 16, 2006 to Sept. 20, 2006
DOI Bookmark: http://doi.ieeecomputersociety.org/
Sumeet Kumar, Electrical and Computer Engineering, Binghamton University, Binghamton, NY
Aneesh Aggarwal, Electrical and Computer Engineering, Binghamton University, Binghamton, NY
With shrinking feature sizes, increasing chip capacity, and increasing clock speeds, microprocessors are becoming increasingly susceptible to transient (soft) errors. Redundant multi-threading (RMT) is an attractive approach for concurrent error detection. However, redundant thread execution has a significant impact on performance and energy consumption in the chip. In this paper, we propose reducing instruction redundancy (the number of instructions that are redundantly executed) as a means to mitigate the performance and energy impact of redundancy. We experiment with a decoupled RMT approach in which the frontend pipeline stages are protected through error codes, while the backend pipeline stages are protected through redundant execution. In this approach, we define two categories of instructions: self-checking and semi self-checking instructions. Self-checking instructions are those whose results are checked for errors when their "main" copies are executed; these instructions are not redundantly executed. Semi self-checking instructions are those for which a major part of the result is checked when the "main" copy is executed, and the remaining part is checked using a small amount of additional hardware. Reducing instruction redundancy with this approach provides the same fault coverage as the base architecture, in which all instructions are redundantly executed. The techniques are evaluated in terms of their performance, power, and vulnerability impact on the RMT processor. Our experiments show that the techniques reduce instruction redundancy by about 58% and recover about 51% of the performance lost due to redundant execution. Our techniques also recover about 40% of the energy consumption increase in the key data-path structures.
Redundant Multi-threading, Concurrent Error Detection, Reducing Instruction Redundancy, Self-checking Instructions
S. Kumar and A. Aggarwal, "Self-checking instructions — reducing instruction redundancy for concurrent error detection," 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT), Seattle, WA, USA, 2006, pp. 64-73.