The Community for Technology Leaders
2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2008)
Toronto, ON, Canada
Oct. 25, 2008 to Oct. 29, 2008
ISBN: 978-1-5090-3021-7
pp: 62-71
Gordon B. Bell , IBM Corporation, Research Triangle Park, NC, USA
Mikko H. Lipasti , Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA
ABSTRACT
Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals have provided fault detection and tolerance through redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, they at best provide equivalent performance to the original execution and at worst incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work achieves a similar goal of detecting transient errors by redundantly executing a program on an additional processor core, however it speeds up (rather than slows down) program execution compared to the unprotected baseline case. It makes the observation that a small number of instructions are detrimental to overall performance, and selectively skipping them enables one core to advance far ahead of the other to obtain prefetching and large instruction window benefits. We highlight the modest incremental hardware required to support skewed redundancy and demonstrate a speedup of 6%/54% for a collection of integer/floating point benchmarks while still providing 100% error detection coverage within our sphere of replication. Additionally, we show that a third core can further improve performance while adding error recovery capabilities.
INDEX TERMS
Instruction sets, Redundancy, Proposals, Cathode ray tubes, Hardware, Transient analysis, Microarchitecture,distributed processing, Error tolerance, memory-level parallelism
CITATION
Gordon B. Bell, Mikko H. Lipasti, "Skewed redundancy", 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), vol. 00, no. , pp. 62-71, 2008, doi:
99 ms
(Ver 3.3 (11022016))