Noise and radiation-induced soft errors (transient faults) in computer systems have increased significantly over the last few years and are expected to increase even more as we move towards smaller transistor sizes and lower supply voltages. Fault detection and recovery can be achieved through redundancy. The emergence of chip multiprocessors (CMPs) makes it possible to execute redundant threads on a chip and provide relatively low-cost reliability. State-of-the-art implementations execute two copies of the same program as two threads (redundant multithreading), either on the same or on separate processor cores in a CMP, and periodically check results. While this solution has favorable performance and reliability properties, every redundant instruction flows through a high-frequency complex out-of-order pipeline, thereby incurring a high power consumption penalty. This paper proposes mechanisms that attempt to provide reliability at a modest power and complexity cost. When executing a redundant thread, the trailing thread benefits from the information produced by the leading thread. We take advantage of this property and comprehensively study different strategies to reduce the power overhead of the trailing core in a CMP. These strategies include dynamic frequency scaling, in-order execution, and parallelization of the trailing thread.
Index Terms:
Reliability, power, transient faults, soft errors, redundant multithreading (RMT), heterogeneous chip multiprocessors, dynamic frequency scaling
Citation:
Niti Madan, Rajeev Balasubramonian, "Power Efficient Approaches to Redundant Multithreading," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 8, pp. 1066-1079, June 2007, doi:10.1109/TPDS.2007.1090