Computer Architecture, International Symposium on (2002)
May 25, 2002 to May 29, 2002
Exponential growth in the number of on-chip transistors, coupled with reductions in voltage levels, makes each generation of microprocessors increasingly vulnerable to transient faults. In a multithreaded environment, we can detect these faults by running two copies of the same program as separate threads, feeding them identical inputs, and comparing their outputs, a technique we call Redundant Multithreading (RMT). This paper studies RMT techniques in the context of both single- and dual-processor simultaneous multithreaded (SMT) single-chip devices. Using a detailed, commercial-grade SMT processor design, we uncover subtle RMT implementation complexities and find that RMT can be a more significant burden for single-processor devices than prior studies indicate. However, a novel application of RMT techniques in a dual-processor device, which we term chip-level redundant threading (CRT), shows higher performance than lockstepping the two cores, especially on multithreaded workloads.
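The detection idea stated in the abstract (two copies, identical inputs, compared outputs) can be illustrated with a small software analogy. This is only a sketch under stated assumptions: RMT as studied in the paper is a hardware mechanism inside an SMT processor, whereas the code below uses ordinary Python threads. The names `run_redundant`, `TransientFaultDetected`, `checksum`, and `make_faulty` are hypothetical and do not come from the paper.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class TransientFaultDetected(Exception):
    """Raised when the two redundant copies produce different outputs."""


def run_redundant(program, inputs):
    """Run `program` twice on identical inputs in separate threads,
    then compare the two outputs before releasing a result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        copy_a = pool.submit(program, *inputs)
        copy_b = pool.submit(program, *inputs)
        out_a, out_b = copy_a.result(), copy_b.result()
    if out_a != out_b:
        raise TransientFaultDetected(f"outputs differ: {out_a!r} != {out_b!r}")
    return out_a


def checksum(values):
    """Example workload (an assumption): a deterministic integer checksum."""
    return sum(v * v for v in values) % 65521


def make_faulty(program, flip_mask=1):
    """Hypothetical fault injector: flips bits in the result of the
    first call only, imitating a transient fault in one copy."""
    lock = threading.Lock()
    calls = itertools.count()

    def wrapped(*args):
        result = program(*args)
        with lock:
            call_index = next(calls)
        return result ^ flip_mask if call_index == 0 else result

    return wrapped


if __name__ == "__main__":
    print(run_redundant(checksum, ([1, 2, 3],)))
    try:
        run_redundant(make_faulty(checksum), ([1, 2, 3],))
    except TransientFaultDetected as err:
        print("fault detected:", err)
```

The comparison happens only at the output boundary, which mirrors the abstract's description of comparing outputs rather than checking every intermediate step. The sketch detects a mismatch but does not attempt recovery.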
M. Kontz, S. K. Reinhardt and S. S. Mukherjee, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," Computer Architecture, International Symposium on (ISCA), Anchorage, Alaska, 2002, pp. 99.