Issue No.01 - January-June (2006 vol.5)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/L-CA.2006.1
This paper makes a case for using multi-core processors to simultaneously achieve transient-fault tolerance and performance enhancement. Our approach extends a recent latency-tolerance proposal, dual-core execution (DCE). In DCE, a program is executed twice on two processors, named the front and back processors. The front processor pre-processes instructions in a very fast yet highly accurate way, and the back processor re-executes the instruction stream retired from the front processor. The front processor runs faster because it has no correctness constraints, while its results, including timely prefetching and prompt branch misprediction resolution, help the back processor make faster progress. In this paper, we propose to trust the speculative results of the front processor and use them to check the non-speculative results of the back processor. A discrepancy, whether due to a transient fault or a misspeculation, is then handled with the existing misspeculation recovery mechanism. In this way, both transient-fault tolerance and performance improvement can be delivered simultaneously with little hardware overhead.
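The checking scheme described in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the three-operation instruction set, the names `execute`, `run_checked`, and `PROGRAM`, the `window` parameter, the single-bit fault injected into the front processor, and the recovery policy (flush the queue and copy the back processor's registers to the front) are all hypothetical choices made for illustration.

```python
from collections import deque

def execute(instr, regs):
    """Compute the result of one toy instruction without updating regs."""
    op, dst, a, b = instr
    if op == "li":       # load immediate: dst <- a (b is unused)
        return a
    if op == "add":      # dst <- regs[a] + regs[b]
        return regs[a] + regs[b]
    if op == "mul":      # dst <- regs[a] * regs[b]
        return regs[a] * regs[b]
    raise ValueError(f"unknown op {op!r}")

def run_checked(program, num_regs=4, fault_at=None, window=2):
    """Toy model: the front runs ahead, the back re-executes and compares.

    fault_at (assumption): index of the instruction whose front-processor
    result gets one flipped bit, exactly once, to mimic a transient fault.
    Returns (back processor registers, number of recoveries).
    """
    front = [0] * num_regs   # speculative state of the front processor
    back = [0] * num_regs    # checked state of the back processor
    queue = deque()          # front results waiting to be compared
    front_pc = back_pc = 0
    recoveries = 0
    fault_pending = fault_at

    while back_pc < len(program):
        # Front processor runs ahead by at most `window` instructions.
        while front_pc < len(program) and len(queue) < window:
            instr = program[front_pc]
            result = execute(instr, front)
            if front_pc == fault_pending:
                result ^= 1              # injected single-bit upset
                fault_pending = None     # transient: it does not repeat
            front[instr[1]] = result
            queue.append(result)
            front_pc += 1

        # Back processor re-executes the oldest instruction and compares.
        instr = program[back_pc]
        front_result = queue.popleft()
        back_result = execute(instr, back)
        if back_result == front_result:
            back[instr[1]] = back_result  # results agree: commit
            back_pc += 1
        else:
            # Discrepancy: reuse a misspeculation-style recovery (assumed
            # policy): flush, resynchronize the front, and re-execute.
            recoveries += 1
            queue.clear()
            front = list(back)
            front_pc = back_pc

    return back, recoveries

# Hypothetical example program: r0 = 6; r1 = 7; r2 = r0 * r1; r3 = r2 + r0
PROGRAM = [("li", 0, 6, 0), ("li", 1, 7, 0), ("mul", 2, 0, 1), ("add", 3, 2, 0)]
```

Calling `run_checked(PROGRAM)` completes with no recoveries, while `run_checked(PROGRAM, fault_at=2)` detects the mismatch on the multiply, recovers once, and still ends with the same register values. The model treats any mismatch identically, whether it comes from a fault or from stale speculative state, which mirrors the abstract's point that one recovery path can serve both cases.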
Huiyang Zhou, "A Case for Fault Tolerance and Performance Enhancement Using Chip Multi-Processors", IEEE Computer Architecture Letters, vol.5, no. 1, pp. 22-25, January-June 2006, doi:10.1109/L-CA.2006.1