Analysis and Randomized Design of Algorithm-Based Fault Tolerant Multiprocessor Systems Under an Extended Model
Issue No. 07 - July (1997 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.598349
Abstract: Reliability of compute-intensive applications can be improved by introducing fault tolerance into the system. Algorithm-based fault tolerance (ABFT) is a low-cost scheme which provides the required fault tolerance to the system through system-level encoding. In this paper, we propose randomized construction techniques, under an extended model, for the design of ABFT systems with the required fault tolerance capability. The model considers failures in the processors performing the checking operations.
Index Terms: Algorithm-based fault tolerance, concurrent error detection, concurrent fault location, randomized algorithms, fault diagnosis, transient faults.
N. K. Jha and S. Yajnik, "Analysis and Randomized Design of Algorithm-Based Fault Tolerant Multiprocessor Systems Under an Extended Model," in IEEE Transactions on Parallel & Distributed Systems, vol. 8, no. 7, pp. 757-768, 1997.