Issue No. 08 - August (1993 vol. 42)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.238483
<p>Parallel processing architectures are commonly used for signal processing and other computationally intensive applications. These applications are characterized by high throughput and long processing periods. Such characteristics decrease the reliability of high-performance architectures. The erroneous data produced by faulty processors could have damaging consequences, particularly in critical real-time applications. It is therefore desirable that any erroneous data produced by the system be detected and located as quickly as possible. Algorithm-based fault tolerance (ABFT) is a low-cost system-level concurrent error detection and fault location scheme. Methods used in the analysis of multiprocessor systems using system-level diagnosis are applied to the analysis of ABFT systems. A new algorithm for analyzing an ABFT system for its fault diagnosability is developed using these methods. Based on this work, a fault diagnosis algorithm is developed for ABFT systems.</p>
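To make the idea of concurrent error detection and fault location concrete, the following minimal sketch shows the classic checksum-matrix form of ABFT for matrix multiplication. It is an illustration only, not the diagnosability or diagnosis algorithm developed in the paper; the function names (`matmul`, `add_column_checksum`, `add_row_checksum`, `locate_error`) are hypothetical, and integer data is assumed so that checksums can be compared exactly.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append one extra column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def locate_error(c_full):
    """Check a full-checksum matrix (data plus checksum row and column).

    Returns None if every row and column checksum is consistent, or the
    (row, column) index of a single erroneous data element. Any other
    pattern (for example, a corrupted checksum element or several
    errors) raises ValueError, because this simple scheme cannot pin it
    to one data element.
    """
    n = len(c_full) - 1          # number of data rows
    m = len(c_full[0]) - 1       # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("error pattern not locatable with single checksums")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    # Multiplying the encoded operands yields a product that carries
    # its own row and column checksums.
    c = matmul(add_column_checksum(a), add_row_checksum(b))
    print(locate_error(c))       # fault-free result
    c[1][0] += 1                 # simulate a faulty processor's output
    print(locate_error(c))       # the failed row and column checks intersect
```

In this sketch each checksum comparison plays the role of a check on a set of data elements, and the pattern of failed checks points to the erroneous element. Mapping such a pattern back to the responsible processors is the system-level diagnosis question the abstract describes.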
diagnosability; parallel processing architectures; diagnosis; algorithm-based fault-tolerant systems; signal processing; faulty processors; concurrent error detection; fault location scheme; multiprocessor systems; system-level diagnosis; fault tolerant computing; parallel processing.
B. Vinnakota and N. Jha, "Diagnosability and Diagnosis of Algorithm-Based Fault-Tolerant Systems," in IEEE Transactions on Computers, vol. 42, no. 8, pp. 924-937, 1993.