<p>Algorithm-based fault tolerance (ABFT) is a popular approach for achieving fault and error detection in multiprocessor systems. The design problem for ABFT is to construct a check set of minimum cardinality that detects a specified number of errors or faults. Previous work on this problem has assumed an a priori bound on the size of a check. We motivate and investigate the problem without this bounded check size assumption. We establish upper and lower bounds on the number of checks needed to detect a given number of errors. The upper bounds are obtained through new schemes that are easy to implement, and the lower bounds are established using new types of arguments. These bounds differ sharply from those previously established under the bounded check size model. We also show that, unlike error detection, the design problem for fault detection is NP-hard even for detecting a single fault.</p>
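The abstract does not spell out the error model, so the following is only a minimal sketch under an assumed model, not the authors' definition: data elements are numbered 0 to n-1, a check is a set of element indices, and a check flags an error when it covers between 1 and h erroneous elements (more than h errors in one check are assumed to possibly cancel out). The helper names `check_fails` and `detects` are hypothetical. The code brute-forces the condition "every nonempty set of at most t erroneous elements is flagged by some check", which is handy for sanity-checking small check sets but is exponential in t.

```python
from itertools import combinations


def check_fails(check: set[int], errors: set[int], h: int) -> bool:
    # Assumed model: the check flags an error only when it covers
    # at least 1 and at most h erroneous data elements.
    k = len(check & errors)
    return 1 <= k <= h


def detects(checks: list[set[int]], n: int, t: int, h: int = 1) -> bool:
    # Enumerate every nonempty error pattern of size at most t among
    # n data elements and require that some check flags it.
    for size in range(1, t + 1):
        for errs in combinations(range(n), size):
            error_set = set(errs)
            if not any(check_fails(c, error_set, h) for c in checks):
                return False  # this error pattern escapes every check
    return True
```

For example, with n = 4 and h = 1, a single check over all four elements detects any single error, but two simultaneous errors inside it go unflagged; the three checks `[{0, 1}, {0, 2}, {0, 3}]` do detect every pattern of up to two errors under this assumed model.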
multiprocessing systems; computational complexity; error detection; fault tolerant computing; check sets; algorithm-based fault tolerance; multiprocessor systems; ABFT; check set; minimum cardinality; bounded check size assumption; bounded check size model; fault detection; design problem; NP-hard.
D. Gu, D.J. Rosenkrantz, S.S. Ravi, "Construction of Check Sets for Algorithm-Based Fault Tolerance", IEEE Transactions on Computers, vol. 43, pp. 641-650, June 1994, doi:10.1109/12.286298