<p><b>Abstract</b>: We consider the problem of fault diagnosis in multiprocessor systems. Processors perform tests on one another; fault-free testers correctly identify the fault status of tested processors, while faulty testers can give arbitrary test results. Processors fail with arbitrary probabilities and all failures are independent. The goal is to correctly identify the fault status of all processors from the set of test results. A diagnosis algorithm is <it>optimal</it> if it has the highest probability of correctness (<it>reliability</it>) among all (deterministic) diagnosis algorithms. We give a fast diagnosis algorithm and prove its optimality for arbitrary values of failure probabilities. This is the first optimal diagnosis given for systems without any assumptions on the behavior of faulty processors or on the values of failure probabilities.</p><p>We also investigate <it>locally optimal</it> diagnosis algorithms: for any set of test results, they return the most probable configuration of faulty and fault-free processors that could yield it. We present a fast diagnosis algorithm that is always locally optimal. If all processors have failure probabilities smaller than <tmath>$\frac{1}{2}$</tmath>, a locally optimal diagnosis is proved to be optimal. However, if some processors have failure probabilities exceeding <tmath>$\frac{1}{2}$</tmath>, a locally optimal diagnosis need not have the highest reliability. We even give examples in which its reliability becomes arbitrarily small as the number of processors increases, while the optimal reliability remains constant.</p>
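To make the locally optimal criterion from the abstract concrete, here is a minimal brute-force sketch. It is not the fast algorithm of the paper: it enumerates all 2^n fault configurations, keeps those that could have produced the observed test results, and returns the most probable one. The function names, the encoding of test results as a dictionary mapping (tester, tested) pairs to a Boolean "tested processor reported faulty", and the example values are all assumptions made for illustration.

```python
from itertools import product


def is_consistent(faulty, tests):
    """A configuration could yield the test results if every fault-free
    tester reported the true status of the processor it tested.
    Faulty testers may report anything, so their results are ignored."""
    for (tester, tested), says_faulty in tests.items():
        if tester not in faulty and says_faulty != (tested in faulty):
            return False
    return True


def config_probability(faulty, p):
    """Probability of a configuration under independent failures,
    where p[i] is the failure probability of processor i."""
    prob = 1.0
    for i, p_i in enumerate(p):
        prob *= p_i if i in faulty else 1.0 - p_i
    return prob


def locally_optimal_diagnosis(p, tests):
    """Return (faulty_set, probability) for the most probable
    configuration consistent with the tests (exhaustive search)."""
    best, best_prob = None, -1.0
    for bits in product([False, True], repeat=len(p)):
        faulty = frozenset(i for i, is_faulty in enumerate(bits) if is_faulty)
        if is_consistent(faulty, tests):
            prob = config_probability(faulty, p)
            if prob > best_prob:
                best, best_prob = faulty, prob
    return best, best_prob


# Hypothetical example: three processors testing each other in a ring.
p = [0.1, 0.2, 0.3]
tests = {(0, 1): False, (1, 2): True, (2, 0): False}
diagnosis, probability = locally_optimal_diagnosis(p, tests)
```

The configuration in which every processor is faulty is consistent with any set of test results, so the search always returns some diagnosis. The exhaustive loop is exponential in the number of processors and is only meant to illustrate the definition on small instances.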
Fault diagnosis, fault tolerance, random fault, test.
Andrzej Pelc, "Optimal Diagnosis of Heterogeneous Systems with Random Faults," IEEE Transactions on Computers, vol. 47, pp. 298-304, March 1998, doi:10.1109/12.660165