Probabilistic Evaluation of Online Checks in Fault-Tolerant Multiprocessor Systems
May 1992 (vol. 41 no. 5)
pp. 532-541

The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components, so the evaluation results often do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluating the fault detection and location capabilities of online checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.

Index Terms:
probabilistic evaluation; fault detection; fault location; fault-tolerant multiprocessor systems; concurrent error detection; online checks; matrix-based model; probabilistic matrices; fault tolerant computing; multiprocessing systems; probability.
V.S.S. Nair, Y.V. Hoskote, J.A. Abraham, "Probabilistic Evaluation of Online Checks in Fault-Tolerant Multiprocessor Systems," IEEE Transactions on Computers, vol. 41, no. 5, pp. 532-541, May 1992, doi:10.1109/12.142679
