Issue No.11 - November (1992 vol.41)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.177313
<p>The authors present and analyze a probabilistic model for the self-diagnosis capabilities of a multiprocessor system. In this model, an individual processor fails with probability p, and a nonfaulty processor testing a faulty processor detects the fault with probability q. This models the situation where processors can be intermittently faulty, or where tests cannot detect all possible faults within a processor. An efficient algorithm is presented that achieves correct diagnosis with high probability in systems with O(n log n) connections, where n is the number of processors. It is the first algorithm able to diagnose a large number of intermittently faulty processors in a class of systems that includes hypercubes. It is shown that, under this model, no algorithm can achieve correct diagnosis with high probability in regular systems that conduct a number of tests dominated by n log n. Examples are given of systems that perform a modest number of tests and in which the algorithm's probability of correct diagnosis is very nearly one.</p>
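The fault model in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' diagnosis algorithm: it only generates fault states and test outcomes on a hypercube, where each processor tests its neighbors. The function names and the `rng` parameter are hypothetical, and the behavior of a faulty tester (reporting a fair coin flip) is an assumption, since the abstract does not specify it.

```python
import random


def hypercube_neighbors(node, dim):
    """Return the neighbors of `node` in a `dim`-dimensional hypercube.

    Two nodes are adjacent when their labels differ in exactly one bit.
    """
    return [node ^ (1 << bit) for bit in range(dim)]


def simulate_syndrome(dim, p, q, rng):
    """Draw fault states and test outcomes under the p/q model.

    Each of the n = 2**dim processors is faulty independently with
    probability p. Every processor tests each of its neighbors, giving
    n * dim = n * log2(n) directed tests in total.

    Returns (faulty, syndrome), where syndrome[(tester, tested)] is True
    when the tester reports a fault.
    """
    n = 1 << dim
    faulty = [rng.random() < p for _ in range(n)]
    syndrome = {}
    for tester in range(n):
        for tested in hypercube_neighbors(tester, dim):
            if faulty[tester]:
                # ASSUMPTION: a faulty tester's report is unreliable;
                # modeled here as a fair coin flip.
                outcome = rng.random() < 0.5
            elif faulty[tested]:
                # A nonfaulty tester detects a faulty unit with probability q.
                outcome = rng.random() < q
            else:
                # A nonfaulty tester never accuses a nonfaulty unit.
                outcome = False
            syndrome[(tester, tested)] = outcome
    return faulty, syndrome


if __name__ == "__main__":
    faulty, syndrome = simulate_syndrome(dim=4, p=0.1, q=0.8, rng=random.Random(1))
    print("faulty processors:", [i for i, f in enumerate(faulty) if f])
    print("tests reporting a fault:", sum(syndrome.values()), "of", len(syndrome))
```

With `dim=4` the system has 16 processors and 64 directed tests, which matches the n log n test count discussed in the abstract. A diagnosis algorithm would take only `syndrome` as input and try to recover `faulty`.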
intermittent fault diagnosis; multiprocessor systems; probabilistic model; self-diagnosis; nonfaulty processor; hypercubes; fault tolerant computing; multiprocessing systems.
D.M. Blough, G.F. Sullivan, G.M. Masson, "Intermittent Fault Diagnosis in Multiprocessor Systems", IEEE Transactions on Computers, vol. 41, no. 11, pp. 1430-1441, November 1992, doi:10.1109/12.177313