On Probabilistic Diagnosis of Multiprocessor Systems Using Multiple Syndromes
June 1994 (vol. 5 no. 6)
pp. 630-638

This paper addresses the distributed self-diagnosis of a multiprocessor/multicomputer system based on fault syndromes formed by comparison testing. The authors show that by using multiple fault syndromes, it is possible to achieve significantly better diagnosis than by using a single fault syndrome, even when the amount of time devoted to testing is the same. They derive a multiple syndrome diagnosis algorithm that, in terms of the level of diagnostic accuracy achieved, is globally suboptimal but optimal among all diagnosis algorithms of a certain type to be defined. The diagnosis algorithm produces good results, even with sparse interconnection networks and interprocessor tests with low fault coverage. It is also proven that the diagnosis algorithm produces 100% correct diagnosis as N, the number of nodes in the system, approaches infinity, provided that the interconnection network has connectivity greater than or equal to 2 and that the number of syndromes produced grows faster than log N. This solution and another multiple syndrome diagnosis solution by Fussell and Rangarajan (1989) are comparatively evaluated, both analytically and with simulations.
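
The effect of collecting several syndromes can be illustrated with a toy simulation. The sketch below is not the authors' algorithm: it assumes a ring network (connectivity 2), a hypothetical rule that declares a node faulty only when both of its incident comparisons have disagreed at least once, and a simplified test model in which any comparison involving a faulty node disagrees with probability `coverage`. The function name and all parameter values are illustrative assumptions, and the sketch does not model the paper's equal-testing-time constraint.

```python
import random


def simulate_accuracy(num_nodes, num_syndromes, fault_prob, coverage, trials, seed=0):
    """Return the fraction of nodes diagnosed correctly on a ring network.

    Toy model (an assumption, not the paper's method): a comparison between
    neighbors i and i+1 disagrees with probability `coverage` when at least
    one of them is faulty, and never disagrees when both are fault-free.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Each node is faulty independently with probability fault_prob.
        faulty = [rng.random() < fault_prob for _ in range(num_nodes)]

        # mismatch_seen[i] records whether the comparison on edge (i, i+1)
        # disagreed in at least one of the collected syndromes.
        mismatch_seen = [False] * num_nodes
        for _ in range(num_syndromes):
            for i in range(num_nodes):
                j = (i + 1) % num_nodes
                if (faulty[i] or faulty[j]) and rng.random() < coverage:
                    mismatch_seen[i] = True

        # Hypothetical diagnosis rule: a node is declared faulty only if
        # both of its incident edges have shown a disagreement.
        for i in range(num_nodes):
            left_edge = mismatch_seen[(i - 1) % num_nodes]
            right_edge = mismatch_seen[i]
            diagnosed_faulty = left_edge and right_edge
            if diagnosed_faulty == faulty[i]:
                correct += 1
    return correct / (trials * num_nodes)


if __name__ == "__main__":
    # Compare one syndrome against several, with low fault coverage.
    for syndromes in (1, 5, 20):
        accuracy = simulate_accuracy(64, syndromes, 0.1, 0.3, 200)
        print(syndromes, round(accuracy, 3))
```

With low coverage, a single syndrome rarely catches a faulty node on both of its edges, so this rule misses most faults; repeating the comparisons gives each faulty node more chances to be exposed. The rule still misclassifies a fault-free node whose two neighbors are both faulty, which is one reason a toy threshold of this kind should not be read as the optimal algorithm the abstract describes.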

[1] D. M. Blough, G. F. Sullivan, and G. M. Masson, "Almost certain diagnosis for intermittently faulty systems," in Proc. 18th Int. Symp. Fault-Tolerant Comput., 1988, pp. 260-271.
[2] D. M. Blough, "Fault detection and diagnosis in multiprocessor systems," Ph.D. dissertation, The Johns Hopkins Univ., Baltimore, MD, 1988.
[3] M. L. Blount, "Probabilistic treatment of diagnosis in digital systems," Digest of Papers, FTCS-7, pp. 72-77, 1977.
[4] A. Dahbura, K. K. Sabnani, and L. L. King, "The comparison approach to multiprocessors fault diagnosis," IEEE Trans. Comput., vol. C-36, pp. 373-378, Mar. 1987.
[5] D. Fussell and S. Rangarajan, "Probabilistic diagnosis of multiprocessor systems with arbitrary connectivity," in Proc. 19th Int. Symp. Fault-Tolerant Comput., 1989, pp. 560-565.
[6] S. Lee, "Probabilistic multiprocessor and multicomputer diagnosis," Ph.D. dissertation, Univ. Michigan, Ann Arbor, 1990.
[7] S. Mallela and G. M. Masson, "Diagnosable systems for intermittent faults," IEEE Trans. Comput., vol. C-27, no. 6, pp. 560-566, June 1978.
[8] S. Mallela and G. M. Masson, "Diagnosis without repair for hybrid fault situations," IEEE Trans. Comput., vol. C-29, no. 6, pp. 461-470, June 1980.
[9] F. P. Preparata, G. Metze, and R. T. Chien, "On the connection assignment problem of diagnosable systems," IEEE Trans. Electron. Comput., vol. EC-16, no. 6, pp. 848-854, Dec. 1967.
[10] S. Rangarajan and D. Fussell, "A probabilistic method for fault diagnosis of multiprocessor systems," in Proc. 18th Int. Symp. Fault-Tolerant Comput., 1988, pp. 278-283.
[11] C. L. Yang and G. M. Masson, "A fault identification algorithm for t_i-diagnosable systems," IEEE Trans. Comput., vol. C-35, pp. 503-510, June 1986.

Index Terms:
multiprocessing systems; probability; fault tolerant computing; performance evaluation; probabilistic diagnosis; multiprocessor systems; multiple syndromes; distributed self-diagnosis; comparison testing; diagnostic accuracy; diagnosis algorithms; sparse interconnection networks; interprocessor tests; low fault coverage; fault-tolerant computing; intermittent fault; multicomputer; multiprocessor; self-test; system-level diagnosis
S. Lee, K.G. Shin, "On Probabilistic Diagnosis of Multiprocessor Systems Using Multiple Syndromes," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 6, pp. 630-638, June 1994, doi:10.1109/71.285608