Diagnosis and Repair in Multiprocessor Systems
February 1993 (vol. 42 no. 2)
pp. 205-217

Diagnosis of multiprocessor systems in which faulty processors can be replaced by spares or repaired is known as sequential diagnosis. A generalization of classical sequential diagnosis, referred to as diagnosis and repair, is considered under a probabilistic model for the faults and test outcomes in a system. It is shown that correct diagnosis and repair of all faulty processors can be achieved with high probability in a large class of systems including, for example, rings, grids, meshes, tori, and hypercubes. These results show that, without restrictive assumptions on the behavior of faulty processors, correct diagnosis can be achieved in these widely used, low-degree systems when a fixed percentage of the processors in the system are faulty.
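As a rough illustration of the kind of setting the abstract describes, the sketch below simulates processors testing their neighbors in a hypercube. It is not the algorithm of the paper. Every detail is an assumption made for illustration: the function names are hypothetical, each processor is taken to fail independently with probability `p`, fault-free testers are taken to report correctly, faulty testers answer at random as a stand-in for arbitrary behavior, and the majority vote is a naive placeholder for a real diagnosis procedure.

```python
import random

def hypercube_neighbors(dim):
    """Adjacency lists of a dim-dimensional hypercube (nodes 0 .. 2**dim - 1)."""
    return {v: [v ^ (1 << b) for b in range(dim)] for v in range(2 ** dim)}

def sample_faults(nodes, p, rng):
    """Assumed fault model: each processor is faulty independently with probability p."""
    return {v for v in nodes if rng.random() < p}

def run_tests(neighbors, faulty, rng):
    """Build a syndrome: syndrome[(u, v)] is True when tester u reports v as faulty.

    Assumption: fault-free testers are always correct; faulty testers
    answer at random (a stand-in for unrestricted faulty behavior).
    """
    syndrome = {}
    for u, nbrs in neighbors.items():
        for v in nbrs:
            if u in faulty:
                syndrome[(u, v)] = rng.random() < 0.5
            else:
                syndrome[(u, v)] = v in faulty
    return syndrome

def majority_diagnosis(neighbors, syndrome):
    """Naive placeholder rule: flag v when a strict majority of its testers accuse it."""
    return {
        v
        for v, nbrs in neighbors.items()
        if 2 * sum(syndrome[(u, v)] for u in nbrs) > len(nbrs)
    }

if __name__ == "__main__":
    rng = random.Random(1)
    nbrs = hypercube_neighbors(6)          # 64 processors, degree 6
    faulty = sample_faults(nbrs, 0.1, rng)
    guess = majority_diagnosis(nbrs, run_tests(nbrs, faulty, rng))
    print("actual faulty:", sorted(faulty))
    print("diagnosed    :", sorted(guess))
```

A sequential scheme in the spirit of diagnosis and repair would replace the processors it flags and then test again, rather than trying to identify every fault in a single pass. That loop is omitted here because the abstract does not specify it.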

Index Terms:
multiprocessor systems; faulty processors; sequential diagnosis; diagnosis and repair; probabilistic model; rings; grids; meshes; tori; hypercubes; fault tolerant computing; multiprocessing systems; probability.
Citation:
D.M. Blough, A. Pelc, "Diagnosis and Repair in Multiprocessor Systems," IEEE Transactions on Computers, vol. 42, no. 2, pp. 205-217, Feb. 1993, doi:10.1109/12.204793