
D.M. Blough, G.F. Sullivan, G.M. Masson, "Efficient Diagnosis of Multiprocessor Systems Under Probabilistic Models," IEEE Transactions on Computers, vol. 41, no. 9, pp. 1126-1136, September 1992. doi: 10.1109/12.165394  
Keywords: multiprocessor systems; fault diagnosis; probabilistic fault model; diagnosis algorithm; lower bound; upper bounds; hypercubes; computational complexity; fault tolerant computing; hypercube networks; multiprocessing systems; parallel algorithms; probability.  
The problem of fault diagnosis in multiprocessor systems is considered under a probabilistic fault model. The focus is on minimizing the number of tests that must be conducted to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm is presented that, for a class of systems performing slightly more than a linear number of tests, correctly diagnoses these states with probability approaching one. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is proved. Lower and upper bounds on the number of tests required for regular systems are presented. A class of regular systems that includes hypercubes is shown to be correctly diagnosable with high probability. In all cases, the number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. These results represent a substantial improvement in the performance of system-level diagnosis techniques.
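
The setting described in the abstract can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: processors fail independently with probability `p_fault`, each processor is tested by the next `testers_per_unit` processors on a ring, a fault-free tester reports the true state, a faulty tester reports a coin flip, and a processor is diagnosed by majority vote. The function name and all parameters are hypothetical, and the majority-vote rule stands in for the paper's algorithm rather than reproducing it.

```python
import random


def simulate_majority_diagnosis(n, testers_per_unit, p_fault, seed=0):
    """Simulate one diagnosis round; return (tests_performed, misdiagnosed_count).

    Illustrative assumptions (not from the paper): ring-shaped test assignment,
    independent faults, coin-flip results from faulty testers, majority vote.
    """
    if not 0 < testers_per_unit < n:
        raise ValueError("testers_per_unit must be between 1 and n - 1")

    rng = random.Random(seed)
    # Each processor is faulty independently with probability p_fault.
    faulty = [rng.random() < p_fault for _ in range(n)]

    tests = 0
    misdiagnosed = 0
    for unit in range(n):
        fail_votes = 0
        for offset in range(1, testers_per_unit + 1):
            tester = (unit + offset) % n
            tests += 1
            if faulty[tester]:
                # A faulty tester gives an unreliable (random) result.
                reported_fail = rng.random() < 0.5
            else:
                # A fault-free tester reports the true state of the unit.
                reported_fail = faulty[unit]
            fail_votes += reported_fail
        # Diagnose the unit as faulty if a strict majority of tests failed.
        diagnosed_faulty = 2 * fail_votes > testers_per_unit
        if diagnosed_faulty != faulty[unit]:
            misdiagnosed += 1
    return tests, misdiagnosed


if __name__ == "__main__":
    tests, wrong = simulate_majority_diagnosis(n=1024, testers_per_unit=11, p_fault=0.1)
    print(f"tests performed: {tests}, processors misdiagnosed: {wrong}")
```

The total number of tests in this sketch is `n * testers_per_unit`, so letting `testers_per_unit` grow slowly with `n` gives a test count slightly above linear, which is the regime the abstract refers to. Varying `testers_per_unit` and `p_fault` shows how the number of misdiagnosed processors changes in this toy model.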