
Douglas M. Blough, Hongying W. Brown, "The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation," IEEE Transactions on Computers, vol. 48, no. 5, pp. 470-493, May 1999.
@article{10.1109/12.769431,
  author = {Douglas M. Blough and Hongying W. Brown},
  title = {The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation},
  journal = {IEEE Transactions on Computers},
  volume = {48},
  number = {5},
  issn = {0018-9340},
  year = {1999},
  pages = {470--493},
  doi = {http://doi.ieeecomputersociety.org/10.1109/12.769431},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Keywords: Diagnosability, distributed algorithms, fault diagnosis, multicomputer systems.
Abstract—This paper describes a new comparison-based model for distributed fault diagnosis in multicomputer systems with a weak reliable broadcast capability. The classical problems of diagnosability and diagnosis are both considered under this broadcast comparison model. A characterization of diagnosable systems is given, which leads to a polynomial-time diagnosability algorithm. A polynomial-time diagnosis algorithm for …
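The abstract names two problems, diagnosis and diagnosability, but does not state the model's comparison rules. The sketch below illustrates both problems under assumed comparison semantics, modeled loosely on classical comparison-based diagnosis rather than taken from this paper:

- Two fault-free processors always agree.
- A fault-free/faulty pair always disagrees.
- Two faulty processors may either agree or disagree.

It also assumes that all fault-free processors work from one agreed table of comparison outcomes (the syndrome), a consistency that a reliable broadcast capability is plausibly meant to support. The names `allowed`, `diagnose`, and `is_t_diagnosable` are hypothetical. The search is exhaustive and exponential in the fault bound t, so it is not the polynomial-time algorithms the abstract describes.

```python
from itertools import combinations

# ASSUMED comparison semantics (illustrative, not taken from the paper):
#   both units fault-free   -> outcome 0 (outputs agree)
#   exactly one unit faulty -> outcome 1 (outputs disagree)
#   both units faulty       -> outcome 0 or 1 (faulty units may happen to agree)
def allowed(fault_set, u, v):
    """Outcomes of comparing u and v that are possible if fault_set is the faulty set."""
    fu, fv = u in fault_set, v in fault_set
    if fu and fv:
        return {0, 1}
    return {1} if fu != fv else {0}


def candidate_fault_sets(units, t):
    """Yield every fault set with at most t members (exponential in t)."""
    for k in range(t + 1):
        for combo in combinations(units, k):
            yield frozenset(combo)


def consistent(fault_set, syndrome):
    """True if every recorded outcome is possible under fault_set."""
    return all(outcome in allowed(fault_set, u, v)
               for (u, v), outcome in syndrome.items())


def diagnose(units, syndrome, t):
    """Return the unique fault set of size <= t consistent with the syndrome, else None."""
    matches = [f for f in candidate_fault_sets(units, t) if consistent(f, syndrome)]
    return matches[0] if len(matches) == 1 else None


def is_t_diagnosable(units, edges, t):
    """True if no single syndrome fits two different fault sets of size <= t."""
    sets = list(candidate_fault_sets(units, t))
    for f1, f2 in combinations(sets, 2):
        # f1 and f2 cannot be told apart if every compared pair admits a shared outcome.
        if all(allowed(f1, u, v) & allowed(f2, u, v) for u, v in edges):
            return False
    return True


if __name__ == "__main__":
    units = ["a", "b", "c", "d"]
    edges = list(combinations(units, 2))  # every pair is compared (complete graph K4)
    actual_faults = {"b"}
    # Simulated syndrome under the assumed semantics (only one faulty unit here).
    syndrome = {(u, v): int((u in actual_faults) != (v in actual_faults)) for u, v in edges}
    print(is_t_diagnosable(units, edges, 1))     # True
    print(is_t_diagnosable(units, edges, 2))     # False
    print(sorted(diagnose(units, syndrome, 1)))  # ['b']
```

`diagnose` returns None whenever zero or several candidate sets fit the syndrome. For example, this happens with an empty syndrome, or when the actual number of faults exceeds t.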