|
| This Article | ||
| ||
| Share | ||
| Bibliographic References | ||
| Add to: | ||
| | ||
| Search | ||
| ||
| ASCII Text | x | ||
| Douglas M. Blough, Hongying W. Brown, "The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation," IEEE Transactions on Computers, vol. 48, no. 5, pp. 470-493, May, 1999. | |||
| BibTex | x | ||
| @article{ 10.1109/12.769431, author = {Douglas M. Blough and Hongying W. Brown}, title = {The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation}, journal ={IEEE Transactions on Computers}, volume = {48}, number = {5}, issn = {0018-9340}, year = {1999}, pages = {470-493}, doi = {http://doi.ieeecomputersociety.org/10.1109/12.769431}, publisher = {IEEE Computer Society}, address = {Los Alamitos, CA, USA}, } | |||
| RefWorks Procite/RefMan/Endnote | x | ||
| TY - JOUR JO - IEEE Transactions on Computers TI - The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation IS - 5 SN - 0018-9340 SP470 EP493 EPD - 470-493 A1 - Douglas M. Blough, A1 - Hongying W. Brown, PY - 1999 KW - Diagnosability KW - distributed algorithms KW - fault diagnosis KW - multicomputer systems. VL - 48 JA - IEEE Transactions on Computers ER - | |||
Abstract—This paper describes a new comparison-based model for distributed fault diagnosis in multicomputer systems with a weak reliable broadcast capability. The classical problems of diagnosability and diagnosis are both considered under this broadcast comparison model. A characterization of diagnosable systems is given, which leads to a polynomial-time diagnosability algorithm. A polynomial-time diagnosis algorithm for
[1] F.J. Allan, T. Kameda, and S. Toida, “An Approach to the Diagnosability Analysis of a System,” IEEE Trans. Computers, vol. 23, no. 10, pp. 1,040-1,042, Oct. 1975.
[2] A. Bagchi and S.L. Hakimi, "An Optimal Algorithm for Distributed System Level Diagnosis," Proc. IEEE CS 21st Int'l Symp. Fault-Tolerant Computing, pp. 214-221, 1991.
[3] F. Bao and Y. Igarashi, “Reliable Broadcasting in Product Networks with Byzantine Faults,” Digest 26th Int'l Symp. Fault-Tolerant Computing, pp. 262-271, 1996.
[4] F. Barsi, F. Grandoni, and P. Maestrini, “A Theory of Diagnosability of Digital Systems,” IEEE Trans. Computers, vol. 25, no. 6, pp. 585-593, June 1976.
[5] R. Bianchini and R. Buskens, “Implementation of On-Line Distributed System-Level Diagnosis Theory,” IEEE Trans. Computers, vol. 41, no. 5, pp. 616-626, May 1992.
[6] R. Bianchini, K. Goodwin, and D. Nydick, “Practical Application and Implementation of Distributed System-Level Diagnosis Theory,” Digest 20th Int'l Symp. Fault-Tolerant Computing, pp.332-339, 1990.
[7] K.Y. Chwa and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and$t$-Diagnosable Systems,” Information and Control, vol. 49, pp. 212-238, 1981.
[8] F. Cristian, H. Aghili, and R. Strong, “Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement,” Digest 15th Int'l Symp. Fault-Tolerant Computing, pp. 200-206, 1985.
[9] D. Cummings and L. Alkalaj, “Checkpoint/Rollback in a Distributed System using Coarse-Grained Dataflow,” Digest 24th Int'l Symp. Fault-Tolerant Computing, pp. 424-433, 1994.
[10] A.T. Dahbura, “System-Level Diagnosis: A Perspective for The Third Decade,” Concurrent Computation: Algorithms, Architectures, Technologies. Plenum 1988.
[11] A.T. Dahbura and G.M. Masson, “An$O(n^{2. 5})$Fault Identification Algorithm for Diagnosable Systems,” IEEE Trans. Computers, vol.33, no. 6, pp. 486-492, June 1984.
[12] D. Dolev, “The Byzantine Generals Strike Again,” J. Algorithms, vol. 3, pp. 14-30, 1982.
[13] V. Hadzilacos and S. Toueg, "Fault-Tolerant Broadcasts and Related Problems," in Distributed Systems, S. Mullender, ed., ACM Press, New York, 1993, pp. 97-138.
[14] M. Hiltunen, “Membership and System Diagnosis,” Proc. 14th Symp. Reliable Distributed Systems, pp. 208-217, 1995.
[15] S. Hosseini, J. Kuhl, and S. Reddy, “A Diagnosis Algorithm for Distributed Computing Systems with Dynamic Failure and Repair,” IEEE Trans. Computers, vol. 33, no. 3, pp. 223-233, Mar. 1984.
[16] J.G. Kuhl and S.M. Reddy, "Distributed Fault Tolerance for Large Multiprocessor Systems," Proc. 1980 Computer ArchitectureSymp., pp. 222-229, May 1980.
[17] B.F. Lewis et al. “COSMOS Multicomputer Operating System and Development Environment Functional Specification,” NASA Technical Memorandum, Caltech, JPL, Aug. 1992.
[18] B.F. Lewis and R.L. Bunker, “MAX: An Advanced Parallel Computer for Space Applications,” Proc. Second Int'l Symp. Space Information Systems, Sept. 1990.
[19] J. Maeng and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multiprocessor Systems,” Digest 11th Int'l Symp. Fault Tolerant Computing, pp. 173-175, 1981.
[20] M. Malek, “A Comparison Connection Assignment for Diagnosis of Multiprocessor Systems,” Proc. Seventh Int'l Symp. Computer Architecture, pp. 31-35, 1980.
[21] G.G.L. Meyer, “A Diagnosis Algorithm for the BGM System Level Fault Model,” IEEE Trans. Computers vol. 33, no. 8, pp. 756-758, Aug. 1984.
[22] J. Narasimhan and K. Nakajima, “An Algorithm for Determining the Fault Diagnosability of a System,” IEEE Trans. Computers, vol. 35, no. 11, pp. 1,004-1,008, Nov. 1986.
[23] M. Pfluegl and D. Blough, “Communication Protocols for Fault-Tolerant Clock Synchronization in Not-Completely-Connected Networks,” Proc. 11th Symp. Reliable Distributed Systems, pp. 130-137, 1992.
[24] D. Powell, G. Bonn, D. Seaton, P. Verissimo, and F. Waeselynck, The Delta-4 Approach to Dependability in Open Distributed Computing Systems Proc. 18th IEEE Int'l Symp. Fault-Tolerant Computing (FTCS-18), pp. 246-251, June 1988.
[25] F.P. Preparata, G. Metze, and R.T. Chien, “On the Connection Assignment Problem of Diagnosable Systems,” IEEE Trans. Electronic Computers, vol. 16, pp. 848-854, Dec. 1967.
[26] R. D. Schlichting and F. B. Schneider,“Fail-stop processors: An approach to designing fault-tolerant computing systems,”ACM Trans. Comput. Syst., vol. 1, no. 3, pp. 222–238, Aug. 1983.
[27] A. Sengupta and A. Dahbura, “On Self-Diagnosable Multiprocessor Systems: Diagnosis by the Comparison Approach,” IEEE Trans. Computers, vol. 41, no. 11, pp. 1386-1396, Nov. 1992.
[28] G.F. Sullivan, "A Polynomial Time Algorithm for Fault Diagnosability," Proc. 25th Symp. Foundations of Computer Science, pp. 148-156, Oct. 1984.
[29] C. Walter, N. Suri, and M. Hugue, “Continual On-Line Diagnosis of Hybrid Faults,” Proc. Fourth IFIP Working Conf. Dependable Computing for Critical Applications, pp. 233-249, 1994.
[30] H. Wang, “Practical Comparison-Based Fault Diagnosis in Multiprocessor Systems,” PhD dissertation, Dept. of Electrical and Computer Eng., Univ. of California, Irvine, June 1995.
[31] H. Wang, D. Blough, and L. Alkalaj, “Analysis and Experimental Evaluation of Comparison-Based System-Level Diagnosis for Multiprocessor Systems,” Digest 24th IEEE Int'l Symp. Fault-Tolerant Computing, pp. 55-64, 1994.

