Implementation of Online Distributed System-Level Diagnosis Theory
May 1992 (vol. 41 no. 5)
pp. 616-626

The practical application and implementation of online distributed system-level diagnosis theory are documented. Proven distributed diagnosis algorithms are shown to be impractical in real systems because of their high resource requirements. A distributed system-level diagnosis algorithm called Adaptive DSD is shown to minimize the network resources required and has resulted in a practical implementation. Adaptive DSD assumes a distributed network in which nodes can test other nodes and determine them to be faulty or fault-free. Each node issues its tests adaptively, depending on the fault situation of the network. Test result reports are generated from the test results and forwarded between nodes in the network. Adaptive DSD is proven correct in that each fault-free node reaches an accurate, independent diagnosis of the fault conditions of the remaining nodes. No restriction is placed on the number of faulty nodes: any fault situation, with any number of faulty nodes, is diagnosed correctly. An implementation of the Adaptive DSD algorithm is described.
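
The abstract does not spell out the testing rule, so the following is only a minimal sketch of adaptive testing under stated assumptions. It assumes that nodes are numbered 0..n-1, that each fault-free node tests its successors in ring order until it finds a fault-free one, and that the forwarded reports have already reached every fault-free node (modeled here as one shared dictionary). The names `run_testing_round`, `diagnose`, and `tested_up` are hypothetical and chosen for illustration; they are not taken from the paper's implementation.

```python
def run_testing_round(n, faulty):
    """Simulate one testing round (assumed ring-order rule).

    Each fault-free node tests node+1, node+2, ... (mod n) and stops at
    the first node it finds fault-free. Returns a map from each tester
    to the fault-free node it found, if any.
    """
    tested_up = {}
    for node in range(n):
        if node in faulty:
            continue  # results produced by faulty nodes are not used
        for step in range(1, n):
            candidate = (node + step) % n
            if candidate not in faulty:
                tested_up[node] = candidate
                break
    return tested_up


def diagnose(n, start, tested_up):
    """Diagnose the network from the point of view of node `start`.

    Follows the chain of fault-free test results beginning at `start`.
    Every node on the chain is diagnosed fault-free; all others are
    diagnosed faulty.
    """
    fault_free = {start}
    current = start
    while current in tested_up and tested_up[current] not in fault_free:
        current = tested_up[current]
        fault_free.add(current)
    return fault_free, set(range(n)) - fault_free


if __name__ == "__main__":
    reports = run_testing_round(8, faulty={1, 4, 5})
    good, bad = diagnose(8, start=0, tested_up=reports)
    print("fault-free:", sorted(good))
    print("faulty:", sorted(bad))
```

In this sketch the number of faulty nodes is unrestricted: a node that finds no fault-free successor simply diagnoses every other node as faulty, which is consistent with the claim that any fault situation is diagnosed correctly. How reports are actually forwarded and timed in the real system is not modeled here.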

Index Terms:
online distributed system-level diagnosis theory; distributed diagnosis algorithms; real systems; Adaptive DSD; minimize; network resources; distributed network; network nodes; faulty; fault situation; fault-free node; fault conditions; distributed processing; fault tolerant computing; multiprocessor interconnection networks; parallel algorithms.
R.P. Bianchini, Jr., R.W. Buskens, "Implementation of Online Distributed System-Level Diagnosis Theory," IEEE Transactions on Computers, vol. 41, no. 5, pp. 616-626, May 1992, doi:10.1109/12.142688