<p><it>Abstract—</it>This paper describes a distributed algorithm for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated <it>in parallel</it> throughout the network. The algorithm is formally proven correct and is shown to be optimal in the time required for all fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.</p><p><it>Index Terms—</it>Computer fault diagnosis, computer fault tolerance, computer networks, distributed computing, system-level fault diagnosis, distributed algorithm, fault detection.</p>
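The testing-and-dissemination idea summarized in the abstract can be illustrated with a toy simulation. The sketch below uses a generic synchronous flooding model and is not the authors' algorithm. Several parts of it are hypothetical simplifications chosen only for illustration: the functions `find_detections` and `disseminate`, the adjacency-dict network representation, the assumption that every fault-free processor tests all of its neighbors and always detects a faulty one, and the timing model in which a message crosses one link per round.

```python
"""Toy simulation of parallel event dissemination among fault-free processors.

Hypothetical model (not the paper's algorithm): synchronous rounds, every
fault-free processor tests all of its neighbors, and an informed processor
forwards a new event to every fault-free neighbor in the next round.
"""


def find_detections(adj, faulty):
    """Return sorted (tester, faulty_neighbor) pairs found by one testing pass.

    Assumes a test of a faulty neighbor by a fault-free tester always
    reveals the fault; faulty processors do not act as testers.
    """
    return sorted(
        (u, v) for u in adj if u not in faulty for v in adj[u] if v in faulty
    )


def disseminate(adj, faulty, origin):
    """Return {processor: round in which it first learns the event}.

    The origin knows the event at round 0. In each round, every processor
    informed in the previous round forwards the event to all of its
    fault-free neighbors at once (in parallel). Faulty processors neither
    receive nor relay, so fault-free processors cut off from the origin by
    faulty ones never appear in the result.
    """
    if origin in faulty:
        raise ValueError("the event must originate at a fault-free processor")
    learned = {origin: 0}
    frontier = {origin}
    rnd = 0
    while frontier:
        rnd += 1
        newly_informed = set()
        for node in frontier:
            for nbr in adj[node]:
                if nbr not in faulty and nbr not in learned:
                    newly_informed.add(nbr)
        for node in newly_informed:
            learned[node] = rnd
        frontier = newly_informed
    return learned


if __name__ == "__main__":
    # Six processors in a ring; processor 3 is faulty.
    adj = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
    faulty = {3}
    detections = find_detections(adj, faulty)
    print(detections)  # [(2, 3), (4, 3)]
    rounds = disseminate(adj, faulty, origin=detections[0][0])
    print(rounds)  # {2: 0, 1: 1, 0: 2, 5: 3, 4: 4}
```

In this simplified model, each round advances the event one link along every fault-free path simultaneously. Each processor therefore learns the event after a number of rounds equal to its shortest-path distance from the origin in the fault-free subgraph. In the ring example, the last processor learns it after 4 rounds. When the event is initially known only at the origin, no one-hop-per-round scheme can inform that processor sooner. This gives some intuition for why parallel dissemination is attractive, but the paper's own optimality argument is not reproduced here.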

E. A. Ziegler, S. Rangarajan, and A. T. Dahbura, "A Distributed System-Level Diagnosis Algorithm for Arbitrary Network Topologies," in IEEE Transactions on Computers, vol. 44, pp. 312–334, 1995.