Distributed Recovery in Fault-Tolerant Multiprocessor Networks
October 1986 (vol. 35 no. 10)
pp. 871-879
A methodology for characterizing dynamic distributed recovery in fault-tolerant multiprocessor systems is developed using graph theory. Distributed recovery, which is intended for systems with no central supervisor, depends on the cooperation of a set of processors to execute the recovery function, since each processor is assumed to have only a limited amount of information about the system as a whole. Facility graphs, whose nodes denote the system components (processors) and whose edges denote interconnections between components, are used to represent both multiprocessor systems and error conditions. A general distributed recovery strategy R, which allows global recovery to be achieved via a sequence of local actions, is given. R recovers the system in several steps in which different nodes successively act as the local supervisor. R is specialized for two important classes of systems: loop networks and tree networks. For each of these cases, fault-tolerant designs and their associated distributed recovery strategies, which allow recovery from up to k faults within a specified number of steps, are presented.
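To make the facility-graph idea concrete, the following is a minimal sketch in Python. It is not the strategy R from the paper, whose details the abstract does not give. It assumes a loop network stored as an adjacency map, and it assumes that the two neighbours of a faulty node can activate a spare link between themselves. The names make_loop, local_bypass, and is_connected are hypothetical and chosen only for illustration.

```python
from collections import deque


def make_loop(n):
    """Facility graph of an n-node loop: node i is linked to i-1 and i+1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def local_bypass(graph, faulty):
    """Illustrative local recovery step (an assumption, not the paper's R).

    Only the neighbours of the faulty node act: they drop their link to it
    and, if there are exactly two of them, link to each other.
    """
    neighbours = sorted(graph.pop(faulty))
    for v in neighbours:
        graph[v].discard(faulty)
    if len(neighbours) == 2:
        a, b = neighbours
        # Assumed spare link between the two neighbours of the faulty node.
        graph[a].add(b)
        graph[b].add(a)
    return graph


def is_connected(graph):
    """Breadth-first check that every remaining node is reachable."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(graph)


# Usage: two successive faults, each handled by the faulty node's neighbours.
g = make_loop(6)
local_bypass(g, 2)
local_bypass(g, 3)
print(is_connected(g), sorted(g))
```

Each call touches only the faulty node and its immediate neighbours, which mirrors the abstract's point that global recovery is built from a sequence of local actions performed with limited information.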
Index Terms:
tree networks, distributed recovery, fault tolerance, fault-tolerant multiprocessor systems, graph theory, loop networks, reconfiguration
R.M. Yanney, J.P. Hayes, "Distributed Recovery in Fault-Tolerant Multiprocessor Networks," IEEE Transactions on Computers, vol. 35, no. 10, pp. 871-879, Oct. 1986, doi:10.1109/TC.1986.1676678