Issue No.09 - September (1994 vol.43)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.312126
A reset subsystem is designed that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary. Our design is layered, and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense: if the coordination between the up-processes in the system is ever lost (due to failures or repairs of processes and channels), then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels, even when a reset is in progress.
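As a rough illustration of the top layer only, the sketch below simulates a single diffusing computation over a spanning tree that is assumed to exist already and to be rooted at the elected leader. It is not the paper's algorithm: it is sequential, it is not self-stabilizing, and it does not model failures or repairs. The names `build_children`, `diffuse_reset`, `parent`, and `reset_local` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def build_children(parent):
    """Invert a parent map (node -> parent, root -> None) into node -> children."""
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    return children


def diffuse_reset(parent, reset_local):
    """Run one reset wave: propagate down from the root, then complete back up.

    Assumption: `parent` describes a valid spanning tree with exactly one root,
    standing in for the output of the leader election and tree layers.
    """
    roots = [n for n, p in parent.items() if p is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root (the elected leader)")
    children = build_children(parent)
    completed = []  # order in which nodes finish the wave

    def wave(node):
        reset_local(node)        # node resets its local state when the wave arrives
        for child in children[node]:
            wave(child)          # forward the wave to each child
        completed.append(node)   # node completes once all its children have completed

    wave(roots[0])
    return completed


if __name__ == "__main__":
    # Hypothetical four-process tree rooted at "a".
    tree = {"a": None, "b": "a", "c": "a", "d": "b"}
    print(diffuse_reset(tree, lambda node: print("reset", node)))
```

In this simplified form the root is the last node to complete, which is the point at which the initiator could treat the reset as finished. A faithful implementation would additionally have to recover from arbitrary states of the tree and of the wave itself.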
distributed processing; fault tolerant computing; system recovery; distributed reset subsystem; embedded system; layered design; leader election; spanning tree construction; diffusing computation; self-stabilizing components; up-process coordination; process failures; process repairs; robustness; fail-stop failure tolerance; channel failures; channel repairs; reliability; fault tolerance.
A. Arora, "Distributed Reset", IEEE Transactions on Computers, vol. 43, no. 9, pp. 1026-1038, September 1994, doi:10.1109/12.312126