Distributed Reset
September 1994 (vol. 43 no. 9)
pp. 1026-1038

A reset subsystem is designed that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary. Our design is layered, and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense: if the coordination between the up-processes in the system is ever lost (due to failures or repairs of processes and channels), then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels, even when a reset is in progress.
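The three layers named in the abstract can be pictured with a small, centralized simulation. This is only an illustrative sketch, not the paper's self-stabilizing protocol: the rule "largest id wins" for leader election, the breadth-first tree construction, and the names `elect_leader`, `build_spanning_tree`, and `diffuse_reset` are all assumptions made for the example, and failures and repairs are not modeled.

```python
from collections import deque

def elect_leader(up_processes):
    # Assumption: the leader is the up-process with the largest id.
    return max(up_processes)

def build_spanning_tree(root, adjacency):
    # Breadth-first search from the leader; returns child lists keyed by node.
    children = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency[node]):
            if neighbor not in children:
                children[neighbor] = []
                children[node].append(neighbor)
                queue.append(neighbor)
    return children

def diffuse_reset(node, children, state, completed):
    # The reset wave travels down the tree; completion is reported back up.
    state[node] = 0  # hypothetical "reset" of the local state
    for child in children[node]:
        diffuse_reset(child, children, state, completed)
    completed.append(node)  # recorded only after all of the node's children

# Hypothetical four-process system with bidirectional channels.
adjacency = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
state = {1: 7, 2: 5, 3: 9, 4: 2}

leader = elect_leader(adjacency)
tree = build_spanning_tree(leader, adjacency)
completion_order = []
diffuse_reset(leader, tree, state, completion_order)
```

In this sketch the leader is the last process to record completion, which mirrors the idea that the root of a diffusing computation learns the reset has finished only after every subtree has reported.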

[1] Y. Afek, B. Awerbuch, and E. Gafni, "Applying static network protocols to dynamic networks," in Proc. 28th IEEE Symp. Foundations of Comput. Sci., 1987.
[2] A. Arora, "A foundation of fault-tolerant computing," Ph.D. dissertation, The University of Texas, Austin, 1992.
[3] A. Arora, P. Attie, M. Evangelist, and M. G. Gouda, "Convergence of iteration systems," Distributed Computing, vol. 7, pp. 48-53, 1993.
[4] A. Arora and M. G. Gouda, "Distributed reset (extended abstract)," in Proc. 10th Conf. on Foundations of Software Technol. and Theoretical Comput. Sci., LNCS 472, Springer-Verlag, 1990, pp. 316-331.
[5] J. E. Burns, M. G. Gouda, and R. E. Miller, "On relaxing interleaving assumptions," Tech. Rep. GIT-ICS-88/29, School of ICS, Georgia Inst. of Technol., 1988.
[6] G. M. Brown, M. G. Gouda, and C.-L. Wu, "Token systems that self-stabilize," IEEE Trans. Comput., vol. 38, no. 6, pp. 845-852, June 1989.
[7] J. Burns and J. Pachl, "Uniform self-stabilizing rings," ACM Trans. Programming Languages Syst., vol. 11, no. 2, pp. 330-344, 1989.
[8] K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Trans. Comput. Syst., vol. 3, no. 1, pp. 63-75, Feb. 1985.
[9] S. Dolev, A. Israeli, and S. Moran, "Self-stabilization of dynamic systems assuming only read/write atomicity," in Proc. Ninth ACM Symp. Principles of Distrib. Computing, 1990, pp. 103-117.
[10] E. W. Dijkstra and C. S. Scholten, "Termination detection for diffusing computations," Inform. Processing Lett., vol. 11, no. 1, pp. 1-4, 1980.
[11] M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of distributed consensus with one faulty process," J. ACM, vol. 32, no. 2, pp. 374-382, Apr. 1985.
[12] N. Francez, Fairness. New York: Springer-Verlag, 1986.
[13] M. G. Gouda and N. Multari, "Stabilizing communication protocols," IEEE Trans. Comput., vol. 40, no. 4, pp. 448-458, Apr. 1991.
[14] S. Katz and K. Perry, "Self-stabilizing extensions for message-passing systems," in Proc. 9th Ann. Symp. Principles of Distributed Computing, 1990, pp. 91-101.
[15] L. Lamport and N. Lynch, "Distributed computing: Models and methods," in Handbook of Theoretical Computer Science. New York: Elsevier Science, ch. 18, vol. 2, 1990, pp. 1158-1199.
[16] R. Perlman, "An algorithm for distributed computation of a spanning tree in an extended LAN," in Proc. Ninth Data Comm. Symp., Vancouver, Canada, 1985, pp. 44-53.
[17] W. D. Tajibnapis, "A correctness proof of a topology information maintenance protocol for distributed computer networks," Commun. ACM, vol. 20, pp. 477-485, 1977.

Index Terms:
distributed processing; fault tolerant computing; system recovery; distributed reset subsystem; embedded system; layered design; leader election; spanning tree construction; diffusing computation; self-stabilizing components; up-process coordination; process failures; process repairs; robustness; fail-stop failure tolerance; channel failures; channel repairs; reliability; fault tolerance.
A. Arora, M. Gouda, "Distributed Reset," IEEE Transactions on Computers, vol. 43, no. 9, pp. 1026-1038, Sept. 1994, doi:10.1109/12.312126