An (N-1)-Resilient Algorithm for Distributed Termination Detection
January 1995 (vol. 6 no. 1)
pp. 63-78

Abstract: This paper presents a fault-tolerant termination detection algorithm based on an earlier fault-sensitive scheme by Dijkstra and Scholten. The proposed algorithm tolerates any number of crash failures. If no process fails during the computation, it runs as efficiently as its non-fault-tolerant predecessor; otherwise it incurs only a small additional cost for each actual failure. The underlying communication network is assumed to provide such services as reliable end-to-end communication, failure detection, and fail-flush.

Index Terms: Distributed algorithm, fault tolerance, message complexity, termination detection.
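
For orientation, the failure-free predecessor scheme named in the abstract (Dijkstra and Scholten's termination detection for diffusing computations) can be sketched as a small single-threaded simulation. This is only an illustration of the classic scheme under simplifying assumptions, not the fault-tolerant algorithm of this paper: no process crashes, messages are delivered from one FIFO queue, and every process is idle between events. The function name `detect_termination`, the `ttl`-driven workload, and the `stats` dictionary are hypothetical names introduced for this sketch.

```python
from collections import deque


def detect_termination(n, ttl):
    """Simulate Dijkstra-Scholten detection on n processes (process 0 is the root).

    Assumed toy workload: a process handling a basic message with t > 0
    sends two further basic messages carrying t - 1 to its next two neighbors.
    """
    parent = [None] * n      # tree parent while a process is engaged
    deficit = [0] * n        # basic messages sent but not yet acknowledged
    engaged = [False] * n
    engaged[0] = True        # the root is engaged from the start
    queue = deque()          # in-transit messages: (kind, src, dst, t)
    stats = {"basic": 0, "ack": 0}
    terminated = False

    def send_basic(src, dst, t):
        deficit[src] += 1
        stats["basic"] += 1
        queue.append(("basic", src, dst, t))

    def send_ack(src, dst):
        stats["ack"] += 1
        queue.append(("ack", src, dst, 0))

    def work(p, t):
        if t > 0:
            send_basic(p, (p + 1) % n, t - 1)
            send_basic(p, (p + 2) % n, t - 1)

    def try_detach(p):
        # An idle engaged process with no outstanding acks leaves the tree.
        nonlocal terminated
        if engaged[p] and deficit[p] == 0:
            if p == 0:
                terminated = True    # root announces termination
                stats["pending_at_detection"] = len(queue)
            else:
                engaged[p] = False
                send_ack(p, parent[p])   # deferred ack for the engaging message
                parent[p] = None

    work(0, ttl)             # the root starts the diffusing computation
    try_detach(0)
    while queue:
        kind, src, dst, t = queue.popleft()
        if kind == "basic":
            if engaged[dst]:
                send_ack(dst, src)       # already in the tree: ack at once
            else:
                engaged[dst] = True      # first message: sender becomes parent
                parent[dst] = src
            work(dst, t)
        else:
            deficit[dst] -= 1
        try_detach(dst)
    return terminated, stats
```

Calling `detect_termination(4, 3)` runs the toy workload on four processes. In this sketch every basic message is matched by exactly one acknowledgment, which is the overhead the abstract refers to when it compares the fault-tolerant algorithm with its predecessor.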

[1] M. Ahuja, "An implementation of F-channels," IEEE Trans. Parallel Distrib. Syst., pp. 658–667, June 1993.
[2] A. Bagchi and S. L. Hakimi, "An optimal algorithm for distributed system level diagnosis," in IEEE Symp. Fault-Tolerant Comput., June 1991, pp. 214–221.
[3] A. Bagchi and S. L. Hakimi, "Implementation of on-line distributed system level diagnosis theory," IEEE Trans. Comput., pp. 616–626, May 1992.
[4] L. Bouge and N. Francez, "A compositional approach to superimposition," in ACM Symp. Principles of Programming Languages, Jan. 1988, pp. 240–249.
[5] T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Asynchronous Systems," Proc. 10th ACM Symp. Principles of Distributed Computing, pp. 325–340, Aug. 1991.
[6] S. Chandrasekaran and S. Venkatesan, "A Message-Optimal Algorithm for Distributed Termination Detection," J. Parallel and Distributed Computing, vol. 8, pp. 245–252, 1990.
[7] K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Trans. Computer Systems, Feb. 1985.
[8] S. Cohen and D. Lehmann, "Dynamic systems and their distributed termination," in ACM Symp. Principles of Distrib. Comput., Aug. 1982, pp. 29–33.
[9] E. W. Dijkstra, W. H. Feijen, and A. J. van Gasteren, "Derivation of a termination detection algorithm for distributed computations," Inform. Process. Lett., vol. 16, no. 5, pp. 217–219, June 1983.
[10] E. W. Dijkstra and C. S. Scholten, "Termination detection for diffusing computations," Inform. Process. Lett., vol. 11, pp. 1–4, Aug. 1980.
[11] M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of Distributed Consensus with One Faulty Process," J. ACM, vol. 32, no. 2, pp. 374–382, 1985.
[12] N. Francez and M. Rodeh, "Achieving distributed termination without freezing," IEEE Trans. Software Eng., vol. SE-8, pp. 287–292, May 1982.
[13] S.-T. Huang, "Termination detection by using distributed snapshots," Inform. Process. Lett., vol. 32, pp. 113–119, Aug. 1989.
[14] T.-H. Lai, "Termination Detection for Dynamic Distributed Systems with Non-First-In-First-Out Communications," J. Parallel and Distributed Computing, vol. 3, pp. 577–599, 1986.
[15] T.-H. Lai, Y.-C. Tseng, and X. Dong, "A more efficient message-optimal algorithm for distributed termination detection," in Int. Parallel Process. Symp., Mar. 1992, pp. 646–649.
[16] L. Lamport, "A theorem on atomicity in distributed algorithms," Distrib. Comput., vol. 4, pp. 59–68, 1990.
[17] E. L. Lozinskii, "A remark on distributed termination," in IEEE Int. Conf. Distrib. Comput. Syst., 1985, pp. 416–419.
[18] F. Mattern, "Algorithms for distributed termination detection," Distrib. Comput., vol. 2, pp. 161–175, 1987.
[19] J. Misra, "Detecting termination of distributed computations using markers," in ACM Symp. Principles Distrib. Comput., Aug. 1983, pp. 290–294.
[20] J. Misra and K. M. Chandy, "Termination Detection of Diffusing Computations in Communicating Sequential Processes," ACM Trans. Programming Languages and Systems, vol. 4, no. 1, pp. 37–43, Jan. 1982.
[21] S. Rangarajan and D. Fussell, "Probabilistic diagnosis algorithms tailored to system topology," in IEEE Symp. Fault-Tolerant Comput., June 1991, pp. 230–237.
[22] N. Shavit and N. Francez, "A new approach to detection of locally indicative stability," in 13th Int. Colloquium Automata, Languages Programming, LNCS vol. 226, pp. 344–358, July 1986.
[23] A. Tanenbaum, Computer Networks. Prentice Hall, 1988.
[24] R. W. Topor, "Termination detection for distributed computations," Inform. Process. Lett., vol. 18, no. 20, pp. 33–36, Jan. 1984.
[25] S. Venkatesan, "Reliable protocols for distributed termination detection," IEEE Trans. Reliability, vol. 38, pp. 103–110, Apr. 1989.
[26] L.-F. Wu, T.-H. Lai, and Y.-C. Tseng, "Consensus and termination detection in the presence of faulty processes," in Int. Conf. Parallel and Distrib. Syst., Dec. 1992, pp. 267–274.

Citation:
Ten-Hwang Lai, Li-Fen Wu, "An (N-1)-Resilient Algorithm for Distributed Termination Detection," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 1, pp. 63-78, Jan. 1995, doi:10.1109/71.363410