This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Early Stopping in Global Data Computation
September 2003 (vol. 14 no. 9)
pp. 909-921

Abstract—The Global Data Computation problem consists of providing each process with the same vector (with one entry per process) such that each entry is filled by a value provided by the corresponding process. This paper presents a protocol that solves this problem in an asynchronous distributed system where processes can crash, but equipped with a perfect failure detector. This protocol requires that processes execute asynchronous computation rounds. The number of rounds is upper bounded by min(f + 2, t + 1, n), where n, t, and f represent the total number of processes, the maximum number of processes that can crash, and the number of processes that actually crash, respectively. This value is a lower bound for the number of rounds when t < n - 1. To our knowledge, this protocol is the first to achieve this lower bound. Interestingly, this protocol meets the same lower bound as the one required in synchronous systems.

[1] H. Attiya and J. Welch, Distributed Computing: Fundamentals, Simulations and Advanced Topics. McGraw-Hill, 1998.
[2] R.A. Bazzi and G. Neiger, Simplifying Fault-Tolerance: Providing the Abstraction of Crash Failures J. ACM, vol. 48, no. 3, pp. 499-554, 2001.
[3] R. Brand, Iso-Ethernet: Bridging the Gap from WAN to LAN Data Comm., 1995.
[4] T. Chandra and S. Toueg, Unreliable Failure Detectors for Reliable Distributed Systems J. ACM, vol. 43, no. 2, pp. 225-267, 1996.
[5] B. Charron-Bost, R. Guerraoui, and A. Schiper, Synchronous System and Perfect Failure Detector: Solvability and Efficiency Issues Proc. Int'l IEEE Conf. Dependable Systems and Networks, pp. 523-532, 2000.
[6] C. Delporte-Gallet, H. Fauconnier, J.M. Hélary, and M. Raynal, Early Stopping in Global Data Computation Research Report PI-1436, IRISA, Jan. 2002. ftp://ftp.irisa.fr/techreports/2002PI-1436.ps.gz
[7] D. Dolev, R. Reischuk, and R. Strong, Early Stopping in Byzantine Agreement J. ACM, vol. 37, no. 4, pp. 720-741, 1990.
[8] H. Fauconnier, Fault-Tolerant Round-Based Distributed Computing (in French) Mémoire d'Habilitation, UniversitéParis 7-Denis Diderot, 2001.
[9] M.J. Fischer, N. Lynch, and M.S. Paterson, Impossibility of Distributed Consensus with One Faulty Process J. ACM, vol. 32, no. 2, pp. 374-382, 1985.
[10] E. Gafni, Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony Proc. 17th Symp. Principles of Distributed Computing (PODC '98), pp. 143-152, 1998.
[11] S. Gopal and S. Toueg, Inconsistency and Contamination Proc. 10th ACM Symp. Principles of Distributed Computing (PODC '91), pp. 257-272, 1991.
[12] V. Hadzilacos, Byzantine Agreement under Restricted Types of Failures (Not Telling the Truth Is Different from Telling Lies) Technical Report 18-83, Dept. Computer Science, Harvard Univ., 1983.
[13] V. Hadzilacos and S. Toueg, Reliable Broadcast and Related Problems Distributed Systems, S. Mullender ed., pp. 97-145, New York: ACM Press, 1993.
[14] J.M. Hélary, M. Hurfin, A. Mostefaoui, M. Raynal, and F. Tronel, Computing Global Functions in Asynchronous Distributed Systems with Perfect Failure Detectors IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 9, pp. 897-909, Sept. 2000.
[15] L. Lamport, Time, Clocks, and the Ordering of Events in a Distributed System Comm. ACM, vol. 7, no. 21, pp. 558-565, 1978.
[16] N. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.
[17] L. Pease, R. Shostak, and L. Lamport, Reaching Agreement in Presence of Faults J. ACM, vol. 27, no. 2, pp. 228-234, 1980.
[18] G. Neiger and S. Toueg, Automatically Increasing the Fault-Tolerance of Distributed Algorithms J. Algorithms, vol. 11, no. 3, pp. 374-419, 1990.
[19] M. de Prycker, Asynchronous Transfer Mode: Solution for Broadband ISDN. Prentice Hall, 1995.
[20] M.-C. Rosu, Early-Stopping Terminating Reliable Broadcast Protocol for General-Omission Failures Brief Announcement, Proc. 15th ACM Symp. Principles of Distributed Computing (PODC '96) p. 209, 1996
[21] J. Rufino, P. Veríssimo, G. Arroz, C. Almeida, and L. Rodrigues, Fault-Tolerant Broadcasts in CAN Proc. 28th Symp. Fault-Tolerant Computing, pp. 150-159, June 1998.
[22] S. Toueg, K.J. Perry, and T.K. Srikanth, Fast Distributed Agreement SIAM J. Computing vol. 16, no. 3, pp. 445-457, 1987.
[23] P. Veríssimo, Real-Time Communication Distributed Systems, second ed., chapter 17, pp. 447-490, S. Mullender ed., Addison-Wesley/ACM Press, 1993.

Index Terms:
Asynchronous distributed systems, crash failure, distributed computing, early stopping, fault tolerance, global data, perfect failure detector, reliability.
Citation:
Carole Delporte-Gallet, Hugues Fauconnier, Jean-Michel H?lary, Michel Raynal, "Early Stopping in Global Data Computation," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 909-921, Sept. 2003, doi:10.1109/TPDS.2003.1233713
Usage of this product signifies your acceptance of the Terms of Use.