Election in Asynchronous Complete Networks with Intermittent Link Failures
July 1994 (vol. 43 no. 7)
pp. 778-788

This paper considers the problem of fault-tolerant leader election in asynchronous complete (fully connected) distributed networks. The processors are reliable, but some of the communication channels may fail intermittently before or during the execution of the algorithm. Channel failures are undetectable due to the asynchronous nature of the network. Let $n$ be the number of processors in the network and $f$ be the maximum number of faulty channels incident on each processor, where $f \le \frac{1}{2}(n-1)$. Our algorithm uses at most $O(n^2 + nf^2)$ messages to elect a unique leader of the network. Each message consists of at most $O(\log |T|)$ bits, where $|T|$ is the cardinality of the set of processor identifiers. All previous algorithms either tolerated only benign failures such as fail-stop failures, assumed that the network is synchronous, tolerated only a small number of failures, or assumed that the faults are detectable. Our algorithm is the first election algorithm designed specifically for asynchronous, intermittently faulty complete networks in which up to $\frac{1}{4}n(n-1)$ channels may be faulty, where each processor is adjacent to no more than $\frac{1}{2}(n-1)$ faulty channels, and where the faults are undetectable.
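The two fault bounds in the abstract are linked by a counting argument: every faulty channel is incident on two processors, so if each of the n processors sees at most (n-1)/2 faulty channels, the network holds at most n(n-1)/4 faulty channels in total. The following minimal sketch checks a given set of faulty channels against that fault model. It is an illustration only, not the paper's election algorithm; the function names, the numbering of processors as 0..n-1, and the representation of channels as pairs are all assumptions made for this example.

```python
from collections import Counter

def faulty_degrees(n, faulty_channels):
    """Return, for processors 0..n-1, the number of faulty channels incident on each.

    Channels are unordered pairs (u, v); duplicates and reversed pairs are merged.
    (Hypothetical helper, not taken from the paper.)
    """
    channels = {tuple(sorted(c)) for c in faulty_channels}
    deg = Counter()
    for u, v in channels:
        deg[u] += 1
        deg[v] += 1
    return [deg[p] for p in range(n)]

def within_fault_model(n, faulty_channels):
    """True if every processor is adjacent to at most (n-1)/2 faulty channels."""
    # Compare 2*d <= n-1 to avoid fractional arithmetic.
    return all(2 * d <= n - 1 for d in faulty_degrees(n, faulty_channels))

def max_total_faulty(n):
    """Largest total number of faulty channels the per-processor bound allows.

    Each faulty channel is counted at both endpoints, so the total is at most
    n * floor((n-1)/2) / 2, which never exceeds n(n-1)/4.
    """
    return n * ((n - 1) // 2) // 2

# Example: 5 processors, faulty channels forming a ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(faulty_degrees(5, ring))
print(within_fault_model(5, ring))
print(max_total_faulty(5))
```

In the ring example each processor is adjacent to exactly two faulty channels, which meets the per-processor bound for n = 5 with equality, and the five faulty channels also meet the total bound n(n-1)/4 = 5 with equality. Adding one more faulty channel at any processor would push that processor over (n-1)/2.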


Index Terms:
telecommunication channels; communication complexity; fault tolerant computing; reliability; failure analysis; telecommunication network management; fault-tolerant leader election; asynchronous complete networks; intermittent link failures; fully-connected distributed networks; reliable processors; communication channel failures; algorithm execution; faulty channels; processor identifier set cardinality; undetectable faults; distributed algorithms; message complexity.
Citation:
H. Abu-Amara, J. Lokre, "Election in Asynchronous Complete Networks with Intermittent Link Failures," IEEE Transactions on Computers, vol. 43, no. 7, pp. 778-788, July 1994, doi:10.1109/12.293257