Processor Membership in Asynchronous Distributed Systems
May 1994 (vol. 5 no. 5)
pp. 459-473

Presents protocols for determining processor membership in asynchronous distributed systems that are subject to processor and communication faults. These protocols depend on the placement of a total order on broadcast messages. The types of systems for which each of these protocols is applicable are characterized by the properties of the communication mechanisms and by the availability of stable storage. In the absence of stable storage or of a mechanism for distinguishing promptly delivered messages, the authors show that no membership protocol can exist. They also discuss their experience in implementing these membership protocols.

Index Terms:
protocols; distributed processing; fault tolerant computing; asynchronous distributed systems; processor membership; broadcast messages; membership protocol; asynchrony; broadcast communication; distributed systems; fault tolerance; reconfiguration; total order
L.E. Moser, P.M. Melliar-Smith, V. Agrawala, "Processor Membership in Asynchronous Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 5, pp. 459-473, May 1994, doi:10.1109/71.282557