A Group Membership Algorithm with a Practical Specification
November 2001 (vol. 12 no. 11)
pp. 1190-1200

Abstract: This paper presents a solvable specification and gives an algorithm for the Group Membership Problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high-level applications. Previous work has proven that the Group Membership Problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result by building a weaker, yet nontrivial, specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models.

Index Terms:
Distributed agreement algorithms, group membership, asynchronous systems.
Citation:
Massimo Franceschetti, Jehoshua Bruck, "A Group Membership Algorithm with a Practical Specification," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 11, pp. 1190-1200, Nov. 2001, doi:10.1109/71.969128