Group Communication in Partitionable Systems: Specification and Algorithms
April 2001 (vol. 27 no. 4)
pp. 308-336

Abstract—We give a formal specification and an implementation for a partitionable group communication service in asynchronous distributed systems. Our specification is motivated by the requirements for building "partition-aware" applications that can continue operating without blocking in multiple concurrent partitions and reconfigure themselves dynamically when partitions merge. The specified service guarantees liveness and excludes trivial solutions. It constitutes a useful basis for building realistic partition-aware applications, and it is implementable in practical asynchronous distributed systems where certain stability conditions hold.
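To show the kind of constraint a view-oriented specification places on a partitionable service, here is a minimal sketch of a trace checker for one commonly stated view-synchrony agreement property. The property is that processes which install the same two consecutive views must have delivered the same set of messages in the first of them. This is not the specification given in this paper. The trace format, the assumption of globally unique view identifiers, and the function names `view_segments` and `check_view_synchrony` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical trace format (an assumption, not the paper's model):
# ("view", view_id) when a process installs a view,
# ("deliver", msg_id) when it delivers a message.
# View ids are assumed to be globally unique.
Event = Tuple[str, str]
Segment = Tuple[str, str, FrozenSet[str]]


def view_segments(log: List[Event]) -> List[Segment]:
    """Split one process's log into (view, next_view, messages delivered in view)."""
    segments: List[Segment] = []
    current = None
    delivered = set()
    for kind, value in log:
        if kind == "view":
            if current is not None:
                segments.append((current, value, frozenset(delivered)))
            current, delivered = value, set()
        elif kind == "deliver":
            if current is None:
                raise ValueError("message delivered before any view was installed")
            delivered.add(value)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    # The last view has no successor in the trace, so it is not checked.
    return segments


def check_view_synchrony(logs: Dict[str, List[Event]]) -> List[Tuple[str, str, str, str]]:
    """For each (view, next_view) transition, compare every process that made it
    against the first such process seen. Report (view, next_view, first, other)
    whenever their delivered message sets differ."""
    by_transition = defaultdict(list)
    for proc, log in logs.items():
        for view, next_view, delivered in view_segments(log):
            by_transition[(view, next_view)].append((proc, delivered))
    violations = []
    for (view, next_view), entries in by_transition.items():
        ref_proc, ref_set = entries[0]
        for proc, delivered in entries[1:]:
            if delivered != ref_set:
                violations.append((view, next_view, ref_proc, proc))
    return violations


if __name__ == "__main__":
    # p and q move from v1 to v2 together. r ends up in a concurrent view v3
    # (a partition), so it is not compared with them.
    logs = {
        "p": [("view", "v1"), ("deliver", "m1"), ("deliver", "m2"), ("view", "v2")],
        "q": [("view", "v1"), ("deliver", "m2"), ("deliver", "m1"), ("view", "v2")],
        "r": [("view", "v1"), ("deliver", "m1"), ("view", "v3")],
    }
    print(check_view_synchrony(logs))  # []
```

The checker compares only processes that share the same view transition. This reflects the partitionable setting: a process that moves into a different, concurrent view is not required to match the message set of the processes it has been separated from.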


Index Terms:
Group communication, view synchrony, partition-awareness, asynchronous systems, fault tolerance.
Citation:
Özalp Babaoglu, Renzo Davoli, Alberto Montresor, "Group Communication in Partitionable Systems: Specification and Algorithms," IEEE Transactions on Software Engineering, vol. 27, no. 4, pp. 308-336, April 2001, doi:10.1109/32.917522