The Timewheel Group Communication System
August 2002 (vol. 51 no. 8)
pp. 883-899

This paper describes a group communication system, called the timewheel group communication system, that has been designed for a timed asynchronous distributed system model. All protocols in the timewheel group communication system have been designed to be fail-aware in the sense that a process can detect, at any point in time, whether any of the protocol's properties is violated. Although these protocols have been designed to operate in an asynchronous distributed computing environment, they provide timeliness properties. The timewheel group communication system provides nine group communication semantics from which a user can choose dynamically when broadcasting an update. The system provides high throughput and fast delivery and stability times, uses a small number of messages per update broadcast, and distributes the processing load evenly among group members.

Index Terms:
Group communication, timed asynchronous distributed system, high availability, fault tolerance, replication.
Citation:
Shivakant Mishra, Christof Fetzer, Flaviu Cristian, "The Timewheel Group Communication System," IEEE Transactions on Computers, vol. 51, no. 8, pp. 883-899, Aug. 2002, doi:10.1109/TC.2002.1024737