Fourth International Symposium on Object-Oriented Real-Time Distributed Computing
Designing a Service of Failure Detection in Asynchronous Distributed Systems
Magdeburg, Germany
May 2-4, 2001
ISBN: 0-7695-1089-2
Abstract: Although it was introduced to solve the consensus problem in asynchronous distributed systems, the notion of an unreliable failure detector can be a powerful tool for any distributed protocol. It improves performance by allowing aggressive time-outs to detect failures of the entities executing the protocol. In this paper we present the design of a Failure Detection Service (FDS) based on the notion of unreliable failure detectors introduced by Chandra and Toueg. FDS can detect crashed objects and entities that permanently fail to send messages, without requiring changes to the source code of the underlying protocols that use the service. FDS also provides an object-oriented interface to its subscribers and, more importantly, adds no network overhead when no entity subscribes to the service. This paper can also be seen as a first step toward a distributed implementation of a heartbeat-based failure management system as defined in the Fault-Tolerant CORBA specification.
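The abstract combines three ideas: heartbeat-based detection, aggressive time-outs whose mistakes are tolerated, and monitoring that runs only while someone subscribes. The following is a minimal, single-process sketch of how these ideas fit together. It is not the FDS design from the paper.

The following are all illustrative assumptions:
- The class name `HeartbeatFailureDetector`.
- Its methods `subscribe`, `heartbeat` and `check`.
- The `initial_timeout` and `backoff` parameters.
- The explicit `now` clock argument.

The sketch also borrows the "enlarge the time-out after a false suspicion" rule from Chandra and Toueg's construction of an eventually perfect detector. There is no networking: ignoring heartbeats when there are no subscribers merely stands in for the idea that no heartbeat traffic is needed in that case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical subscriber callback: (entity, suspected) -> None
Callback = Callable[[str, bool], None]


@dataclass
class HeartbeatFailureDetector:
    """Illustrative unreliable failure detector driven by heartbeats (not FDS itself)."""
    initial_timeout: float = 1.0  # aggressive starting time-out (assumed value)
    backoff: float = 2.0          # time-out multiplier after a false suspicion (assumption)
    _last_seen: Dict[str, float] = field(default_factory=dict, init=False, repr=False)
    _timeout: Dict[str, float] = field(default_factory=dict, init=False, repr=False)
    _suspected: Set[str] = field(default_factory=set, init=False, repr=False)
    _subscribers: List[Callback] = field(default_factory=list, init=False, repr=False)

    def subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def unsubscribe(self, callback: Callback) -> None:
        self._subscribers.remove(callback)

    @property
    def active(self) -> bool:
        # Monitoring only runs while at least one subscriber exists.
        return bool(self._subscribers)

    def heartbeat(self, entity: str, now: float) -> None:
        """Record a heartbeat from `entity` received at time `now`."""
        if not self.active:
            return  # nobody is interested: do no monitoring work at all
        self._last_seen[entity] = now
        self._timeout.setdefault(entity, self.initial_timeout)
        if entity in self._suspected:
            # False suspicion: the entity was alive. Trust it again and
            # enlarge its time-out so that such mistakes become rarer.
            self._suspected.discard(entity)
            self._timeout[entity] *= self.backoff
            self._notify(entity, suspected=False)

    def check(self, now: float) -> Set[str]:
        """Suspect every entity whose last heartbeat is older than its time-out."""
        if not self.active:
            return set()
        for entity, seen in self._last_seen.items():
            if entity not in self._suspected and now - seen > self._timeout[entity]:
                self._suspected.add(entity)
                self._notify(entity, suspected=True)
        return set(self._suspected)

    def _notify(self, entity: str, suspected: bool) -> None:
        for callback in list(self._subscribers):
            callback(entity, suspected)


events = []
fd = HeartbeatFailureDetector(initial_timeout=1.0)
fd.heartbeat("a", now=0.0)                      # ignored: no subscribers yet
fd.subscribe(lambda e, s: events.append((e, s)))
fd.heartbeat("a", now=0.0)
fd.heartbeat("b", now=0.0)
fd.heartbeat("b", now=1.0)
print(fd.check(now=1.5))   # {'a'}  ("a" silent for 1.5 > 1.0)
fd.heartbeat("a", now=1.6) # false suspicion: "a" trusted again, its time-out becomes 2.0
print(fd.check(now=3.0))   # {'b'}  ("b" silent for 2.0 > 1.0; "a" silent 1.4 < 2.0)
print(events)              # [('a', True), ('a', False), ('b', True)]
```

The sketch shows why aggressive time-outs are acceptable for an unreliable detector. A wrong suspicion of "a" is later retracted, and subscribers are told both times. Protocols built on such a detector must tolerate these mistakes rather than assume that every suspicion is accurate.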
Citation:
Roberto Baldoni and Fabio Zito, "Designing a Service of Failure Detection in Asynchronous Distributed Systems," Fourth International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), p. 113, 2001.