Second IEEE International Symposium on Object-Oriented Real-Time Distributed Computing
Enhancing Replica Management Services to Tolerate Group Failures
Saint-Malo, France
May 2-5, 1999
ISBN: 0-7695-0207-5
Paul D Ezhilchelvan, University of Newcastle upon Tyne
Santosh K Shrivastava, University of Newcastle upon Tyne
In a distributed system, replication of components, such as objects, is a well-known way of achieving availability. For increased availability, crashed and disconnected components must be replaced by new components on available spare nodes. In this context, we address the problem of reconfiguring a group after the group as an entity has failed. Such a failure is termed a group failure; examples are the crash of every component in the group, or the group being partitioned into minority islands. The solution assumes crash-proof storage, eventual recovery of crashed nodes, and eventual healing of partitions. It guarantees that (i) the number of groups reconfigured after a group failure is never more than one, and (ii) the reconfigured group contains a majority of the components which were members just before the group failed, so that the loss of state information due to group failure is minimal. The protocol is efficient in terms of communication rounds and use of stable store, during both normal operations and reconfiguration after a group failure.
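The two guarantees can be illustrated with a small majority check. This is a minimal sketch, not the protocol described in the paper. It assumes the last membership view has already been read back from crash-proof storage, and the function names (is_group_failure, can_reconfigure) and node names are hypothetical.

```python
from typing import FrozenSet, Iterable


def is_group_failure(last_view: FrozenSet[str], connected: Iterable[str]) -> bool:
    """Return True if no strict majority of the last view is still connected.

    Covers both cases named in the abstract: every member crashed
    (connected is empty) or the group split into minority islands.
    """
    survivors = last_view & frozenset(connected)
    return 2 * len(survivors) <= len(last_view)


def can_reconfigure(last_view: FrozenSet[str], recovered: Iterable[str]) -> bool:
    """Return True if the recovered nodes include a strict majority of the
    members of the last view (hypothetical check for guarantee (ii))."""
    candidates = last_view & frozenset(recovered)
    return 2 * len(candidates) > len(last_view)


# Assumed example: a five-member group whose last recorded view is {a, b, c, d, e}.
last_view = frozenset({"a", "b", "c", "d", "e"})

island_1 = {"a", "b"}       # minority island
island_2 = {"c", "d", "e"}  # majority of the last view

group_failed = is_group_failure(last_view, island_1)
minority_ok = can_reconfigure(last_view, island_1)
majority_ok = can_reconfigure(last_view, island_2)
```

Because any two strict majorities of the same view share at least one member, two disjoint sets of recovered nodes cannot both pass this check, which is the intuition behind guarantee (i).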
Index Terms:
system availability, object groups, group failures, node crashes, network partitions, membership views, membership services.
Citation:
Paul D Ezhilchelvan, Santosh K Shrivastava, "Enhancing Replica Management Services to Tolerate Group Failures," Second IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), p. 263, 1999