Reliable Distributed Systems, IEEE Symposium on (2006)
Leeds, United Kingdom
Oct. 2, 2006 to Oct. 4, 2006
ISSN: 1060-9857
ISBN: 0-7695-2677-2
pp: 132-142
Barry Porter, Lancaster University, Lancaster, UK
Francois Taiani, Lancaster University, Lancaster, UK
Geoff Coulson, Lancaster University, Lancaster, UK
We present and evaluate a generic approach to the repair of overlay networks which identifies general principles of overlay repair and embodies these as a reusable service. At the heart of our approach is an algorithm that discovers the extent of a failed section of any type of overlay, and assigns responsibility to carry out the repair. The repair strategy itself is 'pluggable' and can be tailored to the requirements of a specific overlay type or instance. Our approach is efficient in terms of the number of repair-related message exchanges it incurs; scalable in that it involves only nodes in the locality of the failed section of the overlay; and resilient in that it correctly handles cases in which multiple adjacent nodes fail simultaneously, and it tolerates new failures that occur while a repair is underway. The benefits of our approach are that: (i) it extracts and encapsulates best practice in repair for overlays; (ii) it simplifies the design and implementation of new overlays (because repair issues can be treated orthogonally to basic functionality); and (iii) it supports tailorable levels of dependability for overlays, including pluggable repair strategies.
computer networks, fault tolerant computing

B. Porter, F. Taiani and G. Coulson, "Generalised Repair for Overlay Networks," 2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS), Leeds, 2006, pp. 132-142.