Dynamic Reconfiguration in Computer Clusters with Irregular Topologies in the Presence of Multiple Node and Link Failures
Issue No.05 - May (2005 vol.54)
Dimiter Avresky, IEEE
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2005.76
Component failures in high-speed computer networks can result in significant topological changes. In such cases, a network reconfiguration algorithm must be executed to restore connectivity between the network nodes. Most contemporary networks either use static reconfiguration algorithms or stop user traffic in order to prevent cyclic dependencies in the routing tables. This paper presents NetRec, a dynamic network reconfiguration algorithm for tolerating multiple node and link failures in high-speed networks with arbitrary topology. The algorithm updates the routing tables asynchronously and does not require any global knowledge of the network topology. Certain phases of NetRec are executed in parallel, which reduces the reconfiguration time. The algorithm suspends application traffic in small regions of the network, and only while the routing tables are being updated. The message complexity of NetRec is analyzed, and the termination, liveness, and safety of the proposed algorithm are proven. Additionally, results are presented from validating the algorithm in Distant, a distributed network-validation testbed based on the MPI 1.2 features for building arbitrary virtual topologies.
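To make the problem setting concrete, the following is a minimal Python sketch of what "restoring connectivity without cyclic dependencies" can look like on an irregular topology. It is not NetRec: it is a centralized illustration that assumes global knowledge of the topology, which NetRec explicitly does not require, and it swaps in a plain breadth-first spanning tree as the routing structure (tree routes are acyclic by construction). The function names `surviving_topology`, `rebuild_tree`, and `tree_route`, and the example topology, are hypothetical and do not come from the paper.

```python
from collections import deque


def surviving_topology(links, failed_nodes, failed_links):
    """Return an adjacency map with failed nodes and failed links removed."""
    dead_links = {frozenset(link) for link in failed_links}
    adj = {}
    for u, v in links:
        if u in failed_nodes or v in failed_nodes:
            continue
        if frozenset((u, v)) in dead_links:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def rebuild_tree(adj, root):
    """Breadth-first search from root; returns a child -> parent map
    covering every node still reachable from root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adj.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def tree_route(parent, src, dst):
    """Route along the tree: climb from src to the lowest common
    ancestor, then descend to dst. Both nodes must be in the tree."""
    up = []
    node = src
    while node is not None:
        up.append(node)
        node = parent[node]
    position = {n: i for i, n in enumerate(up)}
    down = []
    node = dst
    while node not in position:
        down.append(node)
        node = parent[node]
    return up[:position[node] + 1] + down[::-1]


# Hypothetical irregular topology: node 1 fails and link (2, 3) fails.
links = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3), (3, 4)]
adj = surviving_topology(links, failed_nodes={1}, failed_links=[(2, 3)])
parent = rebuild_tree(adj, root=0)
route = tree_route(parent, 4, 0)
```

In this example node 2 loses both of its links and drops out of the rebuilt tree, while nodes 0, 3, and 4 remain connected. A distributed algorithm of the kind the abstract describes would have to reach an equivalent result using only local information and asynchronous table updates.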
Dynamic reconfiguration, multiple node and link failures, fault tolerance, clusters of workstations, irregular topologies.
Dimiter Avresky, Natcho Natchev, "Dynamic Reconfiguration in Computer Clusters with Irregular Topologies in the Presence of Multiple Node and Link Failures", IEEE Transactions on Computers, vol. 54, no. 5, pp. 603-615, May 2005, doi:10.1109/TC.2005.76