Issue No. 11 - November (1994 vol. 5)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.329665
<p>It is important for a distributed computing system to be able to route messages around whatever faulty links or nodes may be present. We present a fault-tolerant routing algorithm that assures the delivery of every message as long as there is a path between its source and destination. The algorithm works on many common mesh architectures such as the torus and hexagonal mesh. The proposed scheme can also detect the nonexistence of a path between a pair of nodes in a finite amount of time. Moreover, the scheme requires each node in the system to know only the state (faulty or not) of each of its own links. The performance of the routing scheme is simulated for both square and hexagonal meshes while varying the physical distribution of faulty components. It is shown that a shortest path between the source and destination of each message is taken with a high probability, and, if a path exists, it is usually found very quickly.</p>
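The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the properties it lists, not the authors' scheme. It assumes a square n-by-n torus and a message that carries its own history of visited nodes. It swaps in a plain depth-first search with backtracking that prefers the neighbor closest to the destination. Each routing decision consults only the state of the current node's own links. All names (`route`, `torus_neighbors`, `torus_distance`, `faulty_links`) are hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]  # an undirected link, stored as frozenset({a, b})


def torus_neighbors(node: Node, n: int) -> List[Node]:
    """Four neighbors of a node on an n-by-n square torus (wraparound)."""
    x, y = node
    return [((x + 1) % n, y), ((x - 1) % n, y), (x, (y + 1) % n), (x, (y - 1) % n)]


def torus_distance(a: Node, b: Node, n: int) -> int:
    """Fault-free hop distance between two nodes on the torus."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)


def route(src: Node, dst: Node, n: int, faulty_links: Set[Link]) -> Optional[List[Node]]:
    """Return the sequence of nodes the message visits, or None if no path exists.

    Assumption: the message carries `visited` and `path` with it, so a node
    needs only the fault state of its own links to pick the next hop.
    """
    visited = {src}
    path = [src]   # current forward path, used for backtracking
    hops = [src]   # every node the message touches, including backtracks
    while path:
        cur = path[-1]
        if cur == dst:
            return hops
        # Local decision: usable links of `cur` leading to unvisited neighbors.
        candidates = [nb for nb in torus_neighbors(cur, n)
                      if nb not in visited and frozenset((cur, nb)) not in faulty_links]
        if candidates:
            # Prefer the neighbor closest to the destination.
            nxt = min(candidates, key=lambda nb: torus_distance(nb, dst, n))
            visited.add(nxt)
            path.append(nxt)
            hops.append(nxt)
        else:
            # Dead end: step back along the link the message arrived on.
            path.pop()
            if path:
                hops.append(path[-1])
    # Every reachable node was explored, so no path exists.
    return None
```

Because each node is visited at most once on the forward path, the search terminates, which is how this sketch reports an unreachable destination in finite time. With no faults, the greedy choice follows a shortest path. Carrying the visited set in the message is a simplification of this sketch and should not be read as the paper's design.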
Index Terms: message passing; network routing; fault tolerant computing; software reliability; parallel algorithms; parallel architectures; fault-tolerant routing; mesh architectures; distributed computing system; message routing; fault-tolerant routing algorithm; source; destination; hexagonal mesh; torus; routing scheme performance; hexagonal meshes; square meshes; high probability
A. Olson and K. Shin, "Fault-Tolerant Routing in Mesh Architectures," in IEEE Transactions on Parallel & Distributed Systems, vol. 5, no. 11, pp. 1225-1232, 1994.