Issue No. 10 - October (1999 vol. 10)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.808144
<p><b>Abstract</b>—Current fault-tolerant routing methods require extensive changes to practical routers, such as the Cray T3D's dimension-order router, to handle faults. In this paper, we propose methods to handle faults in multicomputers with dimension-order routers using only simple changes to router structure and logic. Our techniques can be applied to current implementations in which the router is partitioned into multiple modules and no centralized crossbar is used. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. We apply our techniques to torus networks and show that, with as few as four virtual channels per physical channel, deadlock- and livelock-free routing can be provided even with multiple faults and multimodule router implementations. Our simulations of the proposed technique for 2D tori and meshes indicate that the performance degradation is similar to that seen with previously proposed crossbar-based designs.</p>
Cray T3D router, dimension-order router, fault-tolerant routing, multicomputer networks, message routing, torus networks, wormhole routing.
R. V. Boppana and S. Chalasani, "Fault-Tolerant Communication with Partitioned Dimension-Order Routers," in IEEE Transactions on Parallel & Distributed Systems, vol. 10, no. 10, pp. 1026-1039, 1999.