FRoots: A Fault Tolerant and Topology-Flexible Routing Technique
October 2006 (vol. 17 no. 10)
pp. 1136-1150
Ingebjørg Theiss, IEEE Computer Society
Olav Lysne, IEEE

Abstract: Existing solutions for fault-tolerant routing in interconnection networks either work for only one given regular topology or require slow and costly network reconfigurations that do not allow full and continuous network access. In this paper, we present FRoots, a routing method for fault tolerance in topology-flexible network technologies. Our method is based on redundant paths and can handle single dynamic faults without sending control messages other than those needed to inform the source nodes of the failing component. When used in a mode with local rerouting, the source nodes need not be informed, and no control messages are necessary for the network to stay connected despite a single fault. In fault-free networks under nonuniform traffic, our routing method performs comparably to, or even better than, topology-specific routing algorithms in regular networks such as meshes and tori. FRoots requires no features in the switches or end nodes other than a flexible routing table and a modest number of virtual channels. For that reason, it can be directly applied to several present-day technologies such as InfiniBand and Advanced Switching.

Index Terms:
Fault-tolerant routing, interconnection networks, deadlock freedom, path redundancy.
Citation:
Ingebjørg Theiss, Olav Lysne, "FRoots: A Fault Tolerant and Topology-Flexible Routing Technique," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 10, pp. 1136-1150, Oct. 2006, doi:10.1109/TPDS.2006.140