A Theory of Fault-Tolerant Routing in Wormhole Networks
August 1997 (vol. 8 no. 8)
pp. 790-802

Abstract—Fault-tolerant systems aim at providing continuous operation in the presence of faults. Multicomputers rely on an interconnection network between processors to support the message-passing mechanism. Therefore, the reliability of the interconnection network is very important for the reliability of the whole system.

This paper analyzes the effective redundancy available in a wormhole network by combining connectivity and deadlock freedom. Redundancy is defined at the channel level. We propose a sufficient condition for channel redundancy and compute the set of redundant channels. We also define the redundancy level of the network and propose a theorem that supplies its value. This theory builds on our necessary and sufficient condition for deadlock-free adaptive routing, and it also considers the failure of physical channels when virtual channels are used. Finally, we propose a methodology for the design of fault-tolerant routing algorithms and show its application to n-dimensional meshes.
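
The idea of judging redundancy channel by channel, with connectivity and deadlock freedom checked together, can be illustrated with a small sketch. It is not the paper's condition or algorithm. It rests on simplifying assumptions chosen only for illustration: a 2D mesh without virtual channels, west-first turn restrictions with U-turns also forbidden, routing modeled as "any path that obeys the turn restrictions", and an acyclic channel dependency graph used as a sufficient (stricter) stand-in for deadlock freedom. All function names are hypothetical.

```python
from collections import deque
from itertools import product

# Unit direction vectors for a 2D mesh; a channel is a directed pair (u, v).
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}
# Assumption: west-first turn restrictions (no turn into west) plus no U-turns.
FORBIDDEN_TURNS = {("N", "W"), ("S", "W")} | {(d, OPPOSITE[d]) for d in DIRS}


def mesh_channels(k):
    """Return {channel: direction} for a k x k mesh with bidirectional links."""
    channels = {}
    for x, y in product(range(k), repeat=2):
        for name, (dx, dy) in DIRS.items():
            nx, ny = x + dx, y + dy
            if 0 <= nx < k and 0 <= ny < k:
                channels[((x, y), (nx, ny))] = name
    return channels


def dependencies(channels):
    """Channel dependency graph: c1 -> c2 if c2 may be used right after c1."""
    out_of = {}
    for c in channels:
        out_of.setdefault(c[0], []).append(c)
    return {
        c1: [c2 for c2 in out_of.get(c1[1], [])
             if (channels[c1], channels[c2]) not in FORBIDDEN_TURNS]
        for c1 in channels
    }


def is_acyclic(dep):
    """Kahn's algorithm: True if the dependency graph has no cycle."""
    indegree = {c: 0 for c in dep}
    for succs in dep.values():
        for c in succs:
            indegree[c] += 1
    queue = deque(c for c, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        c = queue.popleft()
        seen += 1
        for n in dep[c]:
            indegree[n] -= 1
            if indegree[n] == 0:
                queue.append(n)
    return seen == len(dep)


def is_connected(channels, dep, nodes):
    """True if every ordered node pair is joined by a path of permitted turns."""
    for src in nodes:
        start = [c for c in channels if c[0] == src]
        reached, queue = set(start), deque(start)
        while queue:
            c = queue.popleft()
            for n in dep[c]:
                if n not in reached:
                    reached.add(n)
                    queue.append(n)
        if {c[1] for c in reached} | {src} != set(nodes):
            return False
    return True


def redundant_channels(k):
    """Channels whose single failure keeps the sketch connected and acyclic."""
    channels = mesh_channels(k)
    nodes = list(product(range(k), repeat=2))
    result = set()
    for faulty in channels:
        healthy = {c: d for c, d in channels.items() if c != faulty}
        dep = dependencies(healthy)
        if is_acyclic(dep) and is_connected(healthy, dep, nodes):
            result.add(faulty)
    return result
```

Calling `redundant_channels(3)` marks every eastward channel as redundant under these assumptions, while no westward channel is, because the turn restrictions leave no alternative way to travel west. This is the sense in which deadlock-avoidance rules can make the effective redundancy smaller than the topology alone would suggest. In this particular model, removing a channel can only shrink the dependency graph, so the acyclicity check never fails; it is kept to show where a deadlock-freedom test would sit in a more general setting.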

Index Terms:
Adaptive routing, channel redundancy, fault-tolerant routing, interconnection networks, network redundancy, wormhole switching.
Citation:
José Duato, "A Theory of Fault-Tolerant Routing in Wormhole Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 8, pp. 790-802, Aug. 1997, doi:10.1109/71.605766