Adaptive Fault-Tolerant Deadlock-Free Routing in Meshes and Hypercubes
June 1996 (vol. 45 no. 6)
pp. 666-683

Abstract: We present an adaptive deadlock-free routing algorithm that decomposes a given network into two virtual interconnection networks, VIN1 and VIN2. VIN1 supports deterministic deadlock-free routing, and VIN2 supports fully adaptive routing. Whenever a channel in VIN1 or VIN2 is available, it can be used to route a message.
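
The channel choice described above can be sketched for a hypercube as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not name the deterministic routing function, so dimension-order (e-cube) routing is assumed for VIN1; trying VIN2 channels before the VIN1 channel is also an assumption; and the names `candidate_channels`, `select_channel`, and `is_free` are hypothetical.

```python
def ecube_dim(cur, dst):
    """Lowest dimension in which cur and dst differ (assumed VIN1 rule), or None."""
    diff = cur ^ dst
    if diff == 0:
        return None
    return (diff & -diff).bit_length() - 1


def candidate_channels(cur, dst, n):
    """List (network, dimension) pairs a message at cur may request.

    VIN2 (fully adaptive, minimal): any dimension where cur and dst differ.
    VIN1 (deterministic): only the single e-cube dimension (an assumption).
    """
    diff = cur ^ dst
    if diff == 0:
        return []
    adaptive = [("VIN2", d) for d in range(n) if (diff >> d) & 1]
    return adaptive + [("VIN1", ecube_dim(cur, dst))]


def select_channel(cur, dst, n, is_free):
    """Return the first free candidate channel, or None if the message must wait.

    is_free(cur, channel) is a hypothetical predicate supplied by the caller.
    """
    for channel in candidate_channels(cur, dst, n):
        if is_free(cur, channel):
            return channel
    return None


if __name__ == "__main__":
    # Route from node 000 to node 101 in a 3-cube with every channel free.
    print(select_channel(0b000, 0b101, 3, lambda node, ch: True))
```

The sketch only shows how a message can use whichever of the two networks has a free channel; it does not by itself establish deadlock freedom.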

Each node is classified into one of three states: safe, unsafe, or faulty. The unsafe state is used to keep routing deadlock-free, and an unsafe node can still send and receive messages. When nodes become faulty or unsafe, some VIN2 channels around them serve as detours for the VIN1 channels that pass through them; that is, the adaptivity of VIN2 is converted into support for fault-tolerant deadlock-free routing. Using information on the state of each node's neighbors, we have developed an adaptive fault-tolerant deadlock-free routing scheme for n-dimensional meshes and hypercubes with only two virtual channels per physical link.
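
To make the three node states concrete, here is a minimal sketch of neighbor-based state labeling in an n-cube. The abstract does not give the rule that marks a node unsafe, so the rule used here is purely an assumption for illustration: a nonfaulty node becomes unsafe once at least `threshold` of its neighbors are faulty or unsafe. The names `State`, `classify_nodes`, and `threshold` are hypothetical.

```python
from enum import Enum


class State(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    FAULTY = "faulty"


def classify_nodes(n, faulty, threshold=2):
    """Label every node of an n-cube as safe, unsafe, or faulty.

    Assumed rule (not taken from the abstract): a nonfaulty node turns
    unsafe when at least `threshold` neighbors are faulty or unsafe.
    Each node inspects only its own neighbors; the loop repeats until
    no label changes.
    """
    states = {v: (State.FAULTY if v in faulty else State.SAFE)
              for v in range(1 << n)}
    changed = True
    while changed:
        changed = False
        for v in range(1 << n):
            if states[v] is not State.SAFE:
                continue
            # Neighbors of v differ from v in exactly one address bit.
            bad = sum(states[v ^ (1 << d)] is not State.SAFE
                      for d in range(n))
            if bad >= threshold:
                states[v] = State.UNSAFE
                changed = True
    return states


if __name__ == "__main__":
    for node, state in classify_nodes(3, {0b001, 0b010}).items():
        print(f"{node:03b}: {state.value}")
```

Because labels only ever move from safe to unsafe, the loop terminates, and the result does not depend on the order in which nodes are visited.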

In an n-dimensional hypercube, any pattern of faulty nodes can be tolerated as long as the number of faulty nodes is no more than $\lceil n/2 \rceil$. The maximum number of faulty nodes that can be tolerated is $2^{n-1}$, which occurs when all faulty nodes can be encompassed in an $(n-1)$-cube. In an n-dimensional mesh, we use a more general fault model, called a disconnected rectangular block. An arbitrary pattern of faulty nodes can be modeled as a rectangular block after finding both unsafe and disabled nodes (which are then treated as faulty nodes). This concept can also be applied to k-ary n-cubes with four virtual channels, two in VIN1 and the other two in VIN2. Finally, we present simulation results for both hypercubes and 2-dimensional meshes under various workloads and fault patterns.
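
The two hypercube conditions above can be checked with a few lines of code. The sketch below tests only the preconditions named in the abstract and does not implement the routing scheme; the function names are hypothetical. It relies on the fact that an (n-1)-subcube of an n-cube is obtained by fixing one address bit and therefore holds 2^(n-1) nodes.

```python
def within_guaranteed_bound(n, faulty):
    """True if the fault count is at most ceil(n/2), the bound for any pattern."""
    return len(faulty) <= (n + 1) // 2  # integer form of ceil(n / 2)


def fits_in_subcube(n, faulty):
    """True if all faulty nodes lie inside a single (n-1)-subcube.

    An (n-1)-subcube fixes one address bit, so the faulty nodes fit
    if they all agree on the value of some bit position.
    """
    faulty = list(faulty)
    if not faulty:
        return True
    for d in range(n):
        if len({(v >> d) & 1 for v in faulty}) == 1:
            return True
    return False


if __name__ == "__main__":
    n = 4
    scattered = {0b0000, 0b1111}   # two faults, no common bit value
    clustered = set(range(8))      # all 2^(n-1) = 8 nodes with top bit 0
    for name, faults in (("scattered", scattered), ("clustered", clustered)):
        print(name, within_guaranteed_bound(n, faults), fits_in_subcube(n, faults))
```

In the usage example, the scattered pattern stays within the $\lceil n/2 \rceil$ bound but spans the whole cube, while the clustered pattern exceeds that bound yet fits in one $(n-1)$-cube, the configuration the abstract associates with the maximum.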

Index Terms:
Wormhole routing, adaptive deadlock-free routing, fault-tolerant routing, hypercubes and meshes, k-ary n-cubes.
Chien-Chun Su, Kang G. Shin, "Adaptive Fault-Tolerant Deadlock-Free Routing in Meshes and Hypercubes," IEEE Transactions on Computers, vol. 45, no. 6, pp. 666-683, June 1996, doi:10.1109/12.506423