A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model
September 2003 (vol. 52 no. 9)
pp. 1154-1169
Jie Wu, IEEE

Abstract: We propose a deterministic, fault-tolerant, and deadlock-free routing protocol for two-dimensional (2D) meshes, based on dimension-order routing and the odd-even turn model. The proposed protocol, called extended X-Y routing, uses no virtual channels; it achieves this by prohibiting certain locations of faults and destinations. Faults are contained in a set of disjoint rectangular regions called faulty blocks. The number of faults that can be tolerated is unbounded, as long as the nodes outside the faulty blocks remain connected in the 2D mesh network. Extended X-Y routing can also be used with a special convex fault region called an orthogonal faulty block, which is derived from a given faulty block by activating some of the nonfaulty nodes in the block. Three extensions are also discussed: partially adaptive routing, traffic- and adaptivity-balanced routing using virtual networks, and routing without constraints using virtual channels and virtual networks.
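As a rough illustration of the odd-even turn model the protocol builds on, the sketch below checks whether a 90-degree turn is permitted at a node in a given column. It follows the general odd-even rules (no east-to-north or east-to-south turn in an even column, no north-to-west or south-to-west turn in an odd column) and is not the extended X-Y routing protocol itself. The function name `turn_allowed`, the single-letter direction labels, and the decision to reject 180-degree turns are assumptions made for this example.

```python
# Hypothetical direction labels for this sketch: travel East, West, North, South.
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def turn_allowed(column: int, incoming: str, outgoing: str) -> bool:
    """Return True if a packet travelling `incoming` may continue in
    direction `outgoing` at a node located in `column`, under the
    odd-even turn rules."""
    if incoming == outgoing:
        return True  # going straight is not a turn
    if outgoing == OPPOSITE[incoming]:
        return False  # assumption of this sketch: 180-degree turns are rejected
    even_column = column % 2 == 0
    # Even columns: east-to-north (EN) and east-to-south (ES) turns are prohibited.
    if even_column and incoming == "E" and outgoing in ("N", "S"):
        return False
    # Odd columns: north-to-west (NW) and south-to-west (SW) turns are prohibited.
    if not even_column and incoming in ("N", "S") and outgoing == "W":
        return False
    return True
```

Because the restriction depends only on the parity of the column, each node can apply the check locally, without any virtual channels.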

Index Terms:
Deadlock-free routing, deterministic routing, fault models, fault tolerance, turn models, virtual channels.
Citation:
Jie Wu, "A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model," IEEE Transactions on Computers, vol. 52, no. 9, pp. 1154-1169, Sept. 2003, doi:10.1109/TC.2003.1228511