A Rectilinear-Monotone Polygonal Fault Block Model for Fault-Tolerant Minimal Routing in Mesh
March 2003 (vol. 52 no. 3)
pp. 310-320
Dajin Wang, IEEE

Abstract: We propose a new fault block model, Minimal-Connected-Component (MCC), for fault-tolerant adaptive routing in mesh-connected multiprocessor systems. This model refines the widely used rectangular model by including fewer nonfaulty nodes in fault blocks. The positions of the source and destination nodes relative to the faulty nodes are taken into account when constructing fault blocks. The main idea is that a node is included in a fault block only if using it in a route would definitely make the route nonminimal. The resulting fault blocks have rectilinear-monotone polygonal shapes. A necessary and sufficient condition is given for the existence of minimal "Manhattan" routes in the presence of such fault blocks. Based on this condition, an algorithm is proposed to determine whether a Manhattan route exists. Since MCC is designed to facilitate minimal route finding, if no minimal route exists under the MCC fault model, then no minimal route exists at all. We also present two adaptive routing algorithms that construct a Manhattan route avoiding all fault blocks, should such a route exist.
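
To make the notion of a minimal "Manhattan" route concrete, the following is a minimal sketch of a brute-force existence check on a 2D mesh. It is not the MCC-based algorithm from the paper: it assumes global knowledge of all faulty nodes, represents nodes as hypothetical `(x, y)` coordinate pairs, and uses an illustrative function name, `manhattan_route_exists`. It relies only on the fact that a minimal route never steps away from the destination, so every node on it lies in the rectangle spanned by source and destination.

```python
from typing import Set, Tuple

Node = Tuple[int, int]  # hypothetical (x, y) mesh coordinates


def manhattan_route_exists(src: Node, dst: Node, faulty: Set[Node]) -> bool:
    """Return True if some minimal route from src to dst avoids all faulty nodes.

    Baseline check by dynamic programming over the src-dst rectangle;
    this is an illustration, not the paper's MCC condition.
    """
    if src in faulty or dst in faulty:
        return False
    (sx, sy), (dx, dy) = src, dst
    # Direction of travel along each axis (a minimal route never reverses it).
    step_x = 1 if dx >= sx else -1
    step_y = 1 if dy >= sy else -1
    reachable: Set[Node] = set()
    # Visit nodes so that both possible predecessors are processed first.
    for x in range(sx, dx + step_x, step_x):
        for y in range(sy, dy + step_y, step_y):
            node = (x, y)
            if node in faulty:
                continue
            if (node == src
                    or (x - step_x, y) in reachable
                    or (x, y - step_y) in reachable):
                reachable.add(node)
    return dst in reachable


# A single fault can be bypassed, but an anti-diagonal wall of faults
# cuts every minimal route from (0, 0) to (2, 2).
print(manhattan_route_exists((0, 0), (2, 2), {(1, 1)}))                  # True
print(manhattan_route_exists((0, 0), (2, 2), {(0, 2), (1, 1), (2, 0)}))  # False
```

The check costs time proportional to the area of the source-destination rectangle. A fault-block model such as MCC aims to let a router reach the same answer from fault-block information rather than from a scan of every node.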

Index Terms:
Adaptive routing, fault model, fault tolerance, interconnection network, mesh.
Citation:
Dajin Wang, "A Rectilinear-Monotone Polygonal Fault Block Model for Fault-Tolerant Minimal Routing in Mesh," IEEE Transactions on Computers, vol. 52, no. 3, pp. 310-320, March 2003, doi:10.1109/TC.2003.1183946