Distributed, Deadlock-Free Routing in Faulty, Pipelined, Direct Interconnection Networks
June 1996 (vol. 45 no. 6)
pp. 651-665

Abstract: This paper focuses on designing high-performance pipelined networks that can operate in the presence of dynamic component failures. A general, rigorous framework for deadlock-free communication in faulty, pipelined networks is developed. A mechanism is also proposed for recovering from dynamic link and node failures. The recovery mechanism 1) is fully distributed, 2) does not require timeouts, 3) prevents fault-induced deadlock, and 4) is integrated into the virtual channel flow control mechanisms. This recovery mechanism is used to develop a new pipelined communication mechanism, acknowledged pipelined circuit-switching (APCS). APCS supports existing routing protocols [19] that can tolerate a maximal number of static link failures, i.e., one less than the number of ports on a node. An implementation of a novel router architecture is described, and the results of detailed flit-level simulations are presented. Finally, the proposed recovery mechanism is shown to be applicable to existing adaptive wormhole routing protocols, which are prone to deadlock in the presence of dynamic faults.

[1] J.D. Allen, P.T. Gaughan, D.E. Schimmel, and S. Yalamanchili, "Ariadne—An Adaptive Router for Fault-Tolerant Multicomputers," Proc. 21st Int'l Symp. Computer Architecture, pp. 278-288, Apr. 1994.
[2] P. Berman, L. Gravano, J. Sanz, and G. Pifarre, "Adaptive Deadlock- and Livelock-Free Routing with All Minimal Paths in Torus Networks," Proc. Fourth ACM Symp. Parallel Algorithms and Architectures, June 1992.
[3] R. Boppana and S. Chalasani, "A Comparison of Adaptive Wormhole Routing Algorithms," Proc. 20th Ann. Int'l Symp. Computer Architecture, pp. 351-360, 1993.
[4] S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb, "Supporting Systolic and Memory Communication in iWarp," Proc. 17th Int'l Symp. Computer Architecture, pp. 70-81, 1990.
[5] M.S. Chen and K.G. Shin, "Depth-First Search Approach for Fault-Tolerant Routing in Hypercube Multicomputers," IEEE Trans. Parallel and Distributed Systems, vol. 1, no. 2, pp. 152-159, Apr. 1990.
[6] A.A. Chien and J.H. Kim, "Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors," Proc. 19th Int'l Symp. Computer Architecture, vol. 20, no. 2, pp. 268-277, May 1992.
[7] W.J. Dally, "Virtual-Channel Flow Control," IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 2, pp. 194-205, Mar. 1992.
[8] W.J. Dally and H. Aoki, "Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels," IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 4, pp. 466-475, Apr. 1993.
[9] W.J. Dally and C.L. Seitz, "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks," IEEE Trans. Computers, vol. C-36, no. 5, pp. 547-553, May 1987.
[10] J.T. Draper and J. Ghosh, "Multipath e-Cube Algorithms (MECA) for Adaptive Wormhole Routing and Broadcasting in k-Ary n-Cubes," Proc. Sixth Int'l Parallel Processing Symp., pp. 407-410, Mar. 1992.
[11] J. Duato, "Deadlock-Free Adaptive Routing Algorithms for Multicomputers: Evaluation of a New Algorithm," Proc. IEEE Symp. Parallel and Distributed Processing, pp. 840-847, 1991.
[12] J. Duato, "A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks," IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 12, pp. 1,320-1,331, Dec. 1993.
[13] J. Duato, "A Theory to Increase the Effective Redundancy in Wormhole Networks," Personal Communication, 1993.
[14] D. Ferrari, Computer Systems Performance Evaluation. Prentice Hall, 1978.
[15] P.T. Gaughan, "Design and Analysis of Fault-Tolerant Pipelined Multiprocessor Networks," technical report, PhD thesis, Georgia Inst. of Technology, May 1994.
[16] P.T. Gaughan and S. Yalamanchili, "Pipelined Circuit-Switching: A Fault-Tolerant Variant of Wormhole Routing," Proc. IEEE Symp. Parallel and Distributed Processing, Dec. 1992.
[17] P.T. Gaughan and S. Yalamanchili, "Analytical Models of Bandwidth Allocation in Pipelined k-Ary n-Cubes," Proc. Seventh Int'l Parallel Processing Symp., Apr. 1993.
[18] P.T. Gaughan and S. Yalamanchili, "A Family of Fault-Tolerant Routing Protocols for Directed Multiprocessor Networks," Technical Report GIT/CSRL-93/01, Georgia Inst. of Technology, available via anonymous ftp at ftp.eecom.gatech.edu:pub/csrl, Jan. 1993.
[19] P.T. Gaughan and S. Yalamanchili, "A Family of Fault-Tolerant Routing Protocols for Direct Multiprocessor Networks," IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 5, pp. 482-487, May 1995.
[20] C.J. Glass and L.M. Ni, "The Turn Model for Adaptive Routing," Proc. 19th Int'l Symp. Computer Architecture, vol. 20, no. 2, pp. 278-287, May 1992.
[21] C.J. Glass and L.M. Ni, "Fault-Tolerant Wormhole Routing in Meshes," Proc. 23rd Int'l Symp. Fault-Tolerant Computing, pp. 240-249, 1993.
[22] C.R. Jesshope, P.R. Miller, and J.T. Yanchev, "High Performance Communications in Processor Networks," Proc. Int'l Symp. Computer Architecture, pp. 150-157, 1989.
[23] J.H. Kim, Z. Liu, and A.A. Chien, "Compressionless Routing: A Framework for Adaptive and Fault Tolerant Routing," Proc. 21st Ann. Int'l Symp. Computer Architecture, pp. 289-300, Apr. 1994.
[24] X. Lin and L. Ni, "Deadlock-Free Multicast Wormhole Routing in Multicomputer Networks," Proc. Int'l Symp. Computer Architecture, June 1991.
[25] D.S. Reeves, E.F. Gehringer, and A. Chandiramani, "Adaptive Routing and Deadlock Recovery: A Simulation Study," Proc. Fourth Conf. Hypercube Concurrent Computing Applications, Mar. 1989.
[26] J. Samson, "The Probability of the Occurrence of Soft Errors," Personal Communication, 1994.

Index Terms:
Dynamic fault tolerance, reliable message delivery, distributed recovery mechanism, pipelined interconnection network, wormhole routing.
Citation:
Patrick T. Gaughan, Binh V. Dao, Sudhakar Yalamanchili, David E. Schimmel, "Distributed, Deadlock-Free Routing in Faulty, Pipelined, Direct Interconnection Networks," IEEE Transactions on Computers, vol. 45, no. 6, pp. 651-665, June 1996, doi:10.1109/12.506422