Dynamically Configurable Message Flow Control for Fault-Tolerant Routing
January 1999 (vol. 10 no. 1)
pp. 7-22

Abstract: Fault-tolerant routing protocols in modern interconnection networks rely heavily on the network flow control mechanisms used. Optimistic flow control mechanisms, such as wormhole switching (WS), realize very good performance but are prone to deadlock in the presence of faults. Conservative flow control mechanisms, such as pipelined circuit switching (PCS), ensure the existence of a path to the destination prior to message transmission, achieving reliable transmission at the expense of performance. This paper proposes a general class of flow control mechanisms that can be dynamically configured to trade off reliability against performance. Routing protocols can then be designed such that, in the vicinity of faults, they use a more conservative flow control mechanism, while the majority of messages, which traverse fault-free portions of the network, use a WS-like flow control to maximize performance. We refer to such protocols as two-phase protocols. This ability provides new avenues for optimizing message-passing performance in the presence of faults. A fully adaptive two-phase protocol is proposed and compared via simulation with protocols based on WS and PCS. The architecture of a network router supporting configurable flow control is also described.
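
The two-phase idea can be illustrated with a minimal sketch. It is not the protocol from the paper: it assumes a 2D mesh, a known set of faulty nodes, and a simple rule that a hop counts as "in the vicinity of a fault" when any direct neighbor is faulty. The names `FlowControl`, `select_flow_control`, and `modes_along_path` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class FlowControl(Enum):
    """Hypothetical labels for the two ends of the configurable range."""
    OPTIMISTIC = "wormhole-like"   # fast, assumed safe only away from faults
    CONSERVATIVE = "pcs-like"      # slower, used near faults


def neighbors(node, width, height):
    """Return the in-bounds mesh neighbors of node = (x, y)."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def select_flow_control(node, faulty, width, height):
    """Assumed rule: be conservative if any direct neighbor is faulty."""
    near_fault = any(n in faulty for n in neighbors(node, width, height))
    return FlowControl.CONSERVATIVE if near_fault else FlowControl.OPTIMISTIC


def modes_along_path(path, faulty, width, height):
    """Flow control mode a message would use at each node of a given path."""
    return [select_flow_control(n, faulty, width, height) for n in path]


if __name__ == "__main__":
    faulty_nodes = {(2, 1)}
    route = [(0, 0), (1, 0), (2, 0), (3, 0)]
    for node, mode in zip(route, modes_along_path(route, faulty_nodes, 4, 4)):
        print(node, mode.value)
```

In this sketch only the hop adjacent to the faulty node switches to the conservative mode; the rest of the route stays optimistic, which mirrors the abstract's point that most messages cross fault-free regions.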

Index Terms:
Fault-tolerant routing, multiphase routing, routing protocol, pipelined interconnection network, message flow control, wormhole switching, virtual channels, multicomputer.
Citation:
Binh Vien Dao, Jose Duato, Sudhakar Yalamanchili, "Dynamically Configurable Message Flow Control for Fault-Tolerant Routing," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 1, pp. 7-22, Jan. 1999, doi:10.1109/71.744829