A Testbed for Evaluation of Fault-Tolerant Routing in Multiprocessor Interconnection Networks
October 1999 (vol. 10 no. 10)
pp. 1052-1066

Abstract—This paper presents a comprehensive testbed for evaluating interconnection networks and routing algorithms with real applications. The testbed is flexible enough to implement any network topology and fault-tolerant routing algorithm, and it lets the system architect study cost-versus-performance trade-offs over a range of network parameters. We illustrate its use with one fault-tolerant routing algorithm and analyze the performance of four shared memory applications under different fault conditions. We also show how the testbed can drive future research in fault-tolerant routing algorithms and architectures by proposing and evaluating novel architectural enhancements to the network router, called path selection heuristics (PSH). We propose three such schemes, of which the Least Recently Used (LRU) PSH is shown to give the best performance in the presence of faults.
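The abstract does not say how the LRU PSH is implemented in the router. As an illustration only, the sketch below assumes the generic meaning of "least recently used": when the routing algorithm permits several output ports for a message, the router picks the fault-free one it selected least recently, which tends to spread traffic around faulty regions. The class `LRUPathSelector`, its logical clock, and its fault set are hypothetical and are not the authors' router design.

```python
from typing import Dict, Iterable, Optional, Set


class LRUPathSelector:
    """Hypothetical LRU path-selection heuristic for a single router.

    Among the output ports the routing algorithm allows for a message,
    choose the fault-free port that was selected least recently.
    """

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        # Logical clock, advanced on every successful selection.
        self.clock = 0
        # Time each port was last selected; -1 means "never used".
        self.last_used: Dict[int, int] = {p: -1 for p in range(num_ports)}
        # Ports assumed to lead to faulty links or nodes.
        self.faulty: Set[int] = set()

    def mark_faulty(self, port: int) -> None:
        self.faulty.add(port)

    def select(self, candidates: Iterable[int]) -> Optional[int]:
        usable = [p for p in candidates if p not in self.faulty]
        if not usable:
            # No fault-free candidate: the routing algorithm must block
            # or fall back to some other fault-handling mechanism.
            return None
        # min() returns the first minimal element, so ties are broken
        # by the order in which candidates were supplied.
        choice = min(usable, key=lambda p: self.last_used[p])
        self.clock += 1
        self.last_used[choice] = self.clock
        return choice
```

For example, with candidates `[0, 1]` the selector alternates between ports 0 and 1; after `mark_faulty(1)` it returns only port 0, and a request whose sole candidate is faulty returns `None`.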


Index Terms:
Application-driven evaluation, evaluation testbed, fault-tolerant routing, interconnection network, path-selection heuristics, router design.
Citation:
Aniruddha S. Vaidya, Chita R. Das, Anand Sivasubramaniam, "A Testbed for Evaluation of Fault-Tolerant Routing in Multiprocessor Interconnection Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 10, pp. 1052-1066, Oct. 1999, doi:10.1109/71.808150