A Fault-Tolerant Routing Strategy in Hypercube Multicomputers
February 1996 (vol. 45 no. 2)
pp. 143-155

Abstract: We investigate fault-tolerant routing that aims to find feasible minimum paths in a faulty hypercube. Our scheme uses the concept of an unsafe node and its extension. We propose a set of stringent criteria for identifying nodes that are possibly bad candidates for forwarding a message; as a result, the number of such undesirable nodes is reduced without sacrificing the functionality of the mechanism. Furthermore, we introduce the notion of degree of unsafeness for classifying unsafe nodes, which facilitates the design of efficient routing algorithms that rely on each node keeping the states of its nearest neighbors. We show that, as long as the hypercube is not fully unsafe, the routing algorithm can always establish a feasible path whose length is no more than the Hamming distance between the source and the destination plus four. The issue of deadlock freeness is also addressed. More importantly, we present another fault-tolerant routing algorithm that requires only a constant five virtual networks in wormhole routing to ensure deadlock freeness for a hypercube of any size.
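
The path-length bound stated in the abstract can be illustrated with a small sketch. It assumes the usual hypercube labeling, in which each of the 2^n nodes has an n-bit address and two nodes are neighbors when their addresses differ in exactly one bit. The function names (`hamming_distance`, `neighbors`, `preferred_dimensions`, `path_length_bound`) are hypothetical and chosen only for illustration; the sketch does not implement the paper's routing algorithm or its unsafe-node criteria, only the quantities the bound is expressed in.

```python
def hamming_distance(src: int, dst: int) -> int:
    """Number of bit positions in which two node addresses differ.

    In a fault-free hypercube this equals the length of a shortest path.
    """
    return bin(src ^ dst).count("1")


def neighbors(node: int, n: int) -> list[int]:
    """Addresses of the n neighbors of `node` in an n-dimensional hypercube."""
    return [node ^ (1 << d) for d in range(n)]


def preferred_dimensions(src: int, dst: int, n: int) -> list[int]:
    """Dimensions whose traversal moves a message one hop closer to `dst`."""
    diff = src ^ dst
    return [d for d in range(n) if (diff >> d) & 1]


def path_length_bound(src: int, dst: int) -> int:
    """Upper bound from the abstract: Hamming distance plus four.

    Per the abstract, the bound applies as long as the hypercube
    is not fully unsafe.
    """
    return hamming_distance(src, dst) + 4


if __name__ == "__main__":
    src, dst, n = 0b0000, 0b1011, 4
    print(hamming_distance(src, dst))         # 3
    print(preferred_dimensions(src, dst, n))  # [0, 1, 3]
    print(path_length_bound(src, dst))        # 7
```

For source 0000 and destination 1011 in a 4-cube, the Hamming distance is 3, so under the abstract's condition the established path would have length at most 7.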

Index Terms:
Deadlock, fault tolerance, hypercubes, routing, virtual channels, wormhole routing.
Ge-Ming Chiu, Shui-Pao Wu, "A Fault-Tolerant Routing Strategy in Hypercube Multicomputers," IEEE Transactions on Computers, vol. 45, no. 2, pp. 143-155, Feb. 1996, doi:10.1109/12.485379