Adaptive Fault-Tolerant Routing in Cube-Based Multicomputers Using Safety Vectors
April 1998 (vol. 9 no. 4)
pp. 321-334

Abstract: This paper studies reliable communication in cube-based multicomputers using the safety vector concept. In our approach, each node in a cube-based multicomputer of dimension n is associated with a safety vector of n bits, which is an approximate measure of the number and distribution of faults in the node's neighborhood. The safety vector of each node can be calculated through n - 1 rounds of information exchange among neighboring nodes. Optimal unicasting between two nodes is guaranteed if the kth bit of the safety vector of the source node is one, where k is the Hamming distance between the source and destination nodes. The feasibility of the proposed unicasting can therefore be determined at the source node by checking its safety vector against the Hamming distance to the destination. The concept of dynamic adaptivity is also introduced; it represents the ability of a routing algorithm to adjust its routing adaptivity dynamically based on the fault distribution in the neighborhood. The proposed unicasting can also be used in disconnected hypercubes, where the nodes of a hypercube are partitioned into two or more disconnected parts. We then extend the safety vector concept to general cube-based multicomputers.
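The abstract does not state the update rule for the safety vector, so the following Python sketch is illustrative only. It assumes a node-fault model and one recurrence that is consistent with the n - 1 rounds described above: a faulty node has all bits zero, bit 1 of a nonfaulty node is one, and bit k (k >= 2) is one only if more than n - k neighbors have bit k - 1 set. Under that assumption, at least one of the k neighbors on a shortest path always has bit k - 1 set when the source has bit k set. The function names `safety_vectors` and `route` are hypothetical, and the destination is assumed to be nonfaulty.

```python
def hamming(a, b):
    """Hamming distance between two node addresses."""
    return bin(a ^ b).count("1")


def safety_vectors(n, faulty):
    """Return {node: [bit_1, ..., bit_n]} for an n-cube with faulty nodes.

    Assumed rule (not quoted from the paper): faulty nodes are all-zero,
    bit 1 of a nonfaulty node is 1, and bit k is 1 iff more than n - k
    neighbors have bit k - 1 equal to 1.
    """
    nodes = range(1 << n)
    # vec[a][k] stores bit k (1-based); index 0 is unused padding.
    vec = {a: [0] * (n + 1) for a in nodes}
    for a in nodes:
        if a not in faulty:
            vec[a][1] = 1
    # Rounds 2..n: each round reads bit k - 1 of the n neighbors.
    for k in range(2, n + 1):
        for a in nodes:
            if a in faulty:
                continue
            count = sum(vec[a ^ (1 << i)][k - 1] for i in range(n))
            vec[a][k] = 1 if count > n - k else 0
    return {a: bits[1:] for a, bits in vec.items()}


def route(n, faulty, src, dst, vec):
    """Return a shortest path from src to dst, or None if not guaranteed."""
    if src in faulty or dst in faulty:
        return None
    k = hamming(src, dst)
    if k == 0:
        return [src]
    # Feasibility test at the source: bit k must be 1.
    if vec[src][k - 1] == 0:
        return None
    path, cur = [src], src
    while cur != dst:
        diff = cur ^ dst
        k = hamming(cur, dst)
        if k == 1:
            nxt = dst  # destination is an adjacent nonfaulty node
        else:
            # Pick a neighbor on a shortest path whose bit k - 1 is 1.
            nxt = None
            for i in range(n):
                cand = cur ^ (1 << i)
                if (diff >> i) & 1 and vec[cand][k - 2] == 1:
                    nxt = cand
                    break
            if nxt is None:
                raise RuntimeError("no safe preferred neighbor found")
        path.append(nxt)
        cur = nxt
    return path


if __name__ == "__main__":
    vectors = safety_vectors(3, {1, 2})
    print(vectors[0])
    print(route(3, {1, 2}, 0, 7, vectors))
    print(route(3, {1, 2}, 0, 3, vectors))
```

In a 3-cube with faulty nodes 1 and 2, this rule gives node 0 the vector (1, 0, 1): both shortest paths from node 0 to node 3 pass through a faulty node, so bit 2 is zero, while a shortest path to node 7 still exists through node 4.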

Index Terms:
Disconnected networks, fault tolerance, generalized hypercubes, multicomputers, reliable communication, unicast.
Citation:
Jie Wu, "Adaptive Fault-Tolerant Routing in Cube-Based Multicomputers Using Safety Vectors," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 4, pp. 321-334, April 1998, doi:10.1109/71.667894