A Fault-Tolerant Communication Scheme for Hypercube Computers
October 1992 (vol. 41 no. 10)
pp. 1242-1256

A fault-tolerant communication scheme that facilitates near-optimal routing and broadcasting in hypercube computers subject to node failures is described. The concept of an unsafe node is introduced to identify fault-free nodes that may cause communication difficulties. It is shown that by using only 'feasible' paths that try to avoid unsafe nodes, routing and broadcasting can be substantially simplified. A computationally efficient routing algorithm that uses local information is presented. It can route a message via a path of length no greater than p+2, where p is the minimum distance from the source to the destination, provided that not all nonfaulty nodes in the hypercube are unsafe. Under the same fault conditions, broadcasting can be achieved in only one more time unit than in the fault-free case. The problems posed by deadlock in faulty hypercubes are discussed, and deadlock-free implementations of the proposed communication schemes are presented.
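The unsafe-node idea can be illustrated with a small sketch. The abstract does not state the exact rule, so the code below assumes the definition commonly attributed to this work: a nonfaulty node is unsafe if it has at least two faulty neighbors, or at least three neighbors that are faulty or unsafe. Because the rule refers to itself, the unsafe set is computed as a fixed point. The helper names (`neighbors`, `unsafe_nodes`, `routing_condition_holds`, `hamming_distance`) are hypothetical, and p is taken to be the Hamming distance between node addresses. This is not the paper's routing algorithm.

```python
from __future__ import annotations


def neighbors(node: int, n: int) -> list[int]:
    """Neighbors of `node` in an n-cube differ from it in exactly one address bit."""
    return [node ^ (1 << d) for d in range(n)]


def unsafe_nodes(n: int, faulty: set[int]) -> set[int]:
    """Fixed-point computation of unsafe nodes under an ASSUMED rule:
    a nonfaulty node is unsafe if it has >= 2 faulty neighbors,
    or >= 3 neighbors that are faulty or unsafe."""
    unsafe: set[int] = set()
    changed = True
    while changed:  # the unsafe set only grows, so this loop terminates
        changed = False
        for v in range(1 << n):
            if v in faulty or v in unsafe:
                continue
            nbrs = neighbors(v, n)
            n_faulty = sum(u in faulty for u in nbrs)
            n_bad = sum(u in faulty or u in unsafe for u in nbrs)
            if n_faulty >= 2 or n_bad >= 3:
                unsafe.add(v)
                changed = True
    return unsafe


def routing_condition_holds(n: int, faulty: set[int], unsafe: set[int]) -> bool:
    """The abstract's p+2 bound requires that not every nonfaulty node is unsafe."""
    return any(v not in faulty and v not in unsafe for v in range(1 << n))


def hamming_distance(a: int, b: int) -> int:
    """Minimum distance p between two nodes in a fault-free hypercube."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    faulty = {0b000, 0b011}
    u = unsafe_nodes(3, faulty)
    print(sorted(u))  # → [1, 2]
    print(routing_condition_holds(3, faulty, u))  # → True
```

In this 3-cube example, nodes 001 and 010 each have two faulty neighbors and become unsafe, while the other four nonfaulty nodes stay safe, so the condition in the abstract is met. If instead all four even-parity nodes fail, every remaining node has three faulty neighbors, all nonfaulty nodes become unsafe, and the condition fails.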

Index Terms:
fault-tolerant communication scheme; hypercube computers; near-optimal routing; broadcasting; node failures; fault-free nodes; computationally efficient routing algorithm; deadlock-free implementations; fault tolerant computing; hypercube networks; parallel architectures.
T.C. Lee, J.P. Hayes, "A Fault-Tolerant Communication Scheme for Hypercube Computers," IEEE Transactions on Computers, vol. 41, no. 10, pp. 1242-1256, Oct. 1992, doi:10.1109/12.166602