A Fault-Tolerant Tree Communication Scheme for Hypercube Systems
June 1996 (vol. 45 no. 6)
pp. 641-650

Abstract: The tree communication scheme has been shown to be very efficient for global operations on data residing in the processors of a hypercube, with time complexity O(log₂ N), where N is the number of processors. The scheme is useful for many parallel algorithms on hypercube multiprocessors. If a problem can be divided into independent subproblems, each subproblem can first be solved by one of the processors; the tree communication scheme is then invoked to merge the subresults into the final result. Every algorithm for a problem with this property can benefit from the scheme. In this paper we propose a more general and efficient tree communication scheme. We also propose fault-tolerant algorithms for it that exploit the scheme's unique properties. The computation and communication slowdown is small (< 2) in the presence of multiple link and/or node failures.
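
The merge step described in the abstract can be illustrated with a small simulation. The sketch below is not the scheme proposed in the paper; it is a textbook binomial-tree reduction on a fault-free hypercube, included only to show why merging N subresults takes log₂ N communication steps. The function name `hypercube_tree_reduce` and its parameters are hypothetical, and the processors are simulated by a plain list indexed by node address.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def hypercube_tree_reduce(
    values: List[T], merge: Callable[[T, T], T]
) -> Tuple[T, int]:
    """Simulate a tree reduction over a fault-free hypercube.

    values[i] is the subresult held by the node with binary address i.
    Returns the merged result collected at node 0 and the number of
    communication steps used. Assumption: no link or node failures.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")

    data = list(values)
    dims = n.bit_length() - 1  # log2(n) dimensions
    steps = 0
    for d in range(dims):
        bit = 1 << d
        for node in range(n):
            # A node is still active in step d if its lower d bits are zero.
            # Active nodes with bit d set send across dimension d.
            if node & (bit - 1) == 0 and node & bit:
                partner = node ^ bit  # neighbor along dimension d
                data[partner] = merge(data[partner], data[node])
        steps += 1
    return data[0], steps
```

For example, `hypercube_tree_reduce(list(range(8)), lambda a, b: a + b)` merges eight subresults at node 0 in three steps, one per hypercube dimension. All transfers within a step use disjoint links, which is why they can proceed in parallel on a real machine.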

Index Terms:
Hypercube, failures, tree communication, uniform data distribution, fault-tolerance.
Citation:
Yuh-Rong Leu, Sy-Yen Kuo, "A Fault-Tolerant Tree Communication Scheme for Hypercube Systems," IEEE Transactions on Computers, vol. 45, no. 6, pp. 641-650, June 1996, doi:10.1109/12.506421