Fault-Tolerant Meshes and Hypercubes with Minimal Numbers of Spares
September 1993 (vol. 42 no. 9)
pp. 1089-1104

This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. The approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. The authors optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k=1, they present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. They also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, they give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
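
As a small illustration of the k=1 claim in the simplest case, consider a one-dimensional mesh (a path of n processors, maximum degree 2d = 2). A ring of n+1 processors has one spare and the same maximum degree, and it still contains a fault-free n-node path after any single processor fault. The Python sketch below checks this by brute force. It is a textbook special case, not the authors' construction, which the abstract does not spell out; the function names and the value of `n` are hypothetical choices made for this example.

```python
def cycle_edges(n_nodes):
    """Edges of a ring on nodes 0..n_nodes-1, stored as unordered pairs."""
    return {frozenset((i, (i + 1) % n_nodes)) for i in range(n_nodes)}


def embed_path_after_fault(n, faulty):
    """Order the n healthy nodes of an (n+1)-node ring into a path.

    The walk starts just after the faulty node and goes once around the ring.
    """
    total = n + 1
    return [(faulty + 1 + i) % total for i in range(n)]


def is_valid_path(order, edges, faulty):
    """True if `order` avoids the faulty node, has no repeats, and
    every consecutive pair is joined by a ring edge."""
    healthy = faulty not in order and len(set(order)) == len(order)
    linked = all(frozenset((a, b)) in edges for a, b in zip(order, order[1:]))
    return healthy and linked


n = 5  # size of the desired 1-D mesh (assumed for illustration)
edges = cycle_edges(n + 1)  # ring with exactly one spare processor
for faulty in range(n + 1):
    path = embed_path_after_fault(n, faulty)
    assert is_valid_path(path, edges, faulty)
```

Every node in the ring has degree 2, so the spare adds no links per processor beyond what the path already needs. Higher-dimensional meshes require the more involved constructions that the paper presents.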

Index Terms:
fault-tolerant meshes; hypercubes; d-dimensional mesh; fault-tolerant architecture; multiplexers; buses; tori; hexagonal meshes; fault tolerant computing; hypercube networks; performance evaluation.
J. Bruck, R. Cypher, C. Ho, "Fault-Tolerant Meshes and Hypercubes with Minimal Numbers of Spares," IEEE Transactions on Computers, vol. 42, no. 9, pp. 1089-1104, Sept. 1993, doi:10.1109/12.241598