Fault-Tolerant Processor Arrays Using Additional Bypass Linking Allocated by Graph-Node Coloring
May 2000 (vol. 49 no. 5)
pp. 431-442

Abstract: An advanced spare-connection scheme for k-out-of-n redundancy called “generalized additional bypass linking” is proposed for constructing fault-tolerant massively parallel computers with series-connected, mesh-connected, or tree-connected processing element (PE) arrays. This scheme uses bypass links with wired-OR connections to selectively connect the primary PEs to a spare PE in parallel. These bypass links are allocated to the primary PEs by node-coloring of a graph with a minimum inter-node distance of three in order to minimize the number of bypass links (i.e., the chromatic number). The main advantage of this scheme is that it can be used for constructing various k-out-of-n configurations capable of enhanced PE-to-PE communication and broadcast while still achieving strong fault tolerance for these PEs and links. In particular, when node-coloring with a distance of three is used, it enables the construction of optimal r-strongly-fault-tolerant configurations capable of direct k-out-of-n selections by providing r spare PEs and r extra connections per PE for any kind of array. This simple spare-circuit structure enhances fault tolerance more than conventional schemes do. The node-coloring patterns were constructed using new node-coloring algorithms, and the chromatic numbers were evaluated theoretically. Enhanced PE-to-PE communication and broadcast were achieved by using new fault-tolerant routing algorithms based on the properties of the node-coloring patterns, with four or five message transmission steps in the optimal configurations for an array of any size.
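The distance-three coloring constraint mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it uses a plain greedy heuristic, it assumes that "a minimum inter-node distance of three" means any two PEs sharing a color are at least three hops apart, and the function names (mesh_neighbors, within_distance, greedy_distance_coloring) are hypothetical. A greedy pass gives a valid coloring but does not guarantee the minimum number of colors.

```python
from itertools import product


def mesh_neighbors(rows, cols):
    """Adjacency lists of a rows x cols mesh with 4-neighbor links."""
    adj = {}
    for r, c in product(range(rows), range(cols)):
        adj[(r, c)] = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < rows and 0 <= c + dc < cols
        ]
    return adj


def within_distance(adj, start, limit):
    """Nodes at graph distance 1..limit from start (breadth-first search)."""
    seen = {start}
    frontier = [start]
    for _ in range(limit):
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    seen.discard(start)
    return seen


def greedy_distance_coloring(adj, min_distance=3):
    """Greedy coloring in which same-colored nodes are >= min_distance apart.

    Assumption: this is a generic heuristic, not the node-coloring
    algorithm of the paper, and it may use more colors than necessary.
    """
    colors = {}
    for node in sorted(adj):
        # Colors already taken by nodes closer than min_distance are forbidden.
        forbidden = {
            colors[v]
            for v in within_distance(adj, node, min_distance - 1)
            if v in colors
        }
        color = 0
        while color in forbidden:
            color += 1
        colors[node] = color
    return colors


if __name__ == "__main__":
    mesh = mesh_neighbors(4, 5)
    coloring = greedy_distance_coloring(mesh)
    for r in range(4):
        print(" ".join(str(coloring[(r, c)]) for c in range(5)))
```

Under this reading, each color would correspond to one group of primary PEs sharing a bypass link to the spare PE, so fewer colors means fewer bypass links. An interior mesh node and its four neighbors are all within two hops of one another, so any such coloring of a mesh with interior nodes needs at least five colors.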

Index Terms:
Processor array, mesh, tree, fault tolerance, k-out-of-n redundancy, additional bypass linking, graph-node coloring, enhanced communication and broadcast.
Citation:
Nobuo Tsuda, "Fault-Tolerant Processor Arrays Using Additional Bypass Linking Allocated by Graph-Node Coloring," IEEE Transactions on Computers, vol. 49, no. 5, pp. 431-442, May 2000, doi:10.1109/12.859538