Computational Arrays with Flexible Redundancy
April 1994 (vol. 43 no. 4)
pp. 413-430

Different multiple redundancy schemes for fault detection and correction in computational arrays are proposed and analyzed. The basic idea is to embed a logical array of nodes onto a processor/switch array such that d processors, 1 ≤ d ≤ 4, are dedicated to the computation associated with each node. The input to a node is directed to the d processors constituting that node, and the output of the node is computed by taking a majority vote among the outputs of the d processors. The proposed processor/switch array (PSVA) is versatile in the sense that it may be configured as a nonredundant system or as a system that supports double, triple, or quadruple redundancy. It also allows spares to be distributed in the PSVA in a way that permits spare sharing among nodes, thus enhancing the overall system reliability. In addition to the choice of the required degree of redundancy, the flexibility of the PSVA architecture allows redundant arrays to be embedded onto defective PSVAs and supports run-time reconfiguration to avoid faulty processors and switches. Different embedding and reconfiguration algorithms are presented and analyzed using Markov chain techniques, probability arguments, and simulation.
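
The voting step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `vote`, the strict-majority rule, and the convention of returning no value when there is no strict majority (for example a disagreement with d = 2 or a 2-2 split with d = 4) are assumptions made here for illustration only.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], bool]:
    """Vote over the outputs of the d processors assigned to one node.

    Returns (value, fault_detected). `value` is None when no strict
    majority exists, i.e. a fault is detected but cannot be masked.
    Hypothetical helper: the strict-majority rule is an assumption.
    """
    d = len(outputs)
    if not 1 <= d <= 4:
        raise ValueError("degree of redundancy d must satisfy 1 <= d <= 4")

    # Most frequent output and how many processors produced it.
    value, count = Counter(outputs).most_common(1)[0]

    # Any disagreement among the d copies signals a fault.
    fault_detected = count < d

    # Mask the fault only if more than half of the copies agree.
    if 2 * count > d:
        return value, fault_detected
    return None, True
```

Under these assumptions, d = 1 gives no protection, d = 2 gives detection only, d = 3 masks a single faulty processor, and d = 4 masks one faulty processor while still detecting an even split.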

Index Terms:
redundancy; parallel processing; fault tolerant computing; logic design; flexible redundancy; fault detection; correction; computational arrays; processor/switch array; redundant arrays; reconfiguration algorithms; embedding; Markov chain techniques; probability arguments; faulty processors; fault tolerant arrays; reconfiguration; defect avoidance; fault masking.
J. Ramirez, R. Melhem, "Computational Arrays with Flexible Redundancy," IEEE Transactions on Computers, vol. 43, no. 4, pp. 413-430, April 1994, doi:10.1109/12.278480