Bi-Level Reconfigurations of Fault Tolerant Arrays
February 1992 (vol. 41 no. 2)
pp. 231-239

Two types of reconfiguration algorithms are considered: local algorithms and global algorithms. In a local algorithm, no processor needs to know the status of all other processors in the system. The recovery process is distributed among the processors, with each processor using only extremely local knowledge. With these properties, the reconfiguration algorithm may achieve fast recovery and real-time response, but may sacrifice the optimal use of redundancy. In contrast, the goal of a global algorithm is to optimize the use of redundancy with respect to some fault tolerance criteria. This, however, requires global knowledge about other processors in the system and often necessitates extensive changes in the configuration of the system. For unmaintained, long-life systems, local fault tolerance algorithms have the advantage of fast recovery, while global fault tolerance algorithms provide better reliability and longer life expectancy. Fortunately, under certain conditions, it is possible to combine the advantages of the two types of algorithms. These conditions are described.
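
The trade-off between the two types of algorithms can be illustrated with a minimal sketch. This is not the paper's method: it assumes a hypothetical linear array divided into blocks, each with its own spares. The hypothetical local rule lets a block cover its faults with its own spares only, while the hypothetical global rule matches faults to any spare within `reach` blocks using augmenting-path bipartite matching. All names and the `reach` parameter are illustrative assumptions.

```python
from collections import Counter


def local_recovery(fault_blocks, spares_per_block):
    """Local rule (assumed): a block may use only its own spares.

    fault_blocks[i] is the block index of the i-th faulty processor.
    Each block decides using knowledge of its own faults and spares alone.
    """
    faults = Counter(fault_blocks)
    return all(faults[b] <= spares_per_block[b] for b in faults)


def global_recovery(fault_blocks, spares_per_block, reach=1):
    """Global rule (assumed): match faults to spares within `reach` blocks.

    Uses augmenting paths over the whole array, so it needs global
    knowledge and may reassign spares that were already allocated.
    """
    # One entry per physical spare, holding the block it belongs to.
    spares = [b for b, n in enumerate(spares_per_block) for _ in range(n)]
    owner = [None] * len(spares)  # owner[s] = fault currently using spare s

    def try_assign(f, seen):
        for s, spare_block in enumerate(spares):
            if abs(spare_block - fault_blocks[f]) <= reach and s not in seen:
                seen.add(s)
                # Take a free spare, or displace its owner to another spare.
                if owner[s] is None or try_assign(owner[s], seen):
                    owner[s] = f
                    return True
        return False

    return all(try_assign(f, set()) for f in range(len(fault_blocks)))
```

With three blocks of one spare each and faults in blocks 0, 0, and 1, `local_recovery([0, 0, 1], [1, 1, 1])` fails because block 0 has two faults and one spare, whereas `global_recovery([0, 0, 1], [1, 1, 1])` succeeds by borrowing spares from neighboring blocks, at the cost of examining and possibly rearranging the whole allocation.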

Index Terms:
bi-level configurations; fault tolerant arrays; local algorithms; global algorithms; local knowledge; reconfiguration algorithm; fast recovery; real time response; redundancy; global knowledge; long-life systems; fault tolerance algorithms; reliability; life expectancy; fault tolerant computing; multiprocessor interconnection networks; parallel algorithms.
R.G. Melhem, "Bi-Level Reconfigurations of Fault Tolerant Arrays," IEEE Transactions on Computers, vol. 41, no. 2, pp. 231-239, Feb. 1992, doi:10.1109/12.123400