On Fault-Tolerant Structure, Distributed Fault-Diagnosis, Reconfiguration, and Recovery of the Array Processors
July 1989 (vol. 38 no. 7)
pp. 932-942
A study is made of the design of fault-tolerant array processors. It is shown how hardware redundancy can be used in the existing structures in order to make them capable of withstanding the failure of some of the array links and processors. Distributed fault-tolerance schemes are introduced for the diagnosis of the faulty elements, reconfiguration, and recovery of the array. Fault tolerance is

Index Terms:
fault-tolerant structure; distributed fault-diagnosis; reconfiguration; recovery; array processors; hardware redundancy; faulty elements; decentralized form; distributed processing; fault tolerant computing; parallel processing.
S.H. Hosseini, "On Fault-Tolerant Structure, Distributed Fault-Diagnosis, Reconfiguration, and Recovery of the Array Processors," IEEE Transactions on Computers, vol. 38, no. 7, pp. 932-942, July 1989, doi:10.1109/12.30846