Automatic Synthesis of Self-Recovering VLSI Systems
February 1996 (vol. 45 no. 2)
pp. 131-142

Abstract: In this paper, we describe ${\cal SYNCERE}$, an integrated system for synthesizing self-recovering microarchitectures. In the ${\cal SYNCERE}$ model for self-recovery, transient faults are detected using duplication and comparison, while recovery from transient faults is accomplished via checkpointing and rollback. ${\cal SYNCERE}$ initially inserts checkpoints subject to designer-specified recovery time constraints. Subsequently, ${\cal SYNCERE}$ incorporates detection constraints by ensuring that two copies of the computation are executed on disjoint hardware. To reduce the dedicated hardware required for the original and duplicate computations, ${\cal SYNCERE}$ imposes intercopy hardware disjointness at a sub-computation level instead of at the overall computation level. The overhead is further moderated by restructuring the pliable input representation of the computation. ${\cal SYNCERE}$ has successfully derived numerous self-recovering microarchitectures. To validate the methodology for designing fault-tolerant VLSI ICs, we carried out a physical design of a self-recovering 16-point FIR filter.
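
The recovery model summarized in the abstract (detect by duplication and comparison, recover by checkpointing and rollback) can be illustrated with a small software sketch. This is not the SYNCERE synthesis flow and not the authors' code: the function names `run_with_recovery` and `flip_once`, the `max_retries` parameter, and the modelling of a transient fault as a single flipped bit are all assumptions made for illustration. In hardware the two copies run on disjoint units; here they are simply two calls to the same function.

```python
def run_with_recovery(segments, state, inject_fault=None, max_retries=3):
    """Run `segments` in order with duplicate-and-compare detection.

    Each segment is a function mapping an input state to an output state.
    The input of a segment serves as its checkpoint: on a mismatch between
    the two copies, the segment is re-executed from that checkpoint.
    `inject_fault(index, attempt, value)` is a hypothetical hook that may
    corrupt the duplicate copy's result to simulate a transient fault.
    """
    rollbacks = 0
    for index, segment in enumerate(segments):
        checkpoint = state  # last state known to be good
        for attempt in range(max_retries + 1):
            primary = segment(checkpoint)    # original computation
            duplicate = segment(checkpoint)  # duplicate computation
            if inject_fault is not None:
                duplicate = inject_fault(index, attempt, duplicate)
            if primary == duplicate:         # comparison: no fault detected
                state = primary
                break
            rollbacks += 1                   # mismatch: roll back and retry
        else:
            # Every attempt disagreed, so the fault does not look transient.
            raise RuntimeError(f"segment {index} failed after retries")
    return state, rollbacks


def flip_once(index, attempt, value):
    """Assumed fault model: flip the low bit of segment 1's first attempt."""
    if index == 1 and attempt == 0:
        return value ^ 1
    return value


segments = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
result, rollbacks = run_with_recovery(segments, 5, inject_fault=flip_once)
print(result, rollbacks)  # prints "9 1"
```

In this sketch the distance between checkpoints bounds how much work a rollback repeats, which is the role the abstract assigns to the designer-specified recovery time constraints.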

Index Terms:
Fault tolerance, self-recovery, transient faults, VLSI design automation, high level synthesis.
Citation:
Alex Orailoglu, Ramesh Karri, "Automatic Synthesis of Self-Recovering VLSI Systems," IEEE Transactions on Computers, vol. 45, no. 2, pp. 131-142, Feb. 1996, doi:10.1109/12.485368