Issue No.03 - May/June (2011 vol.8)
Refik Samet , Ankara University, Ankara, Turkey
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TDSC.2010.12
This paper proposes the design of specialized hardware, called the Recovery Device, for a dual-redundant computer system that operates in real time. The Recovery Device executes all fault-tolerant services, including fault detection, fault type determination, fault localization, recovery of the system after a temporary (transient) fault, and reconfiguration of the system after a permanent fault. The paper also proposes algorithms for determining the fault type (temporary or permanent) and for localizing the faulty computer without using self-testing techniques or diagnosis routines. Determining the fault type makes it possible to eliminate only a computer with a permanent fault; in other words, it prevents the elimination of a nonfaulty computer because of a short temporary fault. Localizing the faulty computer without self-testing techniques or diagnosis routines shortens the recovery point time period and reduces the probability that a fault will occur during execution of the fault-tolerant procedure, which is very important for real-time fault-tolerant systems. These contributions increase both system performance and system reliability.
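The abstract does not give the algorithms themselves, so the following is only a minimal illustrative sketch of one common way to separate temporary from permanent faults in a dual-redundant pair: compare the two outputs, and on a mismatch re-execute from the recovery point a bounded number of times. It is not the paper's method. The names `classify_fault`, `make_channel`, and `max_retries` are hypothetical, the sketch is software rather than hardware, and it does not attempt fault localization.

```python
from typing import Callable, Hashable, Iterable


def classify_fault(
    run_a: Callable[[], Hashable],
    run_b: Callable[[], Hashable],
    max_retries: int = 2,
) -> str:
    """Classify a fault by comparison and bounded re-execution.

    ASSUMPTION (not from the paper): each call to run_a / run_b re-executes
    the same task from the last recovery point on one of the two computers.
    """
    for attempt in range(max_retries + 1):
        if run_a() == run_b():
            # Agreement on the first try means no fault was detected;
            # agreement after a retry means the mismatch was transient.
            return "no_fault" if attempt == 0 else "temporary"
    # The mismatch survived every re-execution, so treat it as permanent.
    return "permanent"


def make_channel(outputs: Iterable[int]) -> Callable[[], int]:
    """Hypothetical stand-in for one computer: yields scripted outputs."""
    it = iter(outputs)
    return lambda: next(it)


if __name__ == "__main__":
    # Computer B disagrees once, then agrees after re-execution.
    print(classify_fault(make_channel([1, 1]), make_channel([9, 1])))
```

Bounding the retries matters in a real-time setting: each re-execution consumes part of the deadline, so `max_retries` would have to be chosen against the available slack.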
Dual-redundant computer system, fault-tolerant procedure, hardware implementation, real-time, recovery device, recovery point, temporary and permanent faults.
Refik Samet, "Recovery Device for Real-Time Dual-Redundant Computer Systems", IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 3, pp. 391-403, May/June 2011, doi:10.1109/TDSC.2010.12