Issue No. 11 - November (1996 vol. 45)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.544478
<p><b>Abstract</b>—An instruction-retry policy is proposed to enhance the fault tolerance of triple modular redundant (TMR) controller computers by adding time redundancy to them. A <it>TMR failure</it> is said to occur if a TMR system fails to establish a majority among its modules' outputs because of multiple faulty modules or a faulty voter. The system may fail to meet the timing constraints of one or more tasks during a given mission, an event called <it>dynamic failure</it>, for either of two reasons: multiple consecutive TMR failures whose active period exceeds a certain time limit, or the exhaustion of spares as a result of frequent system reconfigurations. An optimal instruction-retry period is derived by minimizing the probability of dynamic failure upon detection of either an error masked by the TMR or a TMR failure. Using the derived optimal retry period, we also derive the minimum number of spares needed to keep the probability of dynamic failure for a given mission below a prespecified level.</p>
Real-time control systems, controller computer, internal and external faults, common-cause faults, TMR failures and masked errors, retry, reconfiguration, dynamic failure, hard deadlines.
H. Kim and K. G. Shin, "Design and Analysis of an Optimal Instruction-Retry Policy for TMR Controller Computers," in IEEE Transactions on Computers, vol. 45, no. 11, pp. 1217-1225, 1996.