Sequencing Tasks to Minimize the Effects of Near-Coincident Faults in TMR Controller Computers
November 1996 (vol. 45 no. 11)
pp. 1331-1337

Abstract: Although Triple Modular Redundancy (TMR) has been widely used to mask the effects of a single faulty module, it cannot tolerate coincident faults in multiple modules caused by a common source, such as an environmental disruption or the malfunction of a shared component. We propose a method that eliminates or alleviates the effects of (near-)coincident faults by sequencing tasks on the different modules of a TMR system. Specifically, we develop an effective task sequencing that places an "optimal" distance between the copies of each task executed on different modules, where "optimal" means minimizing the mean number of tasks rendered faulty by TMR failures. Several examples are presented that show significant reductions in TMR failures under the proposed task sequencing.
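
The abstract does not state the paper's fault model or its formula for the optimal distance, so the following is only a toy Python sketch, under our own assumptions, of why separating task copies helps. We assume n periodic tasks run in n unit slots per cycle, module k runs the same task order cyclically shifted by shifts[k], and a common-cause fault corrupts the same w consecutive slots on all three modules. A task suffers a TMR failure when two or more of its copies are corrupted. The function names faulty_tasks, mean_faulty, and best_distance and their parameters are hypothetical; this is not the authors' sequencing algorithm.

```python
"""Toy model (our assumption, not the paper's): how task distance in a TMR
system limits the damage done by one common-cause fault window."""


def faulty_tasks(n, shifts, t, w):
    """Count tasks that suffer a TMR failure for one fault window.

    Assumed model: n periodic tasks occupy n unit time slots per cycle;
    module k runs task i in slot (i + shifts[k]) % n; a common-cause fault
    corrupts slots t, t+1, ..., t+w-1 (mod n) on all modules at once.
    """
    count = 0
    for i in range(n):
        # How many of task i's copies execute inside the fault window.
        copies_hit = sum((i + s - t) % n < w for s in shifts)
        # With 2 or more corrupted copies the 2-of-3 majority vote is wrong.
        if copies_hit >= 2:
            count += 1
    return count


def mean_faulty(n, shifts, w):
    """Average faulty-task count over all n equally likely fault start slots."""
    return sum(faulty_tasks(n, shifts, t, w) for t in range(n)) / n


def best_distance(n, w):
    """Smallest distance d for shifts (0, d, 2d) minimizing mean_faulty."""
    return min(range(n), key=lambda d: mean_faulty(n, (0, d, 2 * d), w))


if __name__ == "__main__":
    n, w = 12, 3
    print("conventional (0, 0, 0):", mean_faulty(n, (0, 0, 0), w))
    d = best_distance(n, w)
    print(f"shifted (0, {d}, {2 * d}):", mean_faulty(n, (0, d, 2 * d), w))
```

In this toy model with n = 12 and w = 3, conventional sequencing (identical order on every module) loses every task in the fault window, a mean of 3.0 faulty tasks, whereas shifts such as (0, 3, 6) or (0, 4, 8) keep each task's copies at least w slots apart and bring the mean to 0.0. The best distance here depends on the fault-window length relative to the cycle length.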

Index Terms:
TMR failure; common-cause and independent faults; conventional, random, and effective sequencing of tasks; Task Interval (TI), task distance.
Hagbae Kim, Kang G. Shin, "Sequencing Tasks to Minimize the Effects of Near-Coincident Faults in TMR Controller Computers," IEEE Transactions on Computers, vol. 45, no. 11, pp. 1331-1337, Nov. 1996, doi:10.1109/12.544492