Tolerance to Multiple Transient Faults for Aperiodic Tasks in Hard Real-Time Systems
September 2000 (vol. 49 no. 9)
pp. 906-914

Abstract—Real-time systems are being increasingly used in several applications which are time-critical in nature. Fault tolerance is an essential requirement of such systems, due to the catastrophic consequences of not tolerating faults. In this paper, we study a scheme that guarantees the timely recovery from multiple faults within hard real-time constraints in uniprocessor systems. Assuming earliest-deadline-first scheduling (EDF) for aperiodic preemptive tasks, we develop a necessary and sufficient feasibility-check algorithm for fault-tolerant scheduling with complexity $O(n^2 \cdot k)$, where $n$ is the number of tasks to be scheduled and $k$ is the maximum number of faults to be tolerated.
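
To make the problem statement concrete, the following is a minimal Python sketch of what "feasible under EDF with up to k faults" means. It is not the paper's $O(n^2 \cdot k)$ algorithm, which the abstract does not describe. It is a brute-force check, exponential in k, under a simplifying assumption that each fault on a task adds a fixed recovery time to that task's demand while the release time and deadline stay the same. The names Task, edf_feasible, and tolerates_k_faults, and the recovery field, are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import combinations_with_replacement
from typing import List, NamedTuple, Tuple


class Task(NamedTuple):
    release: int   # earliest time the task may start
    wcet: int      # worst-case execution time without faults
    deadline: int  # absolute deadline
    recovery: int  # ASSUMPTION: extra execution time needed per fault


def edf_feasible(demands: List[Tuple[int, int, int]]) -> bool:
    """Simulate preemptive EDF on one processor.

    demands holds (release, exec_time, deadline) triples.
    Returns True if every job finishes by its deadline.
    """
    pending = sorted(demands)       # ordered by release time
    ready: List[Tuple[int, int]] = []  # min-heap of (deadline, remaining)
    t, i, n = 0, 0, len(pending)
    while i < n or ready:
        if not ready:
            t = max(t, pending[i][0])  # idle until the next release
        while i < n and pending[i][0] <= t:
            _, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, exec_time))
            i += 1
        deadline, remaining = heapq.heappop(ready)
        # Run the earliest-deadline job until it ends or a new job arrives.
        next_release = pending[i][0] if i < n else float("inf")
        run = min(remaining, next_release - t)
        t += run
        if remaining > run:
            heapq.heappush(ready, (deadline, remaining - run))
        elif t > deadline:
            return False
    return True


def tolerates_k_faults(tasks: List[Task], k: int) -> bool:
    """Brute force: try every way of placing up to k faults on the tasks."""
    for faults in range(k + 1):
        for pattern in combinations_with_replacement(range(len(tasks)), faults):
            demands = [
                (task.release,
                 task.wcet + pattern.count(j) * task.recovery,
                 task.deadline)
                for j, task in enumerate(tasks)
            ]
            if not edf_feasible(demands):
                return False
    return True


# Hypothetical two-task set used only to exercise the functions.
example = [Task(release=0, wcet=2, deadline=6, recovery=2),
           Task(release=1, wcet=2, deadline=8, recovery=2)]
```

With this example set, tolerates_k_faults(example, 2) holds, while three faults on the first task push its demand past its deadline, so tolerates_k_faults(example, 3) fails. The enumeration of fault patterns is what makes this sketch exponential; avoiding that enumeration is the point of a polynomial feasibility check such as the one the abstract reports.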

Index Terms:
Real-time scheduling, earliest-deadline first, fault-tolerant schedules, fault recovery.
Frank Liberato, Rami Melhem, Daniel Mossé, "Tolerance to Multiple Transient Faults for Aperiodic Tasks in Hard Real-Time Systems," IEEE Transactions on Computers, vol. 49, no. 9, pp. 906-914, Sept. 2000, doi:10.1109/12.869322