Issue No.08 - August (2003 vol.29)
Rami Melhem, IEEE
Daniel Mossé, IEEE Computer Society
<p><b>Abstract</b>: Real-time systems (RTS) are those whose correctness depends on satisfying the required functional <it>as well as</it> the required temporal properties. Because such systems are critical, recovery from faults is an essential part of an RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of fault; SEUs are transient faults that affect a single task at a time. This paper presents a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults, assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising guarantees given to other tasks. The first algorithm is an optimal dynamic programming solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, the paper describes the implementation of the modified admission control algorithm, the nonpreemptive scheduler, and a recovery mechanism in the FT-RT-Mach operating system.</p>
Fault tolerance, operating system, real-time, scheduling, transient faults.
Rami Melhem, Daniel Mossé, "A Nonpreemptive Real-Time Scheduler with Recovery from Transient Faults and Its Implementation", IEEE Transactions on Software Engineering, vol. 29, no. 8, pp. 752-767, August 2003, doi:10.1109/TSE.2003.1223648