Determining Redundancy Levels for Fault Tolerant Real-Time Systems
February 1995 (vol. 44 no. 2)
pp. 292-301

Abstract—Many real-time systems have both performance requirements and reliability requirements. Performance is usually measured in terms of the value of completing tasks on time. Reliability is evaluated by hardware and software failure models. In many situations, there are trade-offs between task performance and task reliability. Thus, a mathematical assessment of performance-reliability trade-offs is necessary to evaluate the performance of fault-tolerant real-time systems.

Assuming that the reliability of task execution is achieved through task replication, we present an approach that mathematically determines the replication factor for tasks. Our approach is novel in that it is a task-schedule-based analysis rather than a state-based analysis as found in other models. A task-schedule-based analysis has three advantages: it provides a fast method for determining optimal redundancy levels; it is not limited to hardware reliability given by constant failure rate functions, as most other models are; and, we hypothesize, it integrates more naturally with on-line real-time scheduling than state-based techniques do. In this work, the goal is to maximize the total performance index, which is a performance-related reliability measurement. We present a technique based on a continuous task model and show that it closely approximates discrete models and tasks with varying characteristics.
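
The trade-off described above can be illustrated with a toy calculation. The sketch below is not the paper's continuous task model. It assumes a fixed number of execution slots, identical tasks, independent copies that each succeed with the same probability, a reward for a task that succeeds, and a penalty for a task whose copies all fail. All names and parameters (`capacity`, `p_copy`, `value`, `penalty`, `max_r`) are hypothetical and chosen only for illustration.

```python
def task_reliability(p_copy: float, r: int) -> float:
    """Probability that at least one of r independent copies succeeds."""
    return 1.0 - (1.0 - p_copy) ** r


def performance_index(capacity: int, p_copy: float, value: float,
                      penalty: float, r: int) -> float:
    """Expected total reward when every task is replicated r times.

    Assumption: each copy occupies one slot, so only capacity // r
    distinct tasks fit into the schedule.
    """
    tasks_scheduled = capacity // r
    rel = task_reliability(p_copy, r)
    # Reward for success minus penalty for failure, summed over tasks.
    return tasks_scheduled * (value * rel - penalty * (1.0 - rel))


def best_replication(capacity: int, p_copy: float, value: float,
                     penalty: float, max_r: int) -> int:
    """Exhaustively pick the replication factor with the highest index."""
    return max(
        range(1, max_r + 1),
        key=lambda r: performance_index(capacity, p_copy, value, penalty, r),
    )


if __name__ == "__main__":
    for r in range(1, 5):
        print(r, performance_index(12, 0.9, 1.0, 10.0, r))
    print("best r:", best_replication(12, 0.9, 1.0, 10.0, 4))
```

In this toy setting, more replication raises per-task reliability but leaves room for fewer tasks. With no failure penalty a single copy per task is always preferred, while a large penalty pushes the preferred replication factor above one.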

Index Terms—Real-time systems, reliability, degradable systems, fault tolerance, functional variation, performability.

Fuxing Wang, Krithi Ramamritham, John A. Stankovic, "Determining Redundancy Levels for Fault Tolerant Real-Time Systems," IEEE Transactions on Computers, vol. 44, no. 2, pp. 292-301, Feb. 1995, doi:10.1109/12.364540