A Fault-Tolerant Dynamic Scheduling Algorithm for Multiprocessor Real-Time Systems and Its Analysis
November 1998 (vol. 9 no. 11)
pp. 1137-1152

Abstract: Many time-critical applications require dynamic scheduling with predictable performance. Tasks corresponding to these applications have deadlines that must be met despite the presence of faults. In this paper, we propose an algorithm to dynamically schedule arriving real-time tasks with resource and fault-tolerance requirements onto multiprocessor systems. The tasks are assumed to be nonpreemptable, and each task has two copies (versions) that are mutually excluded in space, to handle permanent processor failures, and in time in the schedule, to obtain better performance. Our algorithm can tolerate more than one fault at a time and employs performance-improving techniques such as 1) the distance concept, which decides the relative position of the two copies of a task in the task queue, 2) flexible backup overloading, which introduces a trade-off between the degree of fault tolerance and performance, and 3) resource reclaiming, which reclaims resources both from deallocated backups and from early-completing tasks. Through simulation studies, we quantify the effectiveness of each of these techniques in improving the guarantee ratio, defined as the percentage of all tasks arriving in the system whose deadlines are met. We also compare, through simulation studies, the performance of our algorithm with the best known algorithm for the problem, and show analytically the importance of the distance parameter in fault-tolerant dynamic scheduling in multiprocessor real-time systems.
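The abstract names the ingredients of the scheduler without spelling out their details. The Python sketch below is a hypothetical, simplified illustration, not the authors' algorithm. It assumes one reading of the distance concept (each backup copy enters the deadline-ordered queue `distance` primaries after its own primary), places copies greedily on the processor where they finish earliest while enforcing space exclusion (different processors) and time exclusion (the backup starts after the primary ends), and computes the guarantee ratio as the abstract defines it. Backup overloading and resource reclaiming are deliberately left out. The names `Task`, `copy_queue`, `schedule`, and `guarantee_ratio` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task model: both copies of a task share the arrival time,
    # worst-case computation time (wcet) and absolute deadline.
    name: str
    arrival: float
    wcet: float
    deadline: float


def copy_queue(tasks, distance):
    """Interleave primary ("P") and backup ("B") copies in deadline order so
    that each backup is released `distance` primaries after its own primary.
    This is an assumed reading of the distance concept, for illustration."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    ordered = sorted(tasks, key=lambda t: t.deadline)
    queue = []
    for i, task in enumerate(ordered):
        queue.append(("P", task))
        j = i - distance + 1          # task whose backup is released now
        if j >= 0:
            queue.append(("B", ordered[j]))
    # Backups of the last distance-1 tasks go at the tail of the queue.
    tail = max(len(ordered) - distance + 1, 0)
    queue.extend(("B", t) for t in ordered[tail:])
    return queue


def schedule(tasks, num_procs, distance=1):
    """Greedy, non-preemptive placement of copies in queue order. A backup must
    run on a different processor than its primary (space exclusion) and start
    after the primary finishes (time exclusion). Returns {name: [(kind, proc,
    start, finish), ...]} for tasks whose two copies both meet the deadline.
    Simplification: processor time reserved for a primary whose backup later
    fails to fit is not reclaimed."""
    free_at = [0.0] * num_procs       # time at which each processor is idle
    plan, rejected = {}, set()
    for kind, task in copy_queue(tasks, distance):
        if task.name in rejected:
            continue
        best = None                   # (processor, start, finish)
        for p in range(num_procs):
            start = max(free_at[p], task.arrival)
            if kind == "B":
                _, proc_p, _, finish_p = plan[task.name][0]
                if p == proc_p:
                    continue                  # space exclusion
                start = max(start, finish_p)  # time exclusion
            finish = start + task.wcet
            if finish <= task.deadline and (best is None or finish < best[2]):
                best = (p, start, finish)
        if best is None:
            rejected.add(task.name)
            plan.pop(task.name, None)     # drop an orphaned primary, if any
            continue
        p, start, finish = best
        free_at[p] = finish
        plan.setdefault(task.name, []).append((kind, p, start, finish))
    return plan


def guarantee_ratio(plan, tasks):
    """Percentage of arrived tasks whose deadlines are met."""
    return 100.0 * len(plan) / len(tasks) if tasks else 0.0


if __name__ == "__main__":
    demo = [Task("A", 0, 2, 5), Task("B", 0, 2, 6), Task("C", 0, 3, 6)]
    for d in (1, 2):
        plan = schedule(demo, num_procs=2, distance=d)
        # prints "1 ['A', 'B'] 66.7" and then "2 ['A'] 33.3"
        print(d, sorted(plan), round(guarantee_ratio(plan, demo), 1))
```

On this three-task, two-processor toy input, distance 1 guarantees tasks A and B while distance 2 guarantees only A. Even this crude sketch therefore shows that the relative placement of the two copies changes the guarantee ratio; it says nothing about which distance is better in general.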

Index Terms:
Real-time system, dynamic scheduling, fault tolerance, resource reclaiming, run-time anomaly, safety critical application.
Citation:
G. Manimaran, C. Siva Ram Murthy, "A Fault-Tolerant Dynamic Scheduling Algorithm for Multiprocessor Real-Time Systems and Its Analysis," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 11, pp. 1137-1152, Nov. 1998, doi:10.1109/71.735960