Issue No. 06 - June (2011 vol. 60)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2011.68
Xiaomin Zhu, Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
Xiao Qin, Dept. of Comput. Sci. & Software Eng., Auburn Univ., Auburn, AL, USA
Meikang Qiu, Dept. of Electr. & Comput. Eng., Univ. of Kentucky, Lexington, KY, USA
Fault-tolerant scheduling plays a significant role in improving the system reliability of clusters. Although many fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, the quality of service (QoS) requirements of tasks have not been taken into account. This paper presents a fault-tolerant scheduling algorithm called QAFT that can tolerate the permanent failure of one node at any one time instant for real-time tasks with QoS needs on heterogeneous clusters. To improve system flexibility, reliability, schedulability, and resource utilization, QAFT strives either to advance the start time of primary copies and delay the start time of backup copies, so that backup copies can adopt the passive execution scheme, or to decrease the simultaneous execution time of the primary and backup copies of a task as much as possible. QAFT is capable of adaptively adjusting the QoS levels of tasks and the execution schemes of backup copies to attain high system flexibility. Furthermore, we employ a backup-copy overlapping technique. The latest start time of backup copies and its constraints are analyzed and discussed. We conduct extensive experiments to compare QAFT with two existing schemes, NOQAFT and DYFARS. Experimental results show that QAFT achieves significantly better scheduling quality than NOQAFT and DYFARS.
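The relationship between a backup copy's latest start time and the passive execution scheme can be illustrated with a minimal sketch. This is not the QAFT algorithm from the paper. It assumes a single task with a hard deadline, a backup copy that is delayed as far as the deadline allows, and a backup node that becomes free at a known time. The names `plan_backup` and `BackupPlan`, and the label "concurrent" for a backup that overlaps its primary, are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    scheme: str     # "passive", "concurrent", or "infeasible" (illustrative labels)
    start: float    # chosen start time of the backup copy
    overlap: float  # time during which primary and backup run simultaneously


def plan_backup(primary_start: float, primary_finish: float,
                backup_exec: float, deadline: float,
                backup_node_free: float = 0.0) -> BackupPlan:
    """Delay the backup copy as far as the deadline allows (assumed policy)."""
    # Latest start time that still lets the backup finish by the deadline.
    latest_start = deadline - backup_exec
    if latest_start < backup_node_free:
        # The backup node is busy past the latest feasible start time.
        return BackupPlan("infeasible", latest_start, 0.0)

    start = latest_start
    backup_finish = start + backup_exec
    # Length of the interval where both copies would execute at once.
    overlap = max(0.0, min(primary_finish, backup_finish)
                  - max(primary_start, start))
    # No overlap means the backup only needs to run if the primary fails.
    scheme = "passive" if overlap == 0.0 else "concurrent"
    return BackupPlan(scheme, start, overlap)


if __name__ == "__main__":
    print(plan_backup(0.0, 4.0, 3.0, 10.0))
    print(plan_backup(0.0, 6.0, 5.0, 10.0))
```

In this sketch, finishing the primary copy earlier or shortening the backup's execution time shrinks the overlap, which mirrors the two goals named in the abstract: enabling passive backups, or otherwise minimizing simultaneous execution.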
scheduling, fault tolerant computing, quality of service (QoS), passive execution scheme, QoS-aware fault-tolerant scheduling, real-time task scheduling, heterogeneous clusters, parallel systems, distributed systems, real-time systems, fault tolerance, fault tolerant systems, scheduling algorithm, heuristic algorithms
Xiaomin Zhu, Xiao Qin, Meikang Qiu, "QoS-Aware Fault-Tolerant Scheduling for Real-Time Tasks on Heterogeneous Clusters", IEEE Transactions on Computers, vol. 60, no. 6, pp. 800-812, June 2011, doi:10.1109/TC.2011.68