Issue No.06 - June (2011 vol.60)
pp: 800-812
Xiaomin Zhu , Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
Xiao Qin , Dept. of Comput. Sci. & Software Eng., Auburn Univ., Auburn, AL, USA
Meikang Qiu , Dept. of Electr. & Comput. Eng., Univ. of Kentucky, Lexington, KY, USA
Fault-tolerant scheduling plays a significant role in improving the system reliability of clusters. Although many fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, the quality of service (QoS) requirements of tasks have not been taken into account. This paper presents a fault-tolerant scheduling algorithm called QAFT for real-time tasks with QoS needs on heterogeneous clusters; it can tolerate the permanent failure of one node at any one time instant. To improve system flexibility, reliability, schedulability, and resource utilization, QAFT strives either to advance the start times of primary copies and delay the start times of backup copies, so that backup copies can adopt the passive execution scheme, or to reduce as far as possible the time during which the primary and backup copies of a task execute simultaneously. QAFT adaptively adjusts the QoS levels of tasks and the execution schemes of backup copies to attain high system flexibility. Furthermore, we employ the technique of overlapping backup copies. The latest start time of backup copies and its constraints are analyzed and discussed. We conduct extensive experiments to compare QAFT with two existing schemes, NOQAFT and DYFARS. Experimental results show that QAFT achieves significantly better scheduling quality than NOQAFT and DYFARS.
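The relationship between primary and backup copies described in the abstract can be illustrated with a minimal sketch. It is not the QAFT algorithm. It assumes the usual primary/backup convention that a backup copy is passive when it starts no earlier than its primary finishes and active otherwise, and it assumes a simplified latest start time equal to the deadline minus the backup's execution time. The names `Copy`, `overlap`, `backup_scheme`, and `latest_backup_start` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One scheduled copy (primary or backup) of a task on some node."""
    start: float
    exec_time: float

    @property
    def finish(self) -> float:
        return self.start + self.exec_time


def overlap(primary: Copy, backup: Copy) -> float:
    """Time during which both copies would execute simultaneously."""
    return max(0.0, min(primary.finish, backup.finish)
               - max(primary.start, backup.start))


def backup_scheme(primary: Copy, backup: Copy) -> str:
    """Assumed convention: passive if the backup starts after the primary ends."""
    return "passive" if backup.start >= primary.finish else "active"


def latest_backup_start(deadline: float, backup_exec_time: float) -> float:
    """Simplified assumption: the backup must still finish by the deadline."""
    return deadline - backup_exec_time


if __name__ == "__main__":
    primary = Copy(start=0.0, exec_time=4.0)
    deadline = 10.0
    for backup_exec in (5.0, 7.0):
        # Delay the backup as far as the deadline allows.
        backup = Copy(latest_backup_start(deadline, backup_exec), backup_exec)
        print(backup_exec, backup_scheme(primary, backup), overlap(primary, backup))
```

In this sketch, delaying the backup to its latest start time either removes the overlap entirely (the passive case) or shrinks it (the active case), which mirrors the two goals the abstract attributes to QAFT.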
scheduling, fault tolerant computing, quality of service (QoS), passive execution scheme, QoS-aware fault-tolerant scheduling, real-time task scheduling, heterogeneous clusters, parallel systems, distributed systems, real-time systems, fault tolerance, fault-tolerant systems, scheduling algorithms, heuristic algorithms
Xiaomin Zhu, Xiao Qin, Meikang Qiu, "QoS-Aware Fault-Tolerant Scheduling for Real-Time Tasks on Heterogeneous Clusters," IEEE Transactions on Computers, vol. 60, no. 6, pp. 800-812, June 2011, doi:10.1109/TC.2011.68