Issue No. 04 - Oct.-Dec. (2018 vol. 6)
Jialei Liu, Beijing, China
Shangguang Wang, Beijing, China
Ao Zhou, Beijing, China
Sathish A. P. Kumar, Conway, SC
Fangchun Yang, Beijing, China
Rajkumar Buyya, Melbourne, Vic., Australia
The large-scale use of cloud computing services to host industrial and enterprise applications has made cloud service reliability an important issue for both cloud service providers and users. To enhance cloud service reliability, two types of fault tolerance schemes, reactive and proactive, have been proposed. Existing schemes rarely consider coordination among the multiple virtual machines (VMs) that jointly complete a parallel application. Without VM coordination, the execution results of the parallel application will be incorrect. To overcome this problem, we first propose an initial virtual cluster allocation algorithm that uses the VM characteristics to reduce the total network resource consumption and total energy consumption in the data center. Then, we model CPU temperature to anticipate a deteriorating physical machine (PM) and migrate VMs from a detected deteriorating PM to optimal target PMs. Finally, we model the selection of the optimal target PMs as an optimization problem and solve it using an improved particle swarm optimization algorithm. We evaluate our approach against five related approaches in terms of overall transmission overhead, overall network resource consumption, and total execution time while executing a set of parallel applications. Experimental results demonstrate the efficiency and effectiveness of our approach.
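The detect-then-select flow outlined in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear-trend temperature forecast stands in for the paper's CPU temperature model, the 85 °C threshold is arbitrary, the cost function (size-weighted transfer cost plus a capacity-overflow penalty) is hypothetical, and the optimizer is a plain textbook particle swarm, not the improved variant the authors describe. All function and parameter names are hypothetical.

```python
import random


def forecast_temperature(readings, steps_ahead):
    """Fit a least-squares line to equally spaced readings and extrapolate it."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    slope = sxy / sxx
    return mean_y + slope * ((n - 1 + steps_ahead) - mean_x)


def is_deteriorating(readings, threshold_c=85.0, steps_ahead=3):
    """Flag a PM whose forecast CPU temperature reaches the (assumed) threshold."""
    return forecast_temperature(readings, steps_ahead) >= threshold_c


def placement_cost(assignment, vm_sizes, pm_free, pm_cost):
    """Hypothetical cost: size-weighted transfer cost plus an overflow penalty."""
    load = [0.0] * len(pm_free)
    total = 0.0
    for vm, pm in enumerate(assignment):
        load[pm] += vm_sizes[vm]
        total += vm_sizes[vm] * pm_cost[pm]
    overflow = sum(max(0.0, used - free) for used, free in zip(load, pm_free))
    return total + 1000.0 * overflow


def pso_select_targets(vm_sizes, pm_free, pm_cost, particles=20, iters=50, seed=0):
    """Textbook PSO over continuous positions, decoded to one PM index per VM."""
    rng = random.Random(seed)
    n_vm, n_pm = len(vm_sizes), len(pm_free)

    def decode(pos):
        return [min(n_pm - 1, max(0, int(p))) for p in pos]

    def score(pos):
        return placement_cost(decode(pos), vm_sizes, pm_free, pm_cost)

    pos = [[rng.uniform(0, n_pm) for _ in range(n_vm)] for _ in range(particles)]
    vel = [[0.0] * n_vm for _ in range(particles)]
    best = [p[:] for p in pos]
    best_cost = [score(p) for p in pos]
    g = min(range(particles), key=best_cost.__getitem__)
    gbest, gcost = best[g][:], best_cost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n_vm):
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                # Keep positions inside the valid PM index range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), n_pm - 1e-9)
            c = score(pos[i])
            if c < best_cost[i]:
                best[i], best_cost[i] = pos[i][:], c
                if c < gcost:
                    gbest, gcost = pos[i][:], c
    return decode(gbest), gcost
```

In this sketch a monitoring loop would call `is_deteriorating` on each PM's recent temperature readings and, for a flagged PM, pass the sizes of its VMs to `pso_select_targets` together with the free capacity and per-unit transfer cost of each candidate PM. Rounding continuous positions to PM indices is only one simple way to apply PSO to a discrete placement problem.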
Cloud computing, Checkpointing, Fault tolerant systems, Redundancy, Monitoring
J. Liu, S. Wang, A. Zhou, S. A. Kumar, F. Yang and R. Buyya, "Using Proactive Fault-Tolerance Approach to Enhance Cloud Service Reliability," in IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 1191-1202, 2018.