Issue No. 08 - Aug. (2017 vol. 66)
Gregory Levitin , Collaborative Autonomic Computing Laboratory, School of Computer Science, University of Electronic Science and Technology of China, Sichuan Sheng, China
Liudong Xing , Department of Electrical & Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA
Yuanshun Dai , Collaborative Autonomic Computing Laboratory, School of Computer Science, University of Electronic Science and Technology of China, Sichuan Sheng, China
Vinod M. Vokkarane , Department of Electrical & Computer Engineering, University of Massachusetts Lowell, Lowell, MA
This paper models 1-out-of-N standby computing systems with a dynamic checkpointing policy. The system performs a real-time mission task that has to be accomplished within an allowed mission time. During the mission, to facilitate effective failure recovery, the system undergoes checkpointing procedures according to a policy that dynamically determines the checkpointing frequency based on the activated element and the remaining work for completing the mission. System elements are heterogeneous; they can follow different, arbitrary types of time-to-failure distributions, have different performance, and wait in different standby modes before their activation. A new numerical algorithm based on state space event transitions is first proposed to evaluate the mission success probability of the real-time standby systems considered in this work. Additional new contributions are made by formulating and solving optimal dynamic checkpointing policy problems, as well as an integrated optimization problem that finds the combination of checkpointing policy and element activation sequence that maximizes mission success probability. Advantages of using the dynamic checkpointing policy over fixed even checkpoints are demonstrated through examples. Examples and results are also provided to illustrate the effects of different mission and element parameters on mission success probability as well as on the optimal dynamic checkpointing policy.
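To make the setting above concrete, the following is a minimal Monte Carlo sketch of a 1-out-of-N standby mission with a checkpointing policy that depends on the active element and the remaining work. It is not the state-space numerical algorithm proposed in the paper; simulation is swapped in here purely for illustration. All names (`simulate_mission`, `estimate_success_probability`, `ckpt_time`), the numeric parameters, and the modeling simplifications are assumptions: elements are treated as cold standby (no ageing before activation), each checkpoint costs a fixed time, and activation and data-recovery delays are ignored.

```python
import math
import random


def simulate_mission(elements, policy, work, deadline, ckpt_time, rng):
    """Simulate one mission; return True if all work finishes by the deadline.

    elements: activation sequence of (performance, sample_ttf) pairs, where
              sample_ttf(rng) draws a time to failure (assumed cold standby).
    policy:   policy(element_index, remaining_work) -> work between checkpoints.
    """
    t = 0.0      # elapsed mission time
    saved = 0.0  # work protected by the latest checkpoint
    for i, (perf, sample_ttf) in enumerate(elements):
        life_left = sample_ttf(rng)
        while True:
            remaining = work - saved
            chunk = min(policy(i, remaining), remaining)
            if chunk <= 0:
                raise ValueError("policy must return a positive amount of work")
            final = chunk >= remaining  # no checkpoint needed after the last chunk
            need = chunk / perf + (0.0 if final else ckpt_time)
            if need > life_left:
                t += life_left  # element fails; work since the last checkpoint is lost
                break
            t += need
            life_left -= need
            if t > deadline:
                return False
            if final:
                return True
            saved += chunk
        if t > deadline:
            return False
    return False  # every element failed before the mission completed


def estimate_success_probability(elements, policy, work, deadline,
                                 ckpt_time, runs=10000, seed=0):
    """Estimate mission success probability as the fraction of successful runs."""
    rng = random.Random(seed)
    wins = sum(simulate_mission(elements, policy, work, deadline, ckpt_time, rng)
               for _ in range(runs))
    return wins / runs


if __name__ == "__main__":
    # Hypothetical heterogeneous elements: (performance, time-to-failure sampler).
    elements = [
        (10.0, lambda rng: rng.weibullvariate(12.0, 1.5)),  # scale 12, shape 1.5
        (8.0, lambda rng: rng.expovariate(1 / 15.0)),       # mean 15
        (12.0, lambda rng: rng.weibullvariate(9.0, 2.0)),
    ]

    def fixed_policy(i, remaining):
        return 25.0  # even checkpoints every 25 units of work

    def dynamic_policy(i, remaining):
        # Illustrative rule only: element-specific base interval, shortened
        # as the remaining work shrinks.
        base = (30.0, 20.0, 35.0)[i]
        return max(5.0, min(base, remaining / 2))

    for name, pol in (("fixed", fixed_policy), ("dynamic", dynamic_policy)):
        p = estimate_success_probability(elements, pol, work=100.0,
                                         deadline=20.0, ckpt_time=0.3)
        print(name, p)
```

Representing the policy as a function of `(element_index, remaining_work)` mirrors the abstract's description of a checkpointing frequency that depends on the activated element and the remaining work. Reordering the `elements` list is one way to explore the effect of the activation sequence within this simplified model.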
Checkpointing, Real-time systems, Computational modeling, Program processors, Computers, Reliability, Electronic mail
G. Levitin, L. Xing, Y. Dai and V. M. Vokkarane, "Dynamic Checkpointing Policy in Heterogeneous Real-Time Standby Systems," in IEEE Transactions on Computers, vol. 66, no. 8, pp. 1449-1456, 2017.