Long Wang is with IBM T. J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598 (email: firstname.lastname@example.org).
Checkpointing and rollback techniques enhance the reliability and availability of virtual machines (VMs) and the IT services they host. This paper proposes VM-μCheckpoint, a lightweight, pure-software mechanism for high-frequency checkpointing and rapid recovery of VMs. Compared with existing VM checkpointing techniques, VM-μCheckpoint aims to minimize checkpoint overhead and speed up recovery by means of copy-on-write, dirty-page prediction, and in-place recovery, as well as by saving incremental checkpoints in volatile memory. Moreover, VM-μCheckpoint addresses the problem that latency in error detection can result in corrupted checkpoints, particularly when the checkpointing frequency is high. We also constructed Markov models to study the availability improvements provided by VM-μCheckpoint (from 99% to 99.98% on reasonably reliable hypervisors). We designed and implemented VM-μCheckpoint in the Xen VMM. The evaluation results demonstrate that VM-μCheckpoint incurs an average overhead of 6.3% (in terms of program execution time) for 50 ms checkpoint intervals when executing the SPEC CINT 2006 benchmark. Error injection experiments demonstrate that VM-μCheckpoint, combined with error detection techniques in RMK, provides high coverage of recovery.
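To put the reported availability figures in concrete terms, the following Python sketch converts a steady-state availability into expected downtime per year. It is only an illustration: the function names are hypothetical, and the two-state up/down formula A = MTTF / (MTTF + MTTR) is a textbook simplification assumed here, not the Markov models constructed in the paper.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - availability) * HOURS_PER_YEAR


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Two-state (up/down) model, assumed for illustration only.

    mttf: mean time to failure; mttr: mean time to repair (same time unit).
    """
    if mttf <= 0 or mttr < 0:
        raise ValueError("mttf must be positive and mttr non-negative")
    return mttf / (mttf + mttr)


# Availability figures quoted in the abstract.
downtime_before = annual_downtime_hours(0.99)
downtime_after = annual_downtime_hours(0.9998)
```

Under these assumptions, 99% availability corresponds to roughly 87.6 hours of downtime per year, while 99.98% corresponds to about 1.75 hours. In the simplified two-state view, shortening recovery time (MTTR) is what raises availability for a fixed failure rate.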
Long Wang, Zbigniew Kalbarczyk, Ravishankar Iyer, Arun K. Iyengar, "VM-μCheckpoint: Design, Modeling, and Assessment of Lightweight In-Memory VM Checkpointing", IEEE Transactions on Dependable and Secure Computing, no. 1, pp. 1, PrePrints, doi:10.1109/TDSC.2014.2327967