Fast Software Rejuvenation of Virtual Machine Monitors
November/December 2011 (vol. 8, no. 6)
pp. 839-851
Kenichi Kourai, Kyushu Institute of Technology, Fukuoka
Shigeru Chiba, Tokyo Institute of Technology, Tokyo
As server consolidation using virtual machines (VMs) spreads, software aging of virtual machine monitors (VMMs) is becoming a critical problem. Since a VMM is the fundamental software for running VMs, its performance degradation or crash failure affects all the VMs running on top of it. To counteract such software aging, a proactive technique called software rejuvenation has been proposed. A simple form of rejuvenation is rebooting the VMM. However, simply rebooting a VMM is undesirable because it also requires rebooting the operating systems on all of its VMs. In this paper, we propose a new technique for fast rejuvenation of VMMs, called the warm-VM reboot. The warm-VM reboot efficiently reboots only the VMM by suspending and resuming VMs without saving their memory images to persistent storage. To achieve this, we have developed two mechanisms: on-memory suspend/resume of VMs and quick reload of a VMM. Compared with a normal reboot, the warm-VM reboot reduced the downtime by up to 74 percent. It also prevented the performance degradation caused by cache misses after the reboot, which reached 52 percent in the case of a normal reboot. In a cluster environment, the warm-VM reboot achieved higher total throughput than a system using VM migration and a normal reboot.
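The abstract names the two mechanisms behind the warm-VM reboot only at a high level. The toy Python model below is not the authors' implementation; it is a sketch under stated assumptions, in which the `Host` class, the `handoff` area through which the old VMM passes VM metadata to the new one, and the frame-level memory layout are all hypothetical. What it tries to capture is that a quick reload discards the VMM's own state while leaving the frames that hold VM memory untouched, so guests resume instead of rebooting.

```python
from dataclasses import dataclass


@dataclass
class VMRecord:
    """Per-VM metadata a VMM might keep (hypothetical layout)."""
    frames: list  # physical frame numbers holding the guest's memory image
    vcpu: dict    # saved virtual CPU registers


class Host:
    """Toy host: physical RAM outlives a VMM reload, the VMM's own tables do not."""

    def __init__(self, num_frames):
        self.ram = [None] * num_frames  # frame number -> page contents (None = free)
        self.vms = {}                   # VMM-internal VM table, lost on every reload
        self.handoff = {}               # assumed area passed from old VMM to new VMM

    def create_vm(self, name, pages, vcpu):
        free = [i for i, page in enumerate(self.ram) if page is None]
        if len(free) < len(pages):
            raise MemoryError("not enough free frames for " + name)
        frames = free[:len(pages)]
        for frame, page in zip(frames, pages):
            self.ram[frame] = page
        self.vms[name] = VMRecord(frames, dict(vcpu))

    def guest_memory(self, name):
        return [self.ram[f] for f in self.vms[name].frames]

    def warm_vm_reboot(self):
        # 1. On-memory suspend: conceptually the VMs are paused here; the toy
        #    only records where their memory lives. Nothing is written to disk.
        self.handoff = dict(self.vms)
        # 2. Quick reload of the VMM: its internal tables are discarded, but the
        #    frames owned by suspended VMs are reserved and therefore not cleared.
        self.vms = {}
        reserved = {f for rec in self.handoff.values() for f in rec.frames}
        self.ram = [page if i in reserved else None
                    for i, page in enumerate(self.ram)]
        # 3. On-memory resume: the new VMM rebuilds its VM table from the
        #    handoff area, and the guests continue where they left off.
        self.vms, self.handoff = self.handoff, {}

    def cold_reboot(self):
        # Normal reboot, for comparison: RAM is cleared, so in this model the
        # VMs are gone and every guest OS would have to boot again.
        self.vms, self.handoff = {}, {}
        self.ram = [None] * len(self.ram)


if __name__ == "__main__":
    host = Host(num_frames=8)
    host.create_vm("web", ["kernel", "page-cache"], {"rip": 0x1000})
    host.create_vm("db", ["kernel", "buffer-pool"], {"rip": 0x2000})
    host.warm_vm_reboot()
    print(host.guest_memory("web"))  # ['kernel', 'page-cache']
    host.cold_reboot()
    print(host.vms)                  # {}
```

Because guest pages never leave RAM in this model, the resumed VMs also keep their cached data, which is consistent with the abstract's observation that the warm-VM reboot avoids the post-reboot cache-miss penalty of a normal reboot.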

Index Terms:
Operating systems, checkpoint/restart, main memory, availability, performance.
Kenichi Kourai, Shigeru Chiba, "Fast Software Rejuvenation of Virtual Machine Monitors," IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 6, pp. 839-851, Nov.-Dec. 2011, doi:10.1109/TDSC.2010.20