Issue No. 04, Fourth Quarter 2012 (vol. 5)
pp: 484-496
Ardalan Kangarlou, Purdue University, West Lafayette
Patrick Eugster, Purdue University, West Lafayette
Dongyan Xu, Purdue University, West Lafayette
A virtual networked infrastructure (VNI) consists of virtual machines (VMs) connected by a virtual network. Created for individual users on a shared cloud infrastructure, VNIs reflect the concept of "Infrastructure as a Service" (IaaS) as part of the emerging cloud computing paradigm. The ability to take snapshots of an entire VNI, including images of the VMs with their execution, communication, and storage states, yields a unique approach to reliability, because a VNI snapshot can be used to restore the operation of the entire virtual infrastructure. We present VNsnap, a system that takes distributed snapshots of VNIs. Unlike many existing distributed snapshot/checkpointing solutions, VNsnap does not require any modifications to the applications, libraries, or (guest) operating systems (OSs) running in the VMs. Furthermore, by performing much of the snapshot operation concurrently with the VNI's normal operation, VNsnap incurs only seconds of downtime. We have implemented VNsnap on top of Xen. Our experiments with real-world parallel and distributed applications demonstrate VNsnap's effectiveness and efficiency.
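To make the "communication state" part of a distributed snapshot concrete: the per-VM images only form a usable VNI snapshot if no VM's image records receiving a packet that the sender's image does not record sending. The minimal sketch below checks that textbook consistent-cut condition over hypothetical per-VM send/receive logs. `LocalSnapshot`, `orphan_packets`, `in_flight_packets`, and `is_consistent` are illustrative names, not part of VNsnap or Xen, and the sketch is not VNsnap's own snapshot algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class LocalSnapshot:
    """Hypothetical record of what one VM had sent/received when its image was taken."""
    vm: str
    sent: set[str] = field(default_factory=set)      # packet IDs sent before this VM's snapshot
    received: set[str] = field(default_factory=set)  # packet IDs received before this VM's snapshot


def _all_sent(snaps: list[LocalSnapshot]) -> set[str]:
    return set().union(*(s.sent for s in snaps))


def _all_received(snaps: list[LocalSnapshot]) -> set[str]:
    return set().union(*(s.received for s in snaps))


def orphan_packets(snaps: list[LocalSnapshot]) -> set[str]:
    """Packets recorded as received but never recorded as sent; any of these makes the cut inconsistent."""
    return _all_received(snaps) - _all_sent(snaps)


def in_flight_packets(snaps: list[LocalSnapshot]) -> set[str]:
    """Packets sent before the cut but not yet received: the channel (communication) state."""
    return _all_sent(snaps) - _all_received(snaps)


def is_consistent(snaps: list[LocalSnapshot]) -> bool:
    """A global snapshot is consistent iff it contains no orphan packets."""
    return not orphan_packets(snaps)


if __name__ == "__main__":
    vm1 = LocalSnapshot("vm1", sent={"p1", "p2"})
    vm2 = LocalSnapshot("vm2", received={"p1"})
    print(is_consistent([vm1, vm2]))      # True
    print(in_flight_packets([vm1, vm2]))  # {'p2'}

    # vm2's image records receiving p3, which vm1 sent only after its own snapshot.
    vm2_late = LocalSnapshot("vm2", received={"p1", "p3"})
    print(is_consistent([vm1, vm2_late]))  # False
```

Here the in-flight packet `p2` is exactly the state that a snapshot must either capture or otherwise account for, whereas an orphan such as `p3` means the VM images were taken at mutually incompatible points and cannot be restored together.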
Switches, IP networks, Checkpointing, Protocols, Image segmentation, Image restoration, Computer architecture, Virtual environments, reliability, Virtual infrastructure, infrastructure-as-a-service (IaaS), cloud computing, distributed snapshots
Ardalan Kangarlou, Patrick Eugster, Dongyan Xu, "VNsnap: Taking Snapshots of Virtual Networked Infrastructures in the Cloud", IEEE Transactions on Services Computing, vol. 5, no. 4, pp. 484-496, Fourth Quarter 2012, doi:10.1109/TSC.2011.29