Issue No.06 - June (2012 vol.61)
pp: 804-816
Hao Chen, Hunan University, Changsha
Jianhua Sun, Hunan University, Changsha
Lin Shi, Hunan University, Changsha
This paper describes vCUDA, a general-purpose graphics processing unit (GPGPU) computing solution for virtual machines (VMs). vCUDA allows applications executing within VMs to leverage hardware acceleration, which can benefit the performance of a class of high-performance computing (HPC) applications. The key insights in our design are API call interception and redirection, and a dedicated RPC system for VMs. With API interception and redirection, Compute Unified Device Architecture (CUDA) applications in VMs can access a graphics hardware device and achieve high computing performance transparently. In the current study, vCUDA achieved near-native performance with the dedicated RPC system. We carried out a detailed analysis of the performance of our framework. Using a number of unmodified official examples from the CUDA SDK and third-party applications in the evaluation, we observed that CUDA applications running with vCUDA exhibited a very low performance penalty compared with the native environment, thereby demonstrating the viability of the vCUDA architecture.
CUDA, virtual machine, GPGPU, RPC, virtualization.
Hao Chen, Jianhua Sun, Lin Shi, "vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines", IEEE Transactions on Computers, vol. 61, no. 6, pp. 804-816, June 2012, doi:10.1109/TC.2011.112
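
The interception-and-redirection idea named in the abstract can be illustrated with a small, self-contained Python sketch. This is not vCUDA's implementation and it does not use the CUDA API. Every name here (FakeGpuLibrary, HostServer, GuestStub, and the calls malloc, memcpy_to_device, scale, memcpy_to_host) is hypothetical, and a plain in-process function call stands in for the dedicated RPC transport between guest and host. It only shows the general shape: a guest-side stub captures each call, serializes it, and a host-side server replays it against the real library.

```python
import json


class FakeGpuLibrary:
    """Hypothetical stand-in for the native GPU library available on the host."""

    def __init__(self):
        self._buffers = {}
        self._next_handle = 1

    def malloc(self, n):
        # Allocate a "device" buffer and return an opaque integer handle.
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = [0.0] * n
        return handle

    def memcpy_to_device(self, handle, data):
        self._buffers[handle][:len(data)] = data

    def scale(self, handle, factor):
        # Stands in for launching a kernel on the device buffer.
        self._buffers[handle] = [x * factor for x in self._buffers[handle]]

    def memcpy_to_host(self, handle):
        return list(self._buffers[handle])


class HostServer:
    """Host side: decodes a serialized request and replays it on the library."""

    def __init__(self, library):
        self._library = library

    def handle(self, request_bytes):
        request = json.loads(request_bytes)
        name = request["call"]
        if name.startswith("_"):
            raise ValueError("refusing to dispatch private name: " + name)
        result = getattr(self._library, name)(*request["args"])
        return json.dumps({"result": result}).encode()


class GuestStub:
    """Guest side: exposes the library's call names but forwards every call."""

    def __init__(self, transport):
        # transport: callable taking request bytes and returning reply bytes.
        # Here it is a direct function call; a real system would cross the
        # VM boundary (assumption for illustration only).
        self._transport = transport

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"call": name, "args": list(args)}).encode()
            reply = self._transport(request)
            return json.loads(reply)["result"]
        return remote_call


# The "application" code is written as if it talked to the library directly.
server = HostServer(FakeGpuLibrary())
gpu = GuestStub(server.handle)
h = gpu.malloc(3)
gpu.memcpy_to_device(h, [1.0, 2.0, 3.0])
gpu.scale(h, 2.0)
result = gpu.memcpy_to_host(h)
print(result)
```

Because the application sees only the stub, it needs no changes to run against a remote library, which is the sense of "transparent" in this sketch. Each call costs one serialization round trip, so the efficiency of the transport matters for overall overhead.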