Cluster Computing and the Grid, IEEE International Symposium on (2011)
Newport Beach, California USA
May 23, 2011 to May 26, 2011
In this paper we characterize the memory locality management behavior of scientific computing applications running in virtualized environments. In current solutions (KVM and Xen), NUMA locality is enforced by pinning virtual machines to CPUs and providing NUMA-aware allocation in hypervisors. Our analysis shows that, due to two-level memory management and a lack of integration with page reclamation mechanisms, applications running on warm VMs suffer from a "leakage" of page locality. Our results using MPI, UPC and OpenMP implementations of the NAS Parallel Benchmarks, running on Intel and AMD NUMA systems, indicate that applications observe an overall average performance degradation of 55% when compared to native. Runs on "cold" VMs suffer an average performance degradation of 27%, while subsequent runs are roughly 30% slower than the cold runs. We quantify the impact of locality improvement techniques designed for full virtualization environments: hypervisor-level page remapping and partitioning the NUMA domains between multiple virtual machines. Our analysis shows that hypervisor-only schemes have little or no potential for performance improvement. When the programming model allows it, system partitioning with proper VM and runtime support is able to reproduce native performance: in a partitioned system with one virtual machine per socket, the average workload performance is 5% better than native.
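As a rough illustration of what pinning to the CPUs of one NUMA domain involves, the sketch below restricts a Linux process to the CPUs of a single node. It is a process-level analogue only, not the mechanism KVM or Xen use for virtual CPUs and not taken from the paper. The helper names (`parse_cpulist`, `node_cpus`, `pin_to_node`) are hypothetical; the sysfs path and `os.sched_setaffinity` are standard on Linux but unavailable on other platforms.

```python
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_cpus(node):
    """Read the CPUs that belong to one NUMA node from sysfs (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node, pid=0):
    """Restrict a process (pid 0 means the caller) to the CPUs of one node."""
    cpus = node_cpus(node)
    if not cpus:
        raise ValueError(f"NUMA node {node} has no CPUs")
    os.sched_setaffinity(pid, cpus)
    return cpus
```

On a NUMA machine, calling `pin_to_node(0)` before allocating memory would, under Linux's default local allocation policy, tend to place newly touched pages on node 0. This only covers CPU placement; it does not address the two-level memory management effects described above.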
virtualization, NUMA support, scientific applications, KVM
Khaled Z. Ibrahim, Steven Hofmeyr, Costin Iancu, "Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines", Cluster Computing and the Grid, IEEE International Symposium on, vol. 00, pp. 1-12, 2011, doi:10.1109/CCGrid.2011.50