2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2017)
May 14, 2017 to May 17, 2017
Extremely heterogeneous software stacks have encouraged the use of system virtualization technology to execute composite high performance computing (HPC) applications and so enable full utilization of extreme-scale (ExaScale) HPC systems. Some parts of composite applications, called loosely-coupled components, consist of a set of loosely-coupled CPU-intensive jobs. The jobs of a loosely-coupled component run on a set of virtual machines (VMs), which in turn are distributed across physical machines. Co-location of VMs on physical machines is the main source of interference, which causes uncertainty in job completion times. Motivated by this challenge, our main goal is to introduce an adaptive job scheduling method for the VMs of loosely-coupled components that bounds the negative impact of interference. In addition, because of the abstraction introduced by virtualization, job schedulers are unaware of the status of the underlying physical machines. Our second goal is therefore to introduce a scheme that dynamically reconfigures the job scheduler's parameters to inform the scheduler of the true status of the physical machines. This paper presents a combination of ASSIGN-ROUTE online job scheduling and a reconfiguration technique that allows a given loosely-coupled component to balance its resource usage load and thus improve the scaled execution of its loosely-coupled jobs. We prove that reconfiguration compensates for the scheduler's unawareness of virtualization, so that the load balance achieved by the combined technique is comparable to that of optimal load balancing for online deterministic makespan-minimization scheduling on unrelated parallel machines. We also show that our experimental results support the theoretical results, especially in the case of scaled execution.
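To make the scheduling setting concrete, the following is a minimal sketch of online load balancing on unrelated machines using an exponential-cost greedy rule, a common approach in the online load balancing literature. It is not taken from the paper: whether it matches the ASSIGN-ROUTE variant used there is an assumption, and the names `assign_jobs`, `costs`, `lam`, and `a` are hypothetical. Here `lam` stands for an estimate of the optimal makespan, the kind of scheduler parameter a reconfiguration step might adjust.

```python
from typing import List, Sequence, Tuple


def assign_jobs(
    costs: Sequence[Sequence[float]],
    lam: float,
    a: float = 2.0,
) -> Tuple[List[int], List[float]]:
    """Assign jobs one at a time (online) to unrelated machines.

    costs[j][i] is the processing time of job j on machine i (hypothetical input).
    lam is an assumed estimate of the optimal makespan, used to normalize loads.
    a is the base of the exponential cost function (must be > 1).
    Returns the chosen machine for each job and the final machine loads.
    """
    if lam <= 0 or a <= 1:
        raise ValueError("lam must be positive and a must be greater than 1")

    n_machines = len(costs[0])
    loads = [0.0] * n_machines
    assignment: List[int] = []

    for p in costs:
        # Pick the machine whose exponential cost grows the least
        # if this job is placed on it.
        def increase(i: int) -> float:
            return a ** ((loads[i] + p[i]) / lam) - a ** (loads[i] / lam)

        best = min(range(n_machines), key=increase)
        loads[best] += p[best]
        assignment.append(best)

    return assignment, loads


if __name__ == "__main__":
    # Three jobs, two machines; the third job is cheaper on machine 1.
    demo_costs = [[1.0, 2.0], [1.0, 2.0], [2.0, 1.0]]
    chosen, final_loads = assign_jobs(demo_costs, lam=2.0)
    print(chosen, final_loads)
```

The exponential cost penalizes adding work to an already loaded machine much more than a plain greedy rule would, which is what keeps the maximum load bounded relative to the optimum in this family of algorithms.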
parallel machines, resource allocation, scheduling, virtual machines, virtualisation, workstation clusters
S. M. Khorandi, S. Ghiasvand and M. Sharifi, "Reducing Load Imbalance of Virtual Clusters via Reconfiguration and Adaptive Job Scheduling," 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Madrid, Spain, 2017, pp. 992-999.