2012 IEEE International Conference on Cluster Computing (2012)
Beijing, China
Sept. 24, 2012 to Sept. 28, 2012
The rapid growth of supercomputing systems, both in scale and complexity, has been accompanied by degradation in system efficiencies. The sheer abundance of resources, including millions of cores, vast amounts of physical memory, and high-bandwidth networks, is heavily under-utilized. This happens when the resources are time-shared amongst parallel applications that are scheduled to run on a subset of compute nodes in an exclusive manner. Several space-sharing techniques proposed in the literature allow parallel applications to be co-located on compute nodes and share resources with each other. Although this leads to better system efficiencies, it also causes contention for system resources. In this work, we specifically address the problem of network contention caused by parallel applications and file systems sharing network resources simultaneously. We leverage the Quality-of-Service (QoS) capabilities of the widely used InfiniBand interconnect to enhance our data-staging file system, making it QoS-aware. This is a user-level framework that is agnostic of the file system and MPI implementation. Using this file system, we demonstrate the isolation of file system traffic from MPI communication traffic, thereby reducing network contention. Experimental results show that MPI point-to-point latency can be reduced by up to 320 microseconds, and the bandwidth improved by up to 674 MB/s, in the presence of contention with I/O traffic. Furthermore, we were able to reduce the runtime of the AWP-ODC MPI application by about 9.89% in the presence of network contention, and also reduce the time spent in communication by the NAS CG kernel by 23.46%.
Quality of service, Servers, Bandwidth, Fabrics, Noise, Libraries, Kernel, Network Contention and Filesystems, Quality-of-Service, InfiniBand, Data-Staging, Space-Sharing
R. Rajachandrasekar, J. Jaswani, H. Subramoni and D. K. Panda, "Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework," 2012 IEEE International Conference on Cluster Computing (CLUSTER), Beijing, China, 2012, pp. 329-336.