38th International Conference on Parallel Processing (2009)
Vienna, Austria
Sept. 22, 2009 to Sept. 25, 2009
ISSN: 0190-3918
ISBN: 978-0-7695-3802-0
pp: 333-339
High-end computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes work toward architecting these enormous systems, it is becoming increasingly clear that they will share a significant amount of hardware between processing units, including caches, memory management engines, and network infrastructure. While these systems are optimized to use all of the available hardware in a dedicated manner to achieve the best performance, in practice the shared nature of this hardware makes scheduling applications on it difficult and wasteful. For example, while the IBM Blue Gene/P system has been designed to use a torus network for efficient communication, some of the torus links (especially those connecting different racks) are shared between multiple racks. Thus, a job running on one rack might preclude another job from running on a second rack, even though the second rack's compute resources are completely idle. In this paper, we assess the relative performance degradation that real applications experience when such shared network hardware is left entirely unused in some cases. Our measurements on Intrepid, one of the largest Blue Gene/P installations in the world, demonstrate less than 5% degradation for several leadership applications commonly run on the system. Further, we demonstrate that the additional scheduling flexibility offered by not sharing such hardware can improve the overall job turnaround time by nearly 40% in some cases.
Job Scheduling, Networking
Narayan Desai, Darius Buntinas, Pavan Balaji, Anthony Chan, Daniel Buettner, "Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P", 38th International Conference on Parallel Processing, pp. 333-339, 2009, doi:10.1109/ICPP.2009.33