12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03)
Quelling Queue Storms
Seattle, Washington
June 22-24, 2003
ISBN: 0-7695-1965-2
This paper characterizes "queue storms" in supercomputer systems and discusses methods for quelling them. Queue storms are anomalously large queue lengths that depend on the job size mix, the queuing system, the machine size, and correlations and dependencies between job submissions. We use synthetic data generated from actual job log data from the ASCI Blue Mountain supercomputer, combined with different long-range dependencies. We show the distribution of times until the first storm occurs; in a sense, this is the time at which the machine becomes obsolete, because it is when the machine first fails to provide satisfactory turnaround. To overcome queue storms, more resources are needed, even if they appear superfluous most of the time. We present two methods, including a grid-based solution, for reducing these correlations and their resulting effect on the size and frequency of queue storms.
Citation:
Stephen D. Kleban, Scott H. Clearwater, "Quelling Queue Storms," 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03), p. 162, 2003