12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03)
Quelling Queue Storms
Seattle, Washington
June 22-24, 2003
ISBN: 0-7695-1965-2
Stephen D. Kleban, Sandia National Laboratories
Scott H. Clearwater
This paper characterizes "queue storms" in supercomputer systems and discusses methods for quelling them. Queue storms are anomalously large queue lengths that depend on the job size mix, the queuing system, the machine size, and correlations and dependencies between job submissions. We use synthetic data generated from actual job log data from the ASCI Blue Mountain supercomputer, combined with different long-range dependencies. We show the distribution of times until the first storm occurs, which is in a sense the time when the machine becomes obsolete, because it is when the machine first fails to provide satisfactory turnaround. To overcome queue storms, more resources are needed, even if they appear superfluous most of the time. We present two methods, including a grid-based solution, for reducing these correlations and their resulting effect on the size and frequency of queue storms.
Citation:
Stephen D. Kleban, Scott H. Clearwater, "Quelling Queue Storms," 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03), p. 162, 2003