Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04)
With great reliability comes great responsibility: tradeoffs of run-time policy on high reliability systems
Chicago, IL, USA
April 19-22, 2004
ISBN: 0-7803-8430-X
S.D. Kleban, Sandia Nat. Labs., Albuquerque, NM, USA
J.R. Johnston, Sandia Nat. Labs., Albuquerque, NM, USA
J.A. Ang, Sandia Nat. Labs., Albuquerque, NM, USA
S.H. Clearwater, Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
In this paper we describe a simulation study to improve performance on a large, highly utilized cluster at Sandia National Laboratories. The unique characteristic of the cluster is that there are very few constraints on job size. In particular, the run-time is limited only by system times, which occur about every two weeks. The major contribution of this paper is that we quantify the difference in makespan between running a single long job and running its equivalent as many shorter jobs. We find that running longer jobs benefits the facility as a whole when cycle-weighted makespans are considered, whereas running shorter jobs improves the makespan when the jobs are taken unweighted, and does so for most users.
Citation:
S.D. Kleban, J.R. Johnston, J.A. Ang, S.H. Clearwater, "With great reliability comes great responsibility: tradeoffs of run-time policy on high reliability systems," ccgrid, pp.547-554, Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04), 2004