Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04)
Chicago, IL, USA, April 19-22, 2004
ISBN: 0-7803-8430-X

With great reliability comes great responsibility: tradeoffs of run-time policy on high reliability systems
In this paper we describe a simulation study aimed at improving performance on a large, highly utilized cluster at Sandia National Laboratories. What distinguishes this cluster is that it places very few constraints on job size; in particular, run-time is limited only by the system times that occur about every two weeks. The major contribution of this paper is to quantify the difference in makespan between running a single long job and running its equivalent as many shorter jobs. We find that running longer jobs benefits the facility as a whole when makespans are weighted by cycles, whereas running shorter jobs improves the makespan when jobs are counted unweighted, and does so for most users.
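The two conclusions above hinge on how makespans are averaged. The paper's exact definitions are not reproduced here, so the following is only a minimal sketch under stated assumptions. It assumes a job's makespan is its submission-to-completion time, and that "cycle-weighted" means each job is weighted by the node-hours it consumes. The `Job` record, its field names, and the numbers are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative, not from the paper.
    nodes: int       # nodes allocated to the job
    runtime: float   # hours spent executing
    makespan: float  # hours from submission to completion (assumed definition)

    @property
    def cycles(self) -> float:
        # Assumed weight: node-hours consumed by the job.
        return self.nodes * self.runtime


def mean_makespan(jobs: list[Job]) -> float:
    """Unweighted mean: every job counts equally."""
    return sum(j.makespan for j in jobs) / len(jobs)


def cycle_weighted_makespan(jobs: list[Job]) -> float:
    """Mean makespan with each job weighted by the cycles it consumes."""
    total_cycles = sum(j.cycles for j in jobs)
    return sum(j.cycles * j.makespan for j in jobs) / total_cycles


# Two small jobs and one large job (made-up numbers for illustration).
jobs = [Job(1, 2.0, 4.0), Job(1, 2.0, 4.0), Job(4, 24.0, 28.0)]
print(mean_makespan(jobs))            # unweighted
print(cycle_weighted_makespan(jobs))  # dominated by the large job
```

Here the large job accounts for 96 of the 100 node-hours, so the cycle-weighted figure sits close to that job's makespan. The unweighted mean is pulled toward the small jobs instead. This illustrates how the two measures can favor different run-time policies.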
Citation:
S.D. Kleban, J.R. Johnston, J.A. Ang, S.H. Clearwater, "With great reliability comes great responsibility: tradeoffs of run-time policy on high reliability systems," Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04), pp. 547-554, 2004.