Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)
A Failure-Aware Scheduling Strategy in Large-Scale Cluster System
Singapore, May 16-May 19
ISBN: 0-7695-2585-7
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CCGRID.2006.4
As cluster systems grow in scale, node failure becomes commonplace. Job scheduling, an important part of cluster operating system software, is responsible for efficient resource management and for making reasonable scheduling decisions. In a cluster, job scheduling divides into two sub-tasks: job selection and node allocation. In this paper, we introduce a failure-aware scheduling strategy, the LUNF (Longest Uptime Node First) node allocation policy, which uses a characterization of node failures. Simulation results show that the LUNF policy yields better system performance than a random node allocation policy.
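The abstract names the two node allocation policies but does not spell out their mechanics. The following is a minimal sketch of what the policy names suggest, not the authors' implementation: it assumes each node records the time since it last came up, and that a job simply requests a number of idle nodes. The `Node` class, its fields, and both function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    uptime_hours: float  # hypothetical field: time since the node last came up
    busy: bool = False


def allocate_lunf(nodes, count):
    """Pick `count` idle nodes, preferring the longest current uptime."""
    idle = [n for n in nodes if not n.busy]
    if len(idle) < count:
        return None  # not enough idle nodes; the job has to wait
    # Sort idle nodes by uptime, longest first, and take the first `count`.
    chosen = sorted(idle, key=lambda n: n.uptime_hours, reverse=True)[:count]
    for n in chosen:
        n.busy = True
    return chosen


def allocate_random(nodes, count, rng=random):
    """Baseline: pick `count` idle nodes uniformly at random."""
    idle = [n for n in nodes if not n.busy]
    if len(idle) < count:
        return None
    chosen = rng.sample(idle, count)
    for n in chosen:
        n.busy = True
    return chosen
```

In this sketch the two policies differ only in how they order the idle nodes, so either one can be combined with any job selection policy.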
Citation:
Wu Linping, Meng Dan, Jianfeng Zhan, Wang Lei, Tu Bibo, "A Failure-Aware Scheduling Strategy in Large-Scale Cluster System," ccgrid, pp.645-648, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), 2006