2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW) (2014)
Phoenix, AZ, USA
May 19, 2014 to May 23, 2014
Fairshare is commonly one of the factors used by cluster resource management systems to prioritize jobs during scheduling. Despite the grid vision of a transparent and unified infrastructure, fairshare is normally calculated and enforced at the local cluster level rather than at a grid-wide scale. Aequus is a self-contained decentralized system for grid-wide fairshare job prioritization. Using Aequus, detailed global share policies can be combined with local cluster policies to offer a unified grid fairshare prioritization system in which local administrations retain control over their clusters. This work shows how Aequus can be integrated with local resource management systems such as SLURM and Maui with minimal intrusion. Early results from production help assess the maturity of the system, and the system is further tested and evaluated for use at a nationwide scale using workload modeling techniques. Statistical models are built from historical national grid usage data, and synthetic traces generated from these models provide a diverse input set that exemplifies system behavior. The system is shown to behave consistently despite large variations in job arrival patterns and partial participation of some of the collaborating installations.
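To make the idea of combining a global share policy with a local cluster policy concrete, the following is a minimal sketch and not the Aequus algorithm itself. It assumes a simple fairshare factor based on the difference between a user's target share and recent usage share, and a weighted sum controlled by the local administrator. The names `UserShare`, `fairshare_factor`, `job_priority`, and `global_weight` are hypothetical and do not come from the paper, SLURM, or Maui.

```python
from dataclasses import dataclass


@dataclass
class UserShare:
    """Hypothetical record: both fields are fractions in [0, 1]."""
    target: float  # share of the grid that the global policy allots to the user
    used: float    # share of recent grid-wide usage the user has consumed


def fairshare_factor(share: UserShare) -> float:
    """Map (target - used) from [-1, 1] onto [0, 1].

    0.5 means usage matches the target; higher values mean the user is
    under-served and should be prioritized. This formula is an assumption
    made for illustration only.
    """
    value = 0.5 + (share.target - share.used) / 2.0
    return min(1.0, max(0.0, value))


def job_priority(global_factor: float, local_factor: float,
                 global_weight: float = 0.5) -> float:
    """Blend a grid-wide factor with a local one.

    The local administrator picks global_weight, so the cluster retains
    control over how strongly the grid-wide policy influences scheduling.
    """
    if not 0.0 <= global_weight <= 1.0:
        raise ValueError("global_weight must be in [0, 1]")
    return global_weight * global_factor + (1.0 - global_weight) * local_factor


if __name__ == "__main__":
    under_served = UserShare(target=0.6, used=0.2)
    g = fairshare_factor(under_served)
    print(job_priority(g, local_factor=0.3, global_weight=0.5))
```

Setting `global_weight` to 0 in this sketch reduces the result to the purely local priority, which mirrors the abstract's point that sites can participate partially without giving up control.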
Vectors, Resource management, Data models, Load modeling, Scheduling, Processor scheduling
D. Espling, P. Ostberg and E. Elmroth, "Integration and Evaluation of Decentralized Fairshare Prioritization (Aequus)," 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), Phoenix, AZ, USA, 2014, pp. 1198-1207.