The Community for Technology Leaders
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 177-187
Kenzo Van Craeynest , Ghent Univ., Ghent, Belgium
Shoaib Akram , Ghent Univ., Ghent, Belgium
Wim Heirman , Ghent Univ., Ghent, Belgium
Aamer Jaleel , VSSAD, Intel Corp., Hillsboro, OR, USA
Lieven Eeckhout , Ghent Univ., Ghent, Belgium
Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy- and power-efficiency by scheduling workloads on the most appropriate core type. A significant body of recent work has focused on improving system throughput through scheduling. However, none of the prior work has looked into fairness. Yet, guaranteeing that all threads make equal progress on heterogeneous multi-cores is of utmost importance for both multi-threaded and multi-program workloads to improve performance and quality-of-service. Furthermore, modern operating systems affinitize workloads to cores (pinned scheduling) which dramatically affects fairness on heterogeneous multi-cores. In this paper, we propose fairness-aware scheduling for single-ISA heterogeneous multi-cores, and explore two flavors for doing so. Equal-time scheduling runs each thread or workload on each core type for an equal fraction of the time, whereas equal-progress scheduling strives at getting equal amounts of work done on each core type. Our experimental results demonstrate an average 14% (and up to 25%) performance improvement over pinned scheduling through fairness-aware scheduling for homogeneous multi-threaded workloads; equal-progress scheduling improves performance by 32% on average for heterogeneous multi-threaded workloads. Further, we report dramatic improvements in fairness over prior scheduling proposals for multi-program workloads, while achieving system throughput comparable to throughput-optimized scheduling, and an average 21% improvement in throughput over pinned scheduling.
Multicore processing, Instruction sets, Processor scheduling, Throughput, Hardware, Scheduling, Computational modeling,regression tree, GPU, auto-tuning, decision tree, design space exploration
Kenzo Van Craeynest, Shoaib Akram, Wim Heirman, Aamer Jaleel, Lieven Eeckhout, "Starchart: hardware and software optimization using recursive partitioning regression trees", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, vol. 00, no. , pp. 177-187, 2013, doi:10.1109/PACT.2013.6618815
312 ms
(Ver 3.3 (11022016))