2014 23rd International Conference on Parallel Architecture and Compilation (PACT)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 277-288
Matthias Diener, Informatics Institute, UFRGS, Porto Alegre, Brazil
Eduardo H. M. Cruz, Informatics Institute, UFRGS, Porto Alegre, Brazil
Philippe O. A. Navaux, Informatics Institute, UFRGS, Porto Alegre, Brazil
Anselm Busse, Communication and Operating Systems Group, TU Berlin, Berlin, Germany
Hans-Ulrich Heiß, Communication and Operating Systems Group, TU Berlin, Berlin, Germany
ABSTRACT
One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality of memory accesses by reducing accesses to remote caches and memories, and using local ones instead. Two techniques can be used to increase memory access locality: executing threads and processes that access shared data close to each other in the memory hierarchy (thread affinity), and placing the memory pages they access on the NUMA node they are executing on (data affinity). Most related work in this area focuses on either thread or data affinity, but not both, which limits the improvements. Other mechanisms require expensive operations such as memory access traces or binary analysis, require changes to the hardware, or work only with specific parallel APIs. In this paper, we introduce kMAF, a mechanism that automatically manages thread and data affinity at the kernel level. The memory access behavior of the running application is determined during its execution by analyzing its page faults. kMAF uses this information to migrate threads and memory pages such that overall memory access locality is optimized. An extensive evaluation with 27 benchmarks from 4 benchmark suites shows substantial performance improvements, with results close to an oracle mechanism. Execution time was reduced by up to 35.7% (13.8% on average), while energy efficiency was improved by up to 34.6% (9.3% on average).
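The two affinity mechanisms named in the abstract can be illustrated with standard Linux interfaces. The listing below is not the kMAF kernel implementation; it is a minimal user-space sketch, assuming a Linux system with libnuma installed, in which a thread pins itself to a CPU (thread affinity) and then asks the kernel to migrate a small buffer's pages to that CPU's NUMA node (data affinity) via move_pages(). The values target_cpu and target_node are hypothetical placeholders.

/* affinity_sketch.c -- hypothetical user-space illustration, not kMAF itself.
 * Build on Linux with libnuma: gcc affinity_sketch.c -o affinity_sketch -lnuma */
#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity(), cpu_set_t */
#include <numaif.h>     /* move_pages(), MPOL_MF_MOVE */
#include <unistd.h>     /* sysconf() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { NPAGES = 4 };

int main(void)
{
    const int target_cpu  = 0;  /* hypothetical: CPU chosen for this thread */
    const int target_node = 0;  /* hypothetical: NUMA node of that CPU      */

    /* Thread affinity: restrict this thread to target_cpu so it runs close
     * to the caches and memory it will use. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(target_cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Allocate a few pages and touch them so they are backed by physical
     * memory (possibly on a remote node under first-touch placement). */
    long page_size = sysconf(_SC_PAGESIZE);
    char *buf = aligned_alloc(page_size, NPAGES * page_size);
    if (buf == NULL)
        return 1;
    memset(buf, 0, NPAGES * page_size);

    /* Data affinity: request migration of each page to target_node. */
    void *pages[NPAGES];
    int nodes[NPAGES], status[NPAGES];
    for (int i = 0; i < NPAGES; i++) {
        pages[i] = buf + (long)i * page_size;
        nodes[i] = target_node;
    }
    if (move_pages(0 /* calling process */, NPAGES, pages, nodes,
                   status, MPOL_MF_MOVE) < 0) {
        perror("move_pages");
        return 1;
    }
    for (int i = 0; i < NPAGES; i++)
        printf("page %d: node %d\n", i, status[i]);  /* negative value = errno */

    free(buf);
    return 0;
}

Per the abstract, kMAF performs comparable thread and page migrations inside the kernel, inferring the application's memory access behavior from its page faults rather than relying on explicit calls like these.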
INDEX TERMS
Instruction sets, Memory management, Benchmark testing, Measurement, Operating systems, Mathematical model, Informatics, Cache hierarchies, Thread affinity, Data affinity, NUMA
CITATION
Matthias Diener, Eduardo H. M. Cruz, Philippe O. A. Navaux, Anselm Busse, Hans-Ulrich Heiß, "kMAF: Automatic kernel-level management of thread and data affinity", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 277-288, 2014, doi:10.1145/2628071.2628085