2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Haifa, Israel
Sept. 11, 2016 to Sept. 15, 2016
ISBN: 978-1-5090-5308-7
pp: 275-286
Paul Caheny, Barcelona Supercomputing Center, Spain
Marc Casas, Barcelona Supercomputing Center, Spain
Miquel Moreto, Barcelona Supercomputing Center, Spain
Herve Gloaguen, Bull Atos Technologies, Les Clayes-sous-Bois, France
Maxime Saintes, Bull Atos Technologies, Les Clayes-sous-Bois, France
Eduard Ayguade, Barcelona Supercomputing Center, Spain
Jesus Labarta, Barcelona Supercomputing Center, Spain
Mateo Valero, Barcelona Supercomputing Center, Spain
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, and these protocols trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol, combined with runtime-managed NUMA-aware scheduling and data allocation techniques that make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23× to 2.54× and coherence traffic reductions between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure that the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.
Coherence, Switched-mode power supply, Memory management, Runtime, Resource management, Protocols

P. Caheny et al., "Reducing cache coherence traffic with hierarchical directory cache and NUMA-aware runtime scheduling," 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), Haifa, Israel, 2016, pp. 275-286.