2012 IEEE International Symposium on Workload Characterization (IISWC)
La Jolla, CA, USA
Nov. 4, 2012 to Nov. 6, 2012
ISBN: 978-1-4673-4531-6
pp: 164-173
ChunYi Su, Department of Computer Science, Virginia Tech, 24060, USA
Dong Li, Oak Ridge National Laboratory, TN 37831, USA
Dimitrios S. Nikolopoulos, Queen's University of Belfast, Northern Ireland, UK
Kirk W. Cameron, Department of Computer Science, Virginia Tech, 24060, USA
Bronis R. de Supinski, Lawrence Livermore National Laboratory, CA 94550, USA
Edgar A. Leon, Lawrence Livermore National Laboratory, CA 94550, USA
ABSTRACT
Non-Uniform Memory Access (NUMA) architectures are ubiquitous in HPC systems. NUMA, along with other factors including socket layout, data placement, and memory contention, significantly increases the search space for finding an optimal mapping of applications to NUMA systems. This search space may be intractable for online optimization and challenging for efficient offline search. This paper presents DyNUMA, a framework for dynamic optimization of programs on NUMA architectures. DyNUMA uses simple, memory-centric performance and energy models with non-linear terms to capture the complex and interacting effects of system layout, program concurrency, data placement, and memory controller contention. DyNUMA leverages an artificial neural network (ANN) with input, output, and intermediate layers that emulate program threads, memory controllers, processor cores, and their interactions. Using an ANN in conjunction with critical path analysis, DyNUMA autonomously optimizes programs for performance or energy-efficiency metrics. We used DyNUMA on a variety of benchmarks from the NPB and ASC Sequoia suites on three different architectures (a 16-core AMD Barcelona system, a 32-core AMD Magny-Cours system, and a 64-core Tilera TilePro64 system). Our results show that, compared to the default Linux scheduling, DyNUMA achieves an average improvement of 8.7% in performance (12.9% in the best case), 16% in Energy-Delay (30.6% in the best case), and 9.1% in MFLOPS/Watt (10.7% in the best case).
CITATION
ChunYi Su, Dong Li, Dimitrios S. Nikolopoulos, Kirk W. Cameron, Bronis R. de Supinski, Edgar A. Leon, "Model-based, memory-centric performance and power optimization on NUMA multiprocessors", 2012 IEEE International Symposium on Workload Characterization (IISWC), pp. 164-173, 2012, doi:10.1109/IISWC.2012.6402921