2008 11th IEEE International Conference on Computational Science and Engineering
Application Performance Tuning for Clusters with ccNUMA Nodes
July 16-July 18
ISBN: 978-0-7695-3193-9
With the growing trend of placing more cores on a single chip, more clusters are adopting multicore multiprocessor nodes for high-performance computing (HPC). Cache-coherent non-uniform memory access (ccNUMA) architectures are becoming an increasingly popular choice for such systems. This paper presents an application performance analysis on a 2,312-core Opteron system based on Sun Fire servers. Performance bottlenecks are identified and potential solutions are proposed. With the proposed tuning, application performance improved by up to 30%. In addition, the experimental analysis can help HPC application developers better understand clusters with ccNUMA nodes and can serve as a guideline for using such architectures in scientific computing.
Index Terms:
ccNUMA, application performance, cpu affinity, high-performance computing
Citation:
Abdullah Kayi, Edward Kornkven, Tarek El-Ghazawi, Greg Newby, "Application Performance Tuning for Clusters with ccNUMA Nodes," cse, pp. 245-252, 2008 11th IEEE International Conference on Computational Science and Engineering, 2008.