2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS) (2015)
Melbourne, Australia
Dec. 14, 2015 to Dec. 17, 2015
ISSN: 1521-9097
ISBN: 978-1-4673-8670-8
pp: 692-699
Kevin A. Brown , Dept. of Math. & Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
Jens Domke , Fac. of Comput. Sci., Tech. Univ. Dresden, Dresden, Germany
Satoshi Matsuoka , GSIC, Tokyo Inst. of Technol., Tokyo, Japan
ABSTRACT
As the scale of high-performance computing systems increases, optimizing inter-process communication becomes more challenging while being critical for ensuring good performance. However, the hardware layer abstraction provided by MPI makes it difficult to study application communication performance over the network hardware, especially for collective operations. We present a new approach to network performance analysis based on exposing low-level communication metrics in a flexible manner and conducting hardware-centric analysis of these metrics. We show how low-level network metrics can be revealed using Open MPI's Peruse utility, without interfacing with the hardware layer. A lightweight profiler, ibprof, was developed to aggregate these metrics from message passing events at a cost of <1% runtime overhead for communication in NPB kernel and application benchmarks. We also developed a flexible visualization module for the Boxfish analysis tool to analyze our communication profile over the physical topology of the network. Using case studies, we demonstrate how our approach can identify communication anomalies in network applications and guide performance optimization strategies.
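Hardware-centric analysis, as described above, amounts to projecting per-process message events onto the physical links they traverse. Below is a minimal Python sketch of that projection. It assumes a static routing table that maps each (source node, destination node) pair to an ordered list of switch ports. The names `aggregate_port_traffic`, `rank_to_node`, and `routes`, the event format, and the toy topology are all hypothetical. This is not ibprof's implementation, and it does not model how ibprof collects its metrics through Open MPI's Peruse events.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical types: a message event is (source rank, destination rank, bytes),
# and a route is the ordered list of (switch, port) hops a message traverses.
Event = Tuple[int, int, int]
Port = Tuple[str, int]


def aggregate_port_traffic(
    events: Iterable[Event],
    rank_to_node: Dict[int, str],
    routes: Dict[Tuple[str, str], List[Port]],
) -> Counter:
    """Sum message bytes onto every switch port along each message's route.

    Messages between ranks on the same node never reach the network,
    so they are skipped.
    """
    port_bytes: Counter = Counter()
    for src, dst, nbytes in events:
        src_node, dst_node = rank_to_node[src], rank_to_node[dst]
        if src_node == dst_node:
            continue  # intra-node traffic: no network ports are used
        for port in routes[(src_node, dst_node)]:
            port_bytes[port] += nbytes
    return port_bytes


if __name__ == "__main__":
    # Toy example: four ranks on two nodes connected through one switch "sw0".
    rank_to_node = {0: "n0", 1: "n0", 2: "n1", 3: "n1"}
    routes = {
        ("n0", "n1"): [("sw0", 2)],  # assumed: sw0 port 2 leads to n1
        ("n1", "n0"): [("sw0", 1)],  # assumed: sw0 port 1 leads to n0
    }
    events = [(0, 2, 4096), (1, 3, 1024), (2, 0, 512), (0, 1, 8192)]
    traffic = aggregate_port_traffic(events, rank_to_node, routes)
    for port, nbytes in sorted(traffic.items()):
        print(port, nbytes)
```

In the toy run, the two node-to-node messages from `n0` both load port 2 of `sw0` (5120 bytes), while the intra-node message from rank 0 to rank 1 contributes nothing. This is the kind of per-link imbalance that a per-process view alone would hide.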
INDEX TERMS
Libraries, Data visualization, Hardware, Measurement, Network topology, Ports (Computers), Topology
CITATION

K. A. Brown, J. Domke and S. Matsuoka, "Hardware-Centric Analysis of Network Performance for MPI Applications," 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), Melbourne, Australia, 2016, pp. 692-699.
doi:10.1109/ICPADS.2015.92