Nov. 11, 2006 to Nov. 17, 2006
I-Hsin Chung, IBM Thomas J. Watson Research Center
Robert E. Walkup, IBM Thomas J. Watson Research Center
Hui-Fang Wen, IBM Thomas J. Watson Research Center
Hao Yu, IBM Thomas J. Watson Research Center
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SC.2006.43
Developers of applications on today's massively parallel supercomputers often rely on performance analysis tools to reach scalable performance on thousands of processors. However, conventional tools for parallel performance analysis run into serious problems because of the large volume of data they must handle. In this paper, we discuss the scalability issue for MPI performance analysis on Blue Gene/L (BG/L), the world's fastest supercomputing platform. First, we present an experimental study of existing MPI performance tools that were ported to BG/L from other platforms. These tools fall into two categories: profiling tools, which collect timing summaries, and tracing tools, which collect a sequence of time-stamped events. Profiling tools produce small data volumes and can scale well, but tracing tools tend to scale poorly. We then describe a configurable MPI tracing tool developed for BG/L. Because trace generation is configurable, the volume of trace data can be controlled, and scalability is significantly improved.
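The difference in data volume between the two tool categories can be illustrated with a minimal Python sketch. It is not the tool described in the paper: the `Profiler` and `Tracer` classes, the `simulate` helper, and the filter options (`ranks`, `names`, `max_events`) are all hypothetical, chosen only to show how a summary stays small while an event log grows with every call, and how a configurable filter could bound that growth.

```python
from collections import defaultdict


class Profiler:
    """Profiling: keeps one [count, total_seconds] summary per call name."""

    def __init__(self):
        self.summary = defaultdict(lambda: [0, 0.0])

    def record(self, rank, name, start, end):
        entry = self.summary[name]
        entry[0] += 1
        entry[1] += end - start


class Tracer:
    """Tracing: keeps one time-stamped record per event that passes the filter.

    The filter options are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, ranks=None, names=None, max_events=None):
        self.ranks = ranks            # set of ranks to trace, or None for all
        self.names = names            # set of call names to trace, or None for all
        self.max_events = max_events  # cap on stored events, or None for no cap
        self.events = []

    def record(self, rank, name, start, end):
        if self.ranks is not None and rank not in self.ranks:
            return
        if self.names is not None and name not in self.names:
            return
        if self.max_events is not None and len(self.events) >= self.max_events:
            return
        self.events.append((rank, name, start, end))


def simulate(sink, num_ranks, iterations):
    """Feed a synthetic stream of MPI-like events into a profiler or tracer."""
    t = 0.0
    for _ in range(iterations):
        for rank in range(num_ranks):
            sink.record(rank, "MPI_Send", t, t + 1.0)
            sink.record(rank, "MPI_Recv", t + 1.0, t + 2.0)
        t += 2.0
```

With 64 simulated ranks and 10 iterations, the `Profiler` holds two summary entries, an unfiltered `Tracer` holds 1280 event records, and a `Tracer` restricted to ranks 0 and 1 and to `MPI_Send` holds 20. The summary size depends only on the number of distinct calls, whereas the unfiltered trace grows with both rank count and run length.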
I-Hsin Chung, Robert E. Walkup, Hui-Fang Wen, Hao Yu, "MPI Performance Analysis Tools on Blue Gene/L", SC Conference 2006, pp. 16, doi:10.1109/SC.2006.43