Parallel and Distributed Processing Symposium, International (2009)
Rome, Italy
May 23, 2009 to May 29, 2009
ISBN: 978-1-4244-3751-1
pp: 1-8
Hikmet Dursun, Performance and Architecture Laboratory (PAL), Computer Science for HPC (CCS-1), Los Alamos National Laboratory, NM 87545, USA
Kevin J. Barker, Performance and Architecture Laboratory (PAL), Computer Science for HPC (CCS-1), Los Alamos National Laboratory, NM 87545, USA
Darren J. Kerbyson, Performance and Architecture Laboratory (PAL), Computer Science for HPC (CCS-1), Los Alamos National Laboratory, NM 87545, USA
Scott Pakin, Performance and Architecture Laboratory (PAL), Computer Science for HPC (CCS-1), Los Alamos National Laboratory, NM 87545, USA
ABSTRACT
In this paper, we present a methodology for profiling parallel applications executing on the IBM PowerXCell 8i (commonly referred to as the “Cell” processor). Specifically, we examine Cell-centric MPI programs on hybrid clusters containing multiple Opteron and Cell processors per node such as those used in the petascale Roadrunner system. Our implementation incurs less than 3.2 µs of overhead per profile call while efficiently utilizing the limited local store of the Cell's SPE cores. We demonstrate the use of our profiler on a cluster of hybrid nodes running a suite of scientific applications. Our analyses of inter-SPE communication (across the entire cluster) and function call patterns provide valuable information that can be used to optimize application performance.
CITATION

S. Pakin, D. J. Kerbyson, K. J. Barker and H. Dursun, "Application profiling on Cell-based clusters," 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Rome, 2009, pp. 1-8.
doi:10.1109/IPDPS.2009.5161092