Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications
November 2011 (vol. 22 no. 11)
pp. 1862-1870
Jidong Zhai, Tsinghua University, Beijing
Tianwei Sheng, Tsinghua University, Beijing
Jiangzhou He, Tsinghua University, Beijing
Wenguang Chen, Tsinghua University, Beijing
Weimin Zheng, Tsinghua University, Beijing
Communication patterns of parallel applications are important for optimizing application performance and designing better communication subsystems. Communication patterns can be extracted from communication traces. However, existing approaches to generating communication traces must execute the entire parallel application on a full-scale system, which is time-consuming and expensive. We propose a novel technique, called Fact, which performs FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to a program slice through static analysis, and to execute the program slice to acquire the communication traces. This is based on the observation that most computation and message contents in parallel applications are not relevant to their spatial and volume communication attributes, and can therefore be removed for the purpose of communication trace collection. We have implemented Fact and evaluated it with the NPB programs and Sweep3D. The results show that Fact can reduce resource consumption by two orders of magnitude in most cases.
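
The slicing idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a straight-line program with no branches, loops, or procedure calls, and every name in it (Stmt, comm_args, slice_for_comm, PROGRAM) is hypothetical. Each statement lists the variables it defines and uses. A communication call lists only the arguments that determine where a message goes and how large it is. A backward pass keeps the statements those arguments depend on and drops the rest, including the computation that fills the message buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One statement of a toy straight-line program (hypothetical model)."""
    label: str
    defs: frozenset = frozenset()       # variables written by the statement
    uses: frozenset = frozenset()       # variables read by the statement
    comm_args: frozenset = frozenset()  # non-empty only for communication calls:
                                        # arguments that fix destination and volume


def slice_for_comm(program):
    """Return labels of statements needed to reproduce communication attributes.

    Assumption: straight-line code only, so one backward pass with a set of
    relevant variables is enough. Real programs need control and
    interprocedural dependence analysis.
    """
    relevant = set()
    kept = []
    for stmt in reversed(program):
        if stmt.comm_args:
            # Always keep the communication call; its buffer contents are
            # deliberately not added to the relevant set.
            kept.append(stmt.label)
            relevant |= stmt.comm_args
        elif stmt.defs & relevant:
            # The statement defines a relevant variable: keep it and make
            # its inputs relevant in turn.
            kept.append(stmt.label)
            relevant -= stmt.defs
            relevant |= stmt.uses
    kept.reverse()
    return kept


PROGRAM = [
    Stmt("init_rank_size", defs=frozenset({"rank", "size"})),
    Stmt("n = 1024", defs=frozenset({"n"})),
    Stmt("buf = compute(n)", defs=frozenset({"buf"}), uses=frozenset({"n"})),
    Stmt("dest = (rank + 1) % size", defs=frozenset({"dest"}),
         uses=frozenset({"rank", "size"})),
    Stmt("count = n // 2", defs=frozenset({"count"}), uses=frozenset({"n"})),
    Stmt("buf = smooth(buf)", defs=frozenset({"buf"}), uses=frozenset({"buf"})),
    Stmt("send(buf, count, dest)", uses=frozenset({"buf", "count", "dest"}),
         comm_args=frozenset({"count", "dest"})),
]

if __name__ == "__main__":
    for label in slice_for_comm(PROGRAM):
        print(label)
```

In this toy program the two statements that compute the buffer contents fall out of the slice, while the statements that determine the destination rank and the message size remain, so the reduced program would still issue the same send to the same peer with the same count.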

Index Terms:
Communication pattern, communication trace, message-passing program, parallel application.
Jidong Zhai, Tianwei Sheng, Jiangzhou He, Wenguang Chen, Weimin Zheng, "Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 11, pp. 1862-1870, Nov. 2011, doi:10.1109/TPDS.2011.49