A Systematic Approach toward Automated Performance Analysis and Tuning
March 2012 (vol. 23 no. 3)
pp. 426-435
Guojing Cong, IBM T.J. Watson Research Center, Yorktown Heights
I-Hsin Chung, IBM T.J. Watson Research Center, Yorktown Heights
Hui-Fang Wen, IBM T.J. Watson Research Center, Yorktown Heights
David Klepacki, IBM T.J. Watson Research Center, Yorktown Heights
Hiroki Murata, IBM Research - Tokyo, Japan
Yasushi Negishi, IBM Research - Tokyo, Japan
Takao Moriyama, IBM Research, Yorktown Heights
High productivity is critical in harnessing the power of high-performance computing systems to solve science and engineering problems. It is a challenge to bridge the gap between hardware complexity and software limitations. Despite significant progress in programming languages, compilers, and performance tools, tuning an application remains largely a manual task and is done mostly by experts. In this paper, we propose a systematic approach toward automated performance analysis and tuning that we expect to improve the productivity of performance debugging significantly. Our approach seeks to build a framework that facilitates the combination of expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. With our framework, once a diagnosis and tuning strategy has been developed, it can be stored in an open and extensible database and thus be reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications. We show that the tuning process is highly automated and that the performance improvement is significant.

Index Terms:
Performance tuning, performance tool.
Citation:
Guojing Cong, I-Hsin Chung, Hui-Fang Wen, David Klepacki, Hiroki Murata, Yasushi Negishi, Takao Moriyama, "A Systematic Approach toward Automated Performance Analysis and Tuning," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 3, pp. 426-435, March 2012, doi:10.1109/TPDS.2011.189