An Analytic Framework for Detailed Resource Profiling in Large and Parallel Programs and Its Application for Memory Use
March 2010 (vol. 59 no. 3)
pp. 358-370
Ulrich Finkler, IBM T.J. Watson Research Center, Yorktown Heights
Profiling is an essential and widely used technique for understanding the resource use of applications. For example, the memory use of large applications is becoming an important cost factor. Very large systems are typically sized to accommodate designated tasks, and thus the price, as well as cache and TLB efficiency, depends significantly on the memory footprint of the target applications. The increasing use of multicore systems magnifies the problem, since memory use grows with the number of parallel tasks. Additionally, the presence of multiple tasks or threads makes it harder to correlate resource use with the program structure. Thus, tools that correlate resource use with program structure and report quantitative error margins are essential for optimizing the resource use of complex software applications. While efficient tools for profiling execution time are available, the choices for detailed profiling of memory use or other hardware resources are very limited. We were unable to find tools that provided sufficiently accurate insight into, e.g., memory use without adding unacceptable overhead in memory use and execution time for the performance analysis of very large applications. In this paper, we present a highly efficient probabilistic method for profiling that provides detailed resource usage information $R_{\Psi}(t)$ indexed by the full location descriptor $\Psi$ (e.g., process id, thread id, and call chain) and time $t$. Importantly, we provide an analytical framework that yields error estimates and makes it possible to analyze and quantitatively optimize a wide variety of profiling scenarios. We employed the probabilistic approach to implement a memory profiling tool that adds minimal overhead and does not require recompilation or relinking. The tool provides the memory use $M_\psi(t)$ for all location descriptors $\psi$ over the execution time for single- and multithreaded programs.
Experimental results confirm that execution time and memory overhead are less than 10 percent of the unprofiled, optimized execution. Importantly, the technique is sufficiently general to be applicable to profiling other hardware resources, such as cache or TLB misses, over time for all location descriptors with similarly low overhead and across multiple processes, threads, and processors.
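
The abstract does not spell out the sampling scheme, so the following is only a generic illustration of probabilistic profiling with an error estimate, not the method of the paper. It assumes each allocation event is recorded independently with probability p and uses the standard Horvitz-Thompson estimator to scale the sampled sizes back up. The function name, the event format, and the use of a call-chain tuple as the location descriptor are all hypothetical, and the time dimension is omitted for brevity.

```python
import random
from collections import defaultdict


def estimate_memory_by_location(events, p, seed=0):
    """Estimate bytes allocated per location from a Bernoulli sample.

    events: iterable of (location, size_bytes) pairs (hypothetical format).
    p: probability of recording each event, 0 < p <= 1 (assumed scheme).
    Returns (estimate, variance) dictionaries keyed by location.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    estimate = defaultdict(float)
    variance = defaultdict(float)
    for location, size in events:
        # Record the event with probability p; skipped events cost nothing.
        if rng.random() < p:
            # Horvitz-Thompson: weight each sampled size by 1/p.
            estimate[location] += size / p
            # Unbiased estimate of the estimator's variance.
            variance[location] += (1.0 - p) * (size / p) ** 2
    return dict(estimate), dict(variance)
```

For example, calling `estimate_memory_by_location(events, 0.1)` would record roughly one event in ten, and the square root of a location's variance entry gives a rough error margin on its estimated byte count. Sampling more densely tightens that margin at the cost of more bookkeeping, which is the kind of trade-off an analytical error framework is meant to quantify.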

Index Terms:
Resource usage, memory usage, profiling, call chain, probabilistic, numerical.
Ulrich Finkler, "An Analytic Framework for Detailed Resource Profiling in Large and Parallel Programs and Its Application for Memory Use," IEEE Transactions on Computers, vol. 59, no. 3, pp. 358-370, March 2010, doi:10.1109/TC.2009.149