Issue No.02 - Feb. (2013 vol.62)
pp: 376-389
Dehao Chen , Google Inc, Mountain View
Neil Vachharajani , Pure Storage Inc, Mountain View
Robert Hundt , Google Inc, Mountain View
Xinliang Li , Google Inc, Mountain View
Stephane Eranian , Google Inc, Mountain View
Wenguang Chen , Tsinghua University, Beijing
Weimin Zheng , Tsinghua University, Beijing
ABSTRACT
Feedback-directed optimization (FDO) is effective in improving application runtime performance, but has not been widely adopted due to the tedious dual-compilation model, the difficulty of generating representative training data sets, and the high runtime overhead of profile collection. Hardware-event sampling overcomes these drawbacks by providing a lightweight way to collect execution profiles in the production environment, which naturally consumes representative input. Yet hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to missed performance when compared to instrumentation-based FDO. In this paper, we use Performance Monitoring Unit (PMU)-based sampling to collect instruction frequency profiles. By collecting profiles using multiple events and applying heuristics to predict their accuracy, we improve the accuracy of the profile. We also show how emerging techniques can be used to further improve the accuracy of the sample-based profile. Additionally, these emerging techniques are used to collect value profiles, as well as to assist a lightweight interprocedural optimizer. All these profiles are represented in a portable form, so they can be used across different platforms. We demonstrate that sampling-based FDO can achieve an average of 92 percent of the performance gains obtained using instrumentation-based exact profiles for both SPEC CINT2000 and CINT2006 benchmarks. The overhead of collection is only 0.93 percent on average, while compiler-based instrumentation incurs 2.0 to 351.5 percent overhead (and 10x overhead on an industrial web search application).
INDEX TERMS
Instruments, Radiation detectors, Optimization, Hardware, Phasor measurement units, Monitoring, Program processors, last branch record, Sample profile, feedback directed optimization, performance counter
CITATION
Dehao Chen, Neil Vachharajani, Robert Hundt, Xinliang Li, Stephane Eranian, Wenguang Chen, Weimin Zheng, "Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations", IEEE Transactions on Computers, vol.62, no. 2, pp. 376-389, Feb. 2013, doi:10.1109/TC.2011.233