Issue No. 09, September 2009 (vol. 58)
pp. 1153-1170
José A. Joao, University of Texas at Austin, Austin
Onur Mutlu, Microsoft Research, Redmond
Chang Joo Lee, University of Texas at Austin, Austin
Yale N. Patt, University of Texas at Austin, Austin
Hyesoon Kim, Georgia Institute of Technology, Atlanta
Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and in virtual-machine-based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that of conditional branches. Furthermore, previously proposed indirect branch predictors usually require a significant amount of extra hardware storage and add design complexity, which makes them less attractive to implement. This paper proposes a new technique for handling indirect branches, called Virtual Program Counter (VPC) prediction. The key idea of VPC prediction is to use the existing conditional branch prediction hardware to predict indirect branch targets, avoiding the need for a separate storage structure. Our comprehensive evaluation shows that VPC prediction improves average performance by 26.7 percent and reduces average energy consumption by 19 percent compared to a commonly used branch-target-buffer-based predictor on 12 indirect-branch-intensive C/C++ applications. Moreover, VPC prediction improves the average performance of the full set of object-oriented Java DaCapo applications by 21.9 percent, while reducing their average energy consumption by 22 percent. We show that VPC prediction can be used with any existing conditional branch prediction mechanism and that the accuracy of VPC prediction improves when a more accurate conditional branch predictor is used.
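To make the key idea concrete, here is a minimal Python sketch of one way an indirect branch can be predicted with conditional-branch hardware alone. It is an illustration under stated assumptions, not the authors' implementation. It assumes a gshare-style conditional predictor (2-bit counters indexed by address XOR global history) and a dictionary standing in for the branch target buffer (BTB). The indirect branch is treated as a sequence of up to MAX_ITER hypothetical "virtual" conditional branches: virtual branch i uses a virtual address formed by XORing the branch PC with an arbitrary constant HASH_VALS[i], and a global history shifted left by i bits. The first virtual branch that hits in the BTB and is predicted taken supplies the target. The names MAX_ITER, HASH_VALS, HIST_BITS, the table sizes, and the simplified replacement policy are all assumptions made for this example.

```python
# Minimal, hypothetical sketch of VPC-style indirect branch prediction.
# Every size, constant, and policy below is an illustrative assumption.
MAX_ITER = 4                              # assumed max virtual branches per indirect branch
HASH_VALS = [0x000, 0x5A5, 0xC3C, 0x96F]  # assumed per-iteration hash constants
HIST_BITS = 12                            # assumed history length and PHT index width
MASK = (1 << HIST_BITS) - 1


class VPCPredictor:
    def __init__(self):
        self.pht = [1] * (1 << HIST_BITS)  # 2-bit counters, start weakly not-taken
        self.btb = {}                      # stand-in BTB: virtual PC -> target
        self.ghr = 0                       # global history register

    def push_history(self, taken):
        """Shift an ordinary conditional-branch outcome into the global history."""
        self.ghr = ((self.ghr << 1) | int(taken)) & MASK

    def _virtual(self, pc, it):
        """Return (virtual PC, gshare PHT index) for virtual branch number `it`."""
        vpc = pc ^ HASH_VALS[it]
        vghr = (self.ghr << it) & MASK     # history shifted once per earlier virtual branch
        return vpc, (vpc ^ vghr) & MASK

    def _bump(self, idx, taken):
        """Saturating 2-bit counter update."""
        if taken:
            self.pht[idx] = min(3, self.pht[idx] + 1)
        else:
            self.pht[idx] = max(0, self.pht[idx] - 1)

    def predict(self, pc):
        """Return a predicted target, or None if no virtual branch supplies one."""
        for it in range(MAX_ITER):
            vpc, idx = self._virtual(pc, it)
            target = self.btb.get(vpc)
            if target is None:
                return None                # BTB miss: this sketch stops iterating
            if self.pht[idx] >= 2:
                return target              # first virtual branch predicted taken wins
        return None

    def train(self, pc, actual):
        """Train with the resolved target, using the same history as at predict time."""
        for it in range(MAX_ITER):
            vpc, idx = self._virtual(pc, it)
            stored = self.btb.get(vpc)
            if stored == actual:
                self._bump(idx, True)      # virtual branch holding the right target: taken
                return
            if stored is None:
                self.btb[vpc] = actual     # free slot: allocate the target here
                self._bump(idx, True)
                return
            self._bump(idx, False)         # wrong target: this virtual branch is not taken
        # Target not found in any slot. Simplified policy: overwrite the last
        # virtual branch (vpc/idx still hold its values) and mark it weakly taken.
        self.btb[vpc] = actual
        self.pht[idx] = 2
```

Because every stored target lives in the ordinary BTB and every taken/not-taken decision comes from the ordinary pattern history table, the sketch needs no dedicated indirect-target table, which is the storage saving the abstract describes. For example, training the branch at 0x400 with target 0x1000 under history 0 and with target 0x2000 under history 1 leaves the sketch predicting each target under its own history.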
Indirect branch prediction, virtual functions, devirtualization, object-oriented languages, Java.
José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, Hyesoon Kim, "Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware", IEEE Transactions on Computers, vol. 58, no. 9, pp. 1153-1170, September 2009, doi:10.1109/TC.2008.227