Software Trace Cache
January 2005 (vol. 54 no. 1)
pp. 22-35
This paper explores compiler optimizations that improve the layout of instructions in memory. The goal is to increase fetch performance by letting the code make better use of the underlying hardware resources, regardless of the specific details of the processor or architecture. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations: we aim not only to improve the instruction cache hit rate, but also to increase the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, and then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and of code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have special characteristics that make them more amenable to high-performance instruction fetch. They have a very high rate of not-taken branches and execute long chains of sequential instructions. They also make very effective use of instruction cache lines, mapping only useful instructions that will execute close together in time, which increases both spatial and temporal locality.
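
The chain-building idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' STC implementation: it is a generic greedy chaining pass written under the assumption that a profile gives an execution count for each control-flow edge. The names `build_chains` and `fallthrough_fraction`, the seed ordering, and the toy profile are all hypothetical, and the second STC step (mapping chains to memory to minimize conflict misses) is not modeled.

```python
from collections import defaultdict


def build_chains(edges, seeds):
    """Greedily group basic blocks into chains (illustrative only).

    edges: dict mapping (src, dst) block pairs to profiled execution counts.
    seeds: blocks in the order in which chains should be started
           (assumed here: most frequently executed first).
    """
    succs = defaultdict(list)
    for (src, dst), count in edges.items():
        succs[src].append((count, dst))

    placed = set()
    chains = []
    for seed in seeds:
        if seed in placed:
            continue
        chain = [seed]
        placed.add(seed)
        current = seed
        while True:
            # Only successors that have not been laid out yet are eligible.
            candidates = [(c, d) for c, d in succs[current] if d not in placed]
            if not candidates:
                break
            # Follow the most frequently executed outgoing edge.
            _, current = max(candidates)
            chain.append(current)
            placed.add(current)
        chains.append(chain)
    return chains


def fallthrough_fraction(layout, edges):
    """Fraction of profiled transitions whose target is the next block in memory."""
    next_block = dict(zip(layout, layout[1:]))
    total = sum(edges.values())
    sequential = sum(c for (s, d), c in edges.items() if next_block.get(s) == d)
    return sequential / total


# Toy profile (made up for illustration): a loop A -> {B or C} -> D -> A
# in which the A -> B path dominates.
edges = {
    ("A", "B"): 90, ("A", "C"): 10,
    ("B", "D"): 90, ("C", "D"): 10,
    ("D", "A"): 99,
}
chains = build_chains(edges, seeds=["A", "B", "D", "C"])
layout = [block for chain in chains for block in chain]
```

In this toy profile the hot path A, B, D ends up contiguous and the rarely executed block C is pushed out of the way, so more of the profiled transitions become fall-throughs than in a source-order layout such as A, C, B, D. That is the kind of effect the abstract attributes to layout-optimized code: more not-taken branches and longer sequential runs.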

Index Terms:
Pipeline processors, instruction fetch, compiler optimizations, branch prediction, trace cache.
Citation:
Alex Ramirez, Josep L. Larriba-Pey, Mateo Valero, "Software Trace Cache," IEEE Transactions on Computers, vol. 54, no. 1, pp. 22-35, Jan. 2005, doi:10.1109/TC.2005.13