1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425) (1999)
Newport Beach, California
Oct. 12, 1999 to Oct. 16, 1999
Derek L. Howard , IBM Server Group
Mikko H. Lipasti , University of Wisconsin
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves instruction fetch bandwidth. Similarly, program transformations like loop unrolling, procedure in-lining, feedback-directed program restructuring, and profile-directed feedback can improve instruction fetch bandwidth by changing the static structure and ordering of a program's basic blocks. We examine the interaction of these compile-time and run-time techniques in the context of a high-quality production compiler that implements such transformations and a cycle-accurate simulation model of a wide issue super-scalar processor. Not surprisingly, we find that the relative benefit of adding trace cache declines with increasing optimization level, and vice versa. Furthermore, we find that certain optimizations that improve performance on a processor model without trace cache can actually degrade performance on a processor with trace cache due to increased branch history table interference. Finally, we show that the performance obtained with a trace cache of a given size can be obtained with a trace cache of about half the size by applying aggressive compiler optimization techniques.
Microarchitecture, superscalar processors, trace cache, compiler optimization
M. H. Lipasti and D. L. Howard, "The Effect of Program Optimization on Trace Cache Efficiency," 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425)(PACT), Newport Beach, California, 1999, pp. 256.