Issue No. 01 - January (2005 vol. 54)
Mateo Valero, IEEE
DOI: http://doi.ieeecomputersociety.org/10.1109/TC.2005.13
This paper explores compiler optimizations that optimize the layout of instructions in memory. The goal is to let the code make better use of the underlying hardware resources, regardless of the specific details of the processor or architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations: we target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to place sequentially executed basic blocks in consecutive memory positions, and then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and of code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have some special characteristics that make them more amenable to high-performance instruction fetch: they have a very high rate of not-taken branches and execute long chains of sequential instructions; they also make very effective use of instruction cache lines, mapping only useful instructions that execute close in time, which increases both spatial and temporal locality.
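The chain-building idea described above can be illustrated with a small greedy sketch. This is not the published STC algorithm: the profile format (edge and block execution counts), the `branch_threshold` knob, the hottest-first concatenation used in place of the paper's conflict-aware mapping, and all function names (`build_chains`, `layout`, `fall_through_rate`) are assumptions made for illustration. The sketch starts each chain at the hottest unplaced block, keeps appending that block's most frequent unplaced successor, and then measures how many dynamic transitions become sequential fall-throughs in the resulting layout.

```python
from collections import defaultdict


def build_chains(edge_counts, block_counts, branch_threshold=0.4):
    """Greedily group basic blocks into chains so that each block's most
    frequent unplaced successor is laid out right after it (fall-through).

    edge_counts:  {(src, dst): times the CFG edge src -> dst executed}
    block_counts: {block: times the block executed}
    branch_threshold: minimum share of src's outgoing transitions an edge
        must carry to be followed (hypothetical tuning knob).
    """
    successors = defaultdict(list)
    for (src, dst), count in edge_counts.items():
        successors[src].append((count, dst))

    placed = set()
    chains = []
    # Seed new chains from the hottest blocks that are not yet placed.
    for seed in sorted(block_counts, key=lambda b: (-block_counts[b], b)):
        if seed in placed:
            continue
        chain, current = [seed], seed
        placed.add(seed)
        while True:
            outgoing = successors[current]
            total = sum(count for count, _ in outgoing)
            candidates = [(c, d) for c, d in outgoing if d not in placed]
            if total == 0 or not candidates:
                break  # no successor left to pull into this chain
            count, best = max(candidates)  # most frequent unplaced successor
            if count / total < branch_threshold:
                break  # successor too unlikely to justify a fall-through
            chain.append(best)
            placed.add(best)
            current = best
        chains.append(chain)
    return chains


def layout(chains, block_counts):
    """Concatenate chains hottest-first so frequently executed code is packed
    into a compact region (a crude stand-in for conflict-aware mapping)."""
    ordered = sorted(chains, key=lambda ch: -sum(block_counts[b] for b in ch))
    return [block for chain in ordered for block in chain]


def fall_through_rate(order, edge_counts):
    """Fraction of dynamic transitions that go to the next block in memory."""
    position = {block: i for i, block in enumerate(order)}
    total = sum(edge_counts.values())
    sequential = sum(
        count
        for (src, dst), count in edge_counts.items()
        if position[dst] == position[src] + 1
    )
    return sequential / total


# Hypothetical profile of a loop: A branches to B (hot) or C (cold), both
# rejoin at D, and D loops back to A or exits to E.
edge_counts = {
    ("A", "B"): 90, ("A", "C"): 10,
    ("B", "D"): 90, ("C", "D"): 10,
    ("D", "A"): 99, ("D", "E"): 1,
}
block_counts = {"A": 100, "B": 90, "C": 10, "D": 100, "E": 1}

chains = build_chains(edge_counts, block_counts)
order = layout(chains, block_counts)
print(chains)  # [['A', 'B', 'D'], ['C'], ['E']]
print(order)   # ['A', 'B', 'D', 'C', 'E']
print(fall_through_rate(order, edge_counts))  # 0.6
# A source order that keeps the cold block C between A and B scores lower.
print(fall_through_rate(["A", "C", "B", "D", "E"], edge_counts))
```

In this toy profile the hot path A, B, D becomes one contiguous chain and the rarely executed block C is moved out of it, so the frequent transitions become sequential fall-throughs. In a real compiler this would typically require inverting the condition of A's branch so that the jump to C is the taken direction, which is consistent with the high rate of not-taken branches the abstract reports for layout-optimized code.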
Pipeline processors, instruction fetch, compiler optimizations, branch prediction, trace cache.
Alex Ramirez, Mateo Valero, "Software Trace Cache", IEEE Transactions on Computers, vol. 54, no. 1, pp. 22-35, January 2005, doi:10.1109/TC.2005.13