Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques (2001)
Sept. 8, 2001 to Sept. 12, 2001
Roni Rosner , Intel Corporation
Avi Mendelson , Intel Corporation
Ronny Ronen , Intel Corporation
Abstract: The trace cache is becoming an important building block of modern, wide-issue processors. So far, trace cache related research has been focused on increasing fetch bandwidth. Trace-caches have been shown too effectively increase the number of "useful" instructions that can be fetched into the machine, thus enabling more instructions to be executed each cycle. However, trace cache has another important benefit that got less attention in recent research: especially for variable length ISA, such as Intel's IA-32 architecture (X86), reducing instruction decoding power is particularly attractive. Keeping the instruction traces in decoded format, implies the decoding power is only paid upon the build of a trace, thus reducing the overall power consumption of the system. This paper has three main contributions: it indicates that trace cache optimizations directed to reducing power consumption arc do not necessarily coincide with optimizations directed to increasing fetch bandwidth; it extends our understanding on how well the trace cache utilizes its resources and introduces a new trace-cache organization based on filtering techniques. The knowledge obtained from the analysis of the traces' behavioral patterns motivates the use of filtering techniques. The new trace-cache organization increases the effective instruction-fetch bandwidth in conjunction with reducing the power consumption of the trace-cache system. We observe that (1) the majority of traces that are inserted into the trace-cache are rarely used again before being replaced; (2) the majority of the instructions delivered for execution originate from the fewer traces that are heavily and repeatedly used; and that (3) techniques that aim to improve instruction-fetch bandwidth may increase the number of traces built during program execution. Based on these observations, we propose splitting the trace cache into two components: the filter trace-cache (FTC) and the main trace-cache (MTC). Traces are first inserted into the FTC that is used to filter out the infrequently used traces; traces that prove "useful" are later moved into the MTC itself. The FTC/MTC organization exhibits an important benefit: it decreases the number of traces built, thus reducing power consumption while improving overall performance. For medium-size applications, the FTC/MTC pair reduces the number of trace builds by 16% in average. As extension of the filtering concept that involves adding a second level (L2) trace-cache that stores less frequent traces that are replaced in the FTC or the MTC. The extra level of caching allows for order-of-magnitude reduction in the number of trace builds. Second level trace cache proves particularly useful for applications with large instruction footprints.
R. Ronen, A. Mendelson and R. Rosner, "Filtering Techniques to Improve Trace-Cache Efficiency," Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques(PACT), Barcelona, Spain, 2001, pp. 0037.