10th International Symposium on High Performance Computer Architecture (HPCA'04)
Hardware Support for Prescient Instruction Prefetch
Madrid, Spain
February 14-18, 2004
ISBN: 0-7695-2053-7
This paper proposes and evaluates hardware mechanisms for supporting prescient instruction prefetch, an approach to improving single-threaded application performance by using helper threads to perform instruction prefetch. We demonstrate the need for enabling store-to-load communication and selective instruction execution when directly pre-executing future regions of an application that suffer I-cache misses. Two novel hardware mechanisms, safe-store and YAT-bits, are introduced that help satisfy these requirements. This paper also proposes and evaluates finite state machine recall, a technique for limiting pre-execution to branches that are hard to predict by leveraging a counted I-prefetch mechanism. On a research Itanium® SMT processor with next-line and streaming I-prefetch mechanisms that incurs latencies representative of next-generation processors, prescient instruction prefetch can improve performance by an average of 10.0% to 22% on a set of SPEC 2000 benchmarks that suffer significant I-cache misses. Prescient instruction prefetch is found to be competitive against even the most aggressive research hardware instruction prefetch technique: fetch-directed instruction prefetch.
Citation:
Tor M. Aamodt, Paul Chow, Per Hammarlund, Hong Wang, John P. Shen, "Hardware Support for Prescient Instruction Prefetch," 10th International Symposium on High Performance Computer Architecture (HPCA'04), p. 84, 2004.