Seventh International Symposium on High-Performance Computer Architecture (HPCA) (2001)
Nuevo León, Mexico
Jan. 20, 2001 to Jan. 24, 2001
Amir Roth, University of Wisconsin-Madison
Gurindar S. Sohi, University of Wisconsin-Madison
Abstract: Mispredicted branches and loads that miss in the cache cause the majority of retirement stalls experienced by sequential processors; we call these critical instructions. Despite their importance, a sequential processor has difficulty prioritizing critical computations (computations of critical instructions), because it must fetch all computations sequentially, regardless of their contribution to performance. Speculative data-driven multithreading (DDMT) is a general-purpose mechanism for overcoming this limitation. In DDMT, critical computations are annotated so that they can execute standalone. When the processor predicts an upcoming instance of a critical instruction, it microarchitecturally forks a copy of its computation as a new kind of speculative thread: a data-driven thread (DDT). The DDT executes in parallel with the main program thread, but typically generates the critical result much faster since it fetches and executes only the critical computation and not the whole program. A DDT "pre-executes" critical computation and effectively "consumes" its latency on behalf of the main thread. A DDMT component called integration incorporates results computed in DDTs directly into the main thread, sparing it from having to repeat the work. We simulate an implementation of DDMT on top of a simultaneous multithreading (SMT) processor and use program profiles to create DDTs and annotate them into the executable. Our experiments show that DDMT pre-execution of critical loads and branches can improve performance significantly.
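The idea that a DDT "fetches and executes only the critical computation" can be illustrated with a small sketch. The Python below is a toy, static dataflow model, not the paper's profile-based DDT construction or its simulator: the `Instr` record, the `critical_computation` helper, and the six-instruction program are all hypothetical names and data introduced here for illustration. It walks backward from a critical load and collects only the instructions that load transitively depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str     # register written by the instruction ("" if none)
    srcs: tuple   # registers read by the instruction
    text: str     # mnemonic, kept only for display


def critical_computation(program, critical_idx):
    """Return, in program order, the indices of the critical instruction
    and of every earlier instruction it transitively depends on."""
    needed = set(program[critical_idx].srcs)   # registers still unresolved
    picked = [critical_idx]
    for i in range(critical_idx - 1, -1, -1):  # scan backward
        ins = program[i]
        if ins.dest and ins.dest in needed:
            needed.discard(ins.dest)           # this producer resolves it
            needed.update(ins.srcs)            # but needs its own inputs
            picked.append(i)
    return sorted(picked)


# Hypothetical pointer-chasing fragment; index 5 is the critical load.
program = [
    Instr("r1", ("r1",), "addi r1, r1, 8"),
    Instr("r5", ("r6", "r7"), "add  r5, r6, r7"),
    Instr("r8", ("r5",), "mul  r8, r5, r5"),
    Instr("r2", ("r1",), "load r2, 0(r1)"),
    Instr("r9", ("r8", "r5"), "sub  r9, r8, r5"),
    Instr("r3", ("r2",), "load r3, 4(r2)"),
]

ddt = critical_computation(program, 5)
print("main thread fetches", len(program), "instructions")
print("DDT fetches", len(ddt), "instructions:")
for i in ddt:
    print("   ", program[i].text)
```

In this toy fragment the main thread must fetch all six instructions to reach the critical load, while the thread built from the dependence chain fetches only three. The sketch omits everything that makes DDMT speculative in practice, including prediction of the critical instance, forking, and integration of results into the main thread.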
Amir Roth, Gurindar S. Sohi, "Speculative Data-Driven Multithreading", Seventh International Symposium on High-Performance Computer Architecture (HPCA), p. 37, 2001, doi:10.1109/HPCA.2001.903250