Parallel and Distributed Processing Symposium, International (2004)
Santa Fe, New Mexico
Apr. 26, 2004 to Apr. 30, 2004
ISBN: 0-7695-2132-0
pp: 74a
Francisco J. Cazorla , Universitat Politècnica de Catalunya
Alex Ramirez , Universitat Politècnica de Catalunya
Mateo Valero , Universitat Politècnica de Catalunya
Enrique Fernández , Universidad de Las Palmas de Gran Canaria
<p>Simultaneous Multithreading (SMT) processors increase performance by executing instructions from multiple threads simultaneously. These threads share the processor's resources, but also compete for them. In this environment, a thread that misses in the L2 cache may hold a large number of resources for a long time, causing other threads to run much slower than they could.</p> <p>To prevent this problem, we need to know in advance whether a thread is going to miss in the L2 cache. L1 misses are a clear indicator of a possible L2 miss. However, stalling a thread on every L1 miss is too severe: not all L1 misses lead to an L2 miss, so this would cause unnecessary stalls and resource under-use. Conversely, waiting until an L2 miss is declared and then squashing the thread to free its allocated resources is too expensive in terms of complexity and re-executed instructions.</p> <p>In this paper we propose a novel fetch policy, which we call DWarn. DWarn uses L1 misses as indicators of L2 misses, giving higher priority to threads with no outstanding L1 misses. DWarn acts on L1 misses, before L2 misses happen, and does so in a controlled manner that reduces resource under-use and avoids harming a thread when its L1 misses do not lead to L2 misses. Our results show that DWarn outperforms previously proposed policies in both throughput and fairness, while requiring fewer resources and avoiding instruction re-execution.</p>
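The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only states that threads with no outstanding L1 misses get higher fetch priority. The names `ThreadState` and `fetch_priority_order` are hypothetical, and so is the tie-break on in-flight instruction count within each group.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int
    outstanding_l1_misses: int   # L1 data-cache misses still in flight
    in_flight_instructions: int  # hypothetical tie-break metric, not from the abstract


def fetch_priority_order(threads):
    """Return thread ids ordered from highest to lowest fetch priority.

    Threads with no outstanding L1 misses come first. Threads with
    outstanding misses are deprioritized rather than stalled, so an L1
    miss that never becomes an L2 miss does not halt the thread.
    """
    ranked = sorted(
        threads,
        key=lambda t: (t.outstanding_l1_misses > 0, t.in_flight_instructions),
    )
    return [t.tid for t in ranked]


threads = [
    ThreadState(tid=0, outstanding_l1_misses=2, in_flight_instructions=5),
    ThreadState(tid=1, outstanding_l1_misses=0, in_flight_instructions=30),
    ThreadState(tid=2, outstanding_l1_misses=0, in_flight_instructions=10),
    ThreadState(tid=3, outstanding_l1_misses=1, in_flight_instructions=1),
]
order = fetch_priority_order(threads)
```

In this sketch a thread with pending L1 misses still appears in the ordering, just behind the miss-free threads. That reflects the abstract's point that the policy avoids fully stalling a thread on every L1 miss.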

M. Valero, A. Ramirez, F. J. Cazorla and E. Fernández, "DCache Warn: An I-Fetch Policy to Increase SMT Efficiency," Parallel and Distributed Processing Symposium, International (IPDPS), Santa Fe, New Mexico, 2004, pp. 74a.