2009 18th International Conference on Parallel Architectures and Compilation Techniques (2009)
Raleigh, North Carolina, USA
Sept. 12, 2009 to Sept. 16, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2009.13
Shared cache allocation policies play an important role in determining CMP performance. The simplest policy, LRU, allocates cache implicitly as a consequence of its replacement decisions. But under high cache interference, LRU performs poorly because some memory-intensive threads, or aggressor threads, allocate cache that could be more gainfully used by other (less memory-intensive) threads. Techniques like cache partitioning can address this problem by performing explicit allocation to prevent aggressor threads from taking over the cache. Whether implicit or explicit, the key factor controlling cache allocation is victim thread selection. The choice of victim thread relative to the cache-missing thread determines each cache miss’s impact on cache allocation: if the two are the same, allocation doesn’t change, but if the two are different, then one cache block shifts from the victim thread to the cache-missing thread. In this paper, we study an omniscient policy, called ORACLE-VT, that uses off-line information to always select the best victim thread, and hence, maintain the best per-thread cache allocation at all times. We analyze ORACLE-VT, and find it victimizes aggressor threads about 80% of the time. To see if we can approximate ORACLE-VT, we develop AGGRESSOR-VT, a policy that probabilistically victimizes aggressor threads with strong bias. Our results show AGGRESSOR-VT comes close to ORACLE-VT’s miss rate, achieving three-quarters of its gain over LRU and roughly half of its gain over an ideal cache partitioning technique. To make AGGRESSOR-VT feasible for real systems, we develop a sampling algorithm that “learns” the identity of aggressor threads via runtime performance feedback. We also modify AGGRESSOR-VT to permit adjusting the probability for victimizing aggressor threads, and use our sampling algorithm to learn the per-thread victimization probabilities that optimize system performance (e.g., weighted IPC). We call this policy AGGRESSORpr-VT. Our results show AGGRESSORpr-VT outperforms LRU, UCP, and an ideal cache way partitioning technique by 4.86%, 3.15%, and 1.09%, respectively.
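The role of victim thread selection described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`choose_victim`, `handle_miss`), the representation of a cache set as a list of owner thread IDs ordered from LRU to MRU, and the fallback to plain global LRU when the probabilistic branch is not taken are all assumptions made for illustration. It only shows how biasing eviction toward aggressor-owned blocks shifts one block of allocation per miss, and how choosing the missing thread's own block leaves allocation unchanged.

```python
import random
from collections import Counter


def choose_victim(cache_set, aggressors, p_aggressor, rng):
    """Return the index of the block to evict from one cache set.

    cache_set   -- list of owner thread IDs, index 0 is LRU, last is MRU
    aggressors  -- set of thread IDs currently treated as aggressors
    p_aggressor -- probability of victimizing an aggressor-owned block
    rng         -- random.Random instance (hypothetical parameter)
    """
    if rng.random() < p_aggressor:
        # Biased branch: evict the least recently used aggressor block.
        for i, owner in enumerate(cache_set):
            if owner in aggressors:
                return i
    # Assumed fallback: ordinary global LRU.
    return 0


def handle_miss(cache_set, missing_thread, aggressors, p_aggressor, rng):
    """Evict one block, insert the missing thread's block at MRU.

    Returns the thread ID that lost a block (the victim thread).
    """
    victim_idx = choose_victim(cache_set, aggressors, p_aggressor, rng)
    victim_thread = cache_set.pop(victim_idx)
    cache_set.append(missing_thread)
    return victim_thread


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = [1, 0, 1, 0]  # LRU -> MRU; thread 0 is the assumed aggressor
    victim = handle_miss(blocks, missing_thread=1, aggressors={0},
                         p_aggressor=1.0, rng=rng)
    print("victim thread:", victim)
    print("allocation:", dict(Counter(blocks)))
```

With `p_aggressor=1.0` the miss by thread 1 takes a block from thread 0, so one block of allocation moves between threads. With `p_aggressor=0.0` the same miss evicts thread 1's own LRU block and the per-thread allocation stays the same. A per-thread probability, as in AGGRESSORpr-VT, could be modeled by passing a different `p_aggressor` for each missing thread; how those probabilities are learned is not covered by this sketch.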
shared cache management, cache partitioning, memory interleaving, aggressor thread
W. Liu and D. Yeung, "Using Aggressor Thread Information to Improve Shared Cache Management for CMPs," 2009 18th International Conference on Parallel Architectures and Compilation Techniques (PACT), Raleigh, North Carolina, USA, 2009, pp. 372-383.