2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2008)
Toronto, ON, Canada
Oct. 25, 2008 to Oct. 29, 2008
DOI Bookmark: http://doi.ieeecomputersociety.org/
Adaptive Insertion Policies for Managing Shared Caches
Aamer Jaleel, Intel Corporation, VSSAD, Hudson, MA, USA
William Hasenplaugh, Intel Corporation, VSSAD, Hudson, MA, USA
Moinuddin Qureshi, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Julien Sebot, Intel Israel Design Center, Haifa, Israel
Simon Steely, Intel Corporation, VSSAD, Hudson, MA, USA
Joel Emer, Intel Corporation, VSSAD, Hudson, MA, USA
Chip Multiprocessors (CMPs) allow different applications to execute concurrently on a single chip. When applications with differing memory demands compete for a shared cache, the conventional LRU replacement policy can significantly degrade cache performance when the aggregate working set size exceeds the capacity of the shared cache. In such cases, shared cache performance can be significantly improved by preserving the entire working sets of applications that can co-exist in the cache and preserving some portion of the working sets of the remaining applications. This paper investigates the use of adaptive insertion policies to manage shared caches. We show that directly extending the recently proposed dynamic insertion policy (DIP) is inadequate for shared caches, since DIP is unaware of the characteristics of individual applications. We propose the Thread-Aware Dynamic Insertion Policy (TADIP), which takes into account the memory requirements of each of the concurrently executing applications. Our evaluation with multi-programmed workloads for 2-core, 4-core, 8-core, and 16-core CMPs shows that a TADIP-managed shared cache improves overall throughput by as much as 94%, 64%, 26%, and 16% respectively (on average 14%, 18%, 15%, and 17%) over the baseline LRU policy. The performance benefit of TADIP is 2.6x that of DIP and 1.3x that of the recently proposed Utility-based Cache Partitioning (UCP) scheme. We also show that a TADIP-managed shared cache provides performance benefits similar to doubling the size of an LRU-managed cache. Furthermore, TADIP requires a total storage overhead of less than two bytes per core, does not require changes to the existing cache structure, and performs similarly to LRU for LRU-friendly workloads.
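As a rough illustration of the idea behind thread-aware insertion, the sketch below models one set of a shared cache where each thread is assigned either traditional MRU insertion or bimodal insertion (BIP), which places incoming lines at the LRU position except with a small probability. In the actual TADIP scheme the per-thread choice is learned dynamically via set dueling; here the choice is passed in explicitly (`use_bip`), and all class names and parameters are illustrative, not the paper's implementation.

```python
import random

class SharedCacheSet:
    """One set of a shared cache with per-thread insertion policies
    (simplified, illustrative sketch of thread-aware insertion)."""

    BIP_EPSILON = 1 / 32  # small probability of MRU insertion under BIP

    def __init__(self, ways):
        self.ways = ways
        self.stack = []  # recency stack: index 0 = MRU, last = LRU

    def access(self, thread_id, tag, use_bip):
        """Access a line; returns True on hit. `use_bip` says whether this
        thread currently uses bimodal insertion (real TADIP learns this)."""
        key = (thread_id, tag)
        if key in self.stack:                # hit: promote to MRU
            self.stack.remove(key)
            self.stack.insert(0, key)
            return True
        if len(self.stack) >= self.ways:     # miss: evict the LRU victim
            self.stack.pop()
        # Thread-aware insertion: a BIP thread usually inserts at the LRU
        # position, so its streaming lines are evicted quickly and the
        # other threads' resident working sets stay near the MRU end.
        if use_bip and random.random() >= self.BIP_EPSILON:
            self.stack.append(key)           # insert at LRU position
        else:
            self.stack.insert(0, key)        # traditional MRU insertion
        return False
```

With one cache-friendly thread resident in a 4-way set, a second thread streaming through many lines under BIP evicts little of the first thread's working set, whereas the same stream under MRU insertion flushes it entirely.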
Electronics packaging, Program processors, Hardware, Fitting, Multicore processing, Memory management, Throughput
A. Jaleel, W. Hasenplaugh, M. Qureshi, J. Sebot, S. Steely and J. Emer, "Adaptive insertion policies for managing shared caches," 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), Toronto, ON, Canada, 2008, pp. 208-219.