Parallel and Distributed Processing Symposium, International (2010)
Atlanta, GA, USA
Apr. 19, 2010 to Apr. 23, 2010
Sai Prashanth Muralidhara , Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
Mahmut Kandemir , Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
Padma Raghavan , Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
Efficient management of shared on-chip resources such as the shared level 2 (L2) cache has become an important problem with the emergence of chip multiprocessors (CMPs). Partitioning the shared cache in chip multiprocessors (CMPs) among concurrently executing applications can provide important benefits such as throughput improvement, fairness guarantees, and quality of service (QoS) enhancements. In this paper, we pose an interesting related question, which is, if partitioning the shared cache space among concurrently executing threads of the same application can enhance the application performance. We address this problem by identifying and speeding up the slowest thread, also termed as the critical path thread, during each execution interval since the overall performance of a multithreaded application is determined by the critical path thread. To do so, we propose a dynamic, runtime system based, cache partitioning scheme that partitions the shared cache space dynamically among the individual threads of a given application. In a nutshell, we wish to take some cache space away from the faster threads and give it to the critical path thread at each execution interval. We show that speeding up the critical path thread this way, results in overall performance enhancement of the application execution in the long term. Our experimental evaluation indicates that, the proposed dynamic cache partitioning scheme yields benefits up to 15% over a shared cache with no partitions, up to 23% over a statically partitioned cache (private cache) and up to 20% over a throughput-oriented scheme.
M. Kandemir, P. Raghavan and S. P. Muralidhara, "Intra-application cache partitioning," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 2010, pp. 1-12.