Issue No.08 - Aug. (2013 vol.24)
pp: 1622-1632
Huang ZhiBin, Beihang University, Beijing
Zhu Mingfa, Beihang University, Beijing
Xiao Limin, Beihang University, Beijing
The partition enforcement policy is an essential part of cache partitioning; its main function is to protect cache lines and preserve the cache quota of each core. This paper focuses on line protection based on a line's generation time rather than on the ID of the CPU core that owns it or its position in the replacement stack. The basic idea is that a line must be protected and retained in the cache while it is live, and evicted as early as possible once it is "dead." To this end, a four-bit live-time protected (LvtP) counter is added to track each line's live time, and dead blocks are predicted from the access event sequence. This paper presents a pseudopartitioning approach, LvtPPP, and proposes a two-cascade victim selection mechanism, based on the LRU replacement policy and the LvtP counter, to alleviate dead blocks. LvtPPP also supports flexible handling of allocation deviation by introducing a parameter $(\lambda)$ that adjusts the generation time of a line. Evaluation results based on Simics show significant improvements in performance and fairness for LvtPPP over PIPP and UCP.
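The two-cascade victim selection named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the counter update rules, so the code assumes that a line's four-bit counter is set to a caller-supplied `live_time` on fill and on every hit, that every access to the set ages all resident lines by one tick, and that a counter of zero marks a line as predicted dead. The names `Line`, `CacheSet`, `access`, `select_victim`, and `live_time` are hypothetical, and `live_time` merely stands in for whatever value the paper derives from the core's quota and $(\lambda)$.

```python
from dataclasses import dataclass, field

LVTP_MAX = 15  # a four-bit counter saturates at 15


@dataclass
class Line:
    tag: int
    core: int
    lvtp: int  # remaining protected live time (assumed semantics)


@dataclass
class CacheSet:
    ways: int
    lines: list = field(default_factory=list)  # index 0 = MRU, last = LRU

    def access(self, tag, core, live_time):
        """Return True on a hit, False on a miss (the line is then filled)."""
        live_time = max(0, min(LVTP_MAX, live_time))
        # Assumed aging rule: each access to the set ages every resident line.
        for line in self.lines:
            if line.lvtp > 0:
                line.lvtp -= 1
        for i, line in enumerate(self.lines):
            if line.tag == tag:
                line.lvtp = live_time  # a hit starts a new protected period
                self.lines.insert(0, self.lines.pop(i))
                return True
        if len(self.lines) >= self.ways:
            self.lines.pop(self.select_victim())
        self.lines.insert(0, Line(tag, core, live_time))
        return False

    def select_victim(self):
        """Return the index of the line to evict."""
        # Stage 1: prefer a line predicted dead, scanning from the LRU end.
        for i in range(len(self.lines) - 1, -1, -1):
            if self.lines[i].lvtp == 0:
                return i
        # Stage 2: no dead line was found, so fall back to plain LRU.
        return len(self.lines) - 1


if __name__ == "__main__":
    s = CacheSet(ways=2)
    s.access(1, core=0, live_time=8)  # long-lived line
    s.access(2, core=1, live_time=1)  # short-lived line
    s.access(3, core=0, live_time=8)  # line 2 has expired and is evicted
    print([line.tag for line in s.lines])
```

In this run the short-lived line 2 is evicted even though line 1 sits at the LRU position, which is the behavior the first stage is meant to capture; when no counter has expired, the second stage reduces to ordinary LRU.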
Radiation detectors, Resource management, Multicore processing, Partitioning algorithms, History, Pollution, Monitoring, dead block, shared last-level-cache (LLC), Cache memories, cache partition
Huang ZhiBin, Zhu Mingfa, Xiao Limin, "LvtPPP: Live-Time Protected Pseudopartitioning of Multicore Shared Caches", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 8, pp. 1622-1632, Aug. 2013, doi:10.1109/TPDS.2012.230