Issue No.02 - Feb. (2013 vol.62)
pp: 390-403
Wei Zang, University of Florida, Gainesville
Ann Gordon-Ross, University of Florida, Gainesville
ABSTRACT
The cache hierarchy's large contribution to total microprocessor system power makes caches a good optimization candidate. To facilitate a fast design-time cache optimization process, we propose a single-pass trace-driven cache simulation methodology, T-SPaCS, for a two-level exclusive cache hierarchy. Directly adapting conventional trace-driven cache simulation to two-level caches requires significant storage and simulation time, because numerous stacks record cache access patterns for each level one and level two cache combination, and each stack is repeatedly processed. T-SPaCS significantly reduces storage space and simulation time by using a set of stacks that record only the complete cache access pattern. T-SPaCS thereby simulates all cache configurations for both the level one and level two caches simultaneously in a single pass. Experimental results show that T-SPaCS is 21.02X faster on average than sequential simulation for instruction caches and 33.34X faster for data caches. A simplified but minimally lossy version of T-SPaCS (simplified-T-SPaCS) increases the average simulation speedup to 30.15X for instruction caches and 41.31X for data caches. We use T-SPaCS and simplified-T-SPaCS to determine the lowest energy cache configuration and to quantify the effects of lossiness, and observe that both still find the lowest energy cache configuration found by exact simulation.
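As background for the stack-based, single-pass idea the abstract refers to, the following is a minimal sketch of the classic LRU stack-distance technique for a single-level, fully associative cache. It is not the T-SPaCS algorithm, which handles a two-level exclusive hierarchy; it only illustrates how one pass over a trace can yield miss counts for many cache sizes at once. The function names, the 16-byte block size, and the example trace are assumptions made for illustration.

```python
from collections import Counter


def stack_distance_histogram(trace, block_size=16):
    """One pass over a trace of byte addresses.

    Returns (hist, cold), where hist[d] counts accesses whose block was
    found at depth d of the LRU stack (0 = most recently used) and cold
    counts first-time accesses. block_size is an assumed parameter.
    """
    stack = []        # LRU stack of block numbers, most recent at index 0
    hist = Counter()
    cold = 0
    for addr in trace:
        block = addr // block_size
        if block in stack:
            depth = stack.index(block)  # distinct blocks touched since last use
            hist[depth] += 1
            stack.pop(depth)
        else:
            cold += 1                   # compulsory miss for every cache size
        stack.insert(0, block)          # accessed block becomes most recent
    return hist, cold


def misses_for_capacity(hist, cold, capacity_blocks):
    """Misses of a fully associative LRU cache holding capacity_blocks blocks.

    An access hits only if its stack depth is smaller than the capacity.
    """
    return cold + sum(n for depth, n in hist.items() if depth >= capacity_blocks)


# Hypothetical trace: one pass, then miss counts for several cache sizes.
trace = [0, 16, 32, 0, 16, 48, 0]
hist, cold = stack_distance_histogram(trace)
miss_counts = {c: misses_for_capacity(hist, cold, c) for c in (1, 2, 3, 4)}
```

The histogram is built once and then queried for each capacity, which is what makes the approach single-pass; a sequential simulator would instead replay the trace once per configuration.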
INDEX TERMS
Tuning, Algorithm design and analysis, Computational modeling, Analytical models, Complexity theory, Data models, Program processors, simulation, Cache memories, low-power design, real-time systems and embedded systems
CITATION
Wei Zang, Ann Gordon-Ross, "T-SPaCS—A Two-Level Single-Pass Cache Simulation Methodology", IEEE Transactions on Computers, vol.62, no. 2, pp. 390-403, Feb. 2013, doi:10.1109/TC.2011.194