A Dynamically Tunable Memory Hierarchy
October 2003 (vol. 52 no. 10)
pp. 1243-1258

Abstract—The widespread use of repeaters in long wires creates the possibility of dynamically sizing regular on-chip structures. We present a tunable cache and translation lookaside buffer (TLB) hierarchy that leverages repeater insertion to dynamically trade off size for speed and power consumption on a per-application phase basis using a novel configuration management algorithm. In comparison to a conventional design that is fixed at a single design point targeted to the average application, the dynamically tunable cache and TLB hierarchy can be tailored to the needs of each application phase. The configuration algorithm dynamically detects phase changes and selects a configuration based on the application's ability to tolerate different hit and miss latencies in order to improve the memory energy-delay product. We evaluate the performance and energy consumption of our approach and project the effects of technology scaling trends on our design.
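The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's configuration management algorithm, which the abstract does not specify. It assumes that a phase change is flagged when the miss rate shifts by more than a fixed threshold, and that each candidate configuration has been sampled for energy and delay. The names `ConfigSample`, `phase_changed`, and `select_configuration`, the threshold value, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSample:
    """Hypothetical measurement of one cache/TLB configuration over an interval."""
    name: str
    energy_nj: float      # assumed unit: nanojoules spent in the memory hierarchy
    delay_cycles: float   # assumed unit: cycles attributed to memory accesses

    @property
    def energy_delay_product(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_nj * self.delay_cycles


def phase_changed(prev_miss_rate: float, cur_miss_rate: float,
                  threshold: float = 0.05) -> bool:
    """Flag a phase change when the miss rate moves by more than `threshold`.

    The miss-rate signal and the 0.05 threshold are illustrative assumptions.
    """
    return abs(cur_miss_rate - prev_miss_rate) > threshold


def select_configuration(samples: list[ConfigSample]) -> ConfigSample:
    """Return the sampled configuration with the lowest energy-delay product."""
    return min(samples, key=lambda s: s.energy_delay_product)


# Made-up samples for three hypothetical configurations.
samples = [
    ConfigSample("small-fast", energy_nj=10.0, delay_cycles=120.0),
    ConfigSample("medium", energy_nj=14.0, delay_cycles=90.0),
    ConfigSample("large-slow", energy_nj=22.0, delay_cycles=80.0),
]

if phase_changed(prev_miss_rate=0.02, cur_miss_rate=0.10):
    best = select_configuration(samples)
    print(best.name, best.energy_delay_product)
```

In this made-up data the smallest configuration wins on energy-delay product even though it has the highest delay, which is the kind of size-for-speed-and-power trade-off the abstract describes.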

Index Terms:
High performance microprocessors, memory hierarchy, reconfigurable architectures, energy and performance of on-chip caches.
Citation:
Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas, "A Dynamically Tunable Memory Hierarchy," IEEE Transactions on Computers, vol. 52, no. 10, pp. 1243-1258, Oct. 2003, doi:10.1109/TC.2003.1234523