Issue No.07 - July (2011 vol.22)
pp: 1150-1163
Huaping Wang, University of Massachusetts Amherst, Amherst
Israel Koren, University of Massachusetts Amherst, Amherst
C. Mani Krishna, University of Massachusetts Amherst, Amherst
ABSTRACT
Simultaneous multithreading (SMT) increases processor throughput by allowing several threads to execute in parallel. However, fully sharing processor resources may let a single thread monopolize them or cause other misallocations, degrading overall performance. Static resource partitioning techniques have been suggested, but they are less effective than dynamic ones because program behavior changes over the course of execution. In this paper, we propose an Adaptive Resource Partitioning Algorithm (ARPA) that dynamically assigns resources to threads according to changes in thread behavior. ARPA analyzes the resource usage efficiency of each thread in a given time period and assigns more resources to the threads that can use them more efficiently. Its purpose is to improve the efficiency of resource utilization, thereby improving overall instruction throughput. Our simulation results on a set of 42 multiprogramming workloads show that ARPA outperforms the traditional fetch policy ICOUNT by 55.8 percent in overall instruction throughput and achieves a 33.8 percent improvement over Static Partitioning. It also outperforms the current best dynamic resource allocation technique, Hill-climbing, by 5.7 percent. In terms of the fairness accorded to each thread, ARPA attains 43.6, 18.5, and 9.2 percent improvements over ICOUNT, Static Partitioning, and Hill-climbing, respectively, using a common fairness metric. We also explore the energy efficiency of dynamically controlling the number of powered-on reorder buffer entries for ARPA. Compared with ARPA, our energy-aware resource partitioning algorithm achieves 10.6 percent energy savings with negligible performance loss.
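The idea of giving more resources to the threads that use them more efficiently can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the efficiency metric, the length of the time period, or the reallocation rule. The sketch assumes that efficiency is the number of instructions committed per allocated resource entry in the last epoch, and that a fixed number of entries (`step`) moves from the least to the most efficient thread, subject to a floor (`min_entries`). All names and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    name: str
    entries: int        # resource entries currently allocated to the thread
    committed: int = 0  # instructions committed during the last epoch


def efficiency(t: ThreadState) -> float:
    """Assumed metric: committed instructions per allocated entry."""
    return t.committed / t.entries if t.entries else 0.0


def repartition(threads: list[ThreadState], step: int = 4, min_entries: int = 8) -> None:
    """Move up to `step` entries from the least to the most efficient thread.

    `step` and `min_entries` are illustrative values, not taken from the paper.
    """
    if len(threads) < 2:
        return
    ranked = sorted(threads, key=efficiency)
    donor, receiver = ranked[0], ranked[-1]
    if efficiency(donor) == efficiency(receiver):
        return  # nothing to gain from moving entries
    # Never push the donor below its guaranteed minimum share.
    moved = min(step, donor.entries - min_entries)
    if moved <= 0:
        return
    donor.entries -= moved
    receiver.entries += moved


# One epoch with two threads sharing 64 entries.
a = ThreadState("a", entries=32, committed=6400)
b = ThreadState("b", entries=32, committed=1600)
repartition([a, b])
print(a.entries, b.entries)
```

Calling `repartition` once per epoch, after refreshing each `committed` count, shifts the partition gradually toward the threads that currently make better use of their share, while the floor keeps any thread from being starved.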
INDEX TERMS
Simultaneous multithreading, resource partitioning, power-performance efficiency.
CITATION
Huaping Wang, Israel Koren, C. Mani Krishna, "Utilization-Based Resource Partitioning for Power-Performance Efficiency in SMT Processors," IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 7, pp. 1150-1163, July 2011, doi:10.1109/TPDS.2010.199