Predictable Performance in SMT Processors: Synergy between the OS and SMTs
July 2006 (vol. 55 no. 7)
pp. 785-799
Current operating systems (OSs) perceive the different contexts of a Simultaneous Multithreaded (SMT) processor as multiple independent processing units, although, in reality, the threads executed in these contexts compete for the same hardware resources. Furthermore, hardware resources are assigned to threads implicitly, as determined by the SMT instruction fetch (Ifetch) policy, without OS control. Both factors cause a lack of control over how individual threads are executed, which can frustrate the work of the job scheduler. This is a problem for general-purpose systems, where the OS job scheduler cannot enforce priorities, and also for embedded systems, where it would be difficult to guarantee worst-case execution times. In this paper, we propose a novel strategy that enables a two-way interaction between the OS and the SMT processor and allows the OS to run jobs at a certain percentage of their maximum speed, regardless of the workload in which these jobs are executed. In contrast to previous approaches, our approach enables the OS to run time-critical jobs without dedicating all internal resources to them, so that non-time-critical jobs can also make significant progress without significantly compromising overall throughput. In fact, in addition to fulfilling OS requirements, our mechanism achieves 90 percent of the throughput of one of the best currently known fetch policies for SMTs.
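The abstract expresses the OS requirement as a fraction of a job's maximum speed. The arithmetic behind such a requirement can be illustrated with a minimal sketch, assuming that speed is measured as instructions per cycle (IPC), that "maximum speed" means the IPC the job reaches when running alone, and that the processor exposes a single tunable share of some internal resource. The class `ShareController`, the percentage share, the starting share of 50, and the fixed step size are all hypothetical. This is a generic feedback loop, not the mechanism described in the paper.

```python
from dataclasses import dataclass


@dataclass
class ShareController:
    """Hypothetical interval-based controller (not the paper's mechanism)."""

    target_fraction: float  # requested fraction of maximum speed, e.g. 0.8
    ipc_alone: float        # assumed maximum speed: IPC of the job running alone
    share_pct: int = 50     # assumed percent of a shared resource reserved for the job
    step_pct: int = 5       # assumed adjustment applied after each sampling interval

    @property
    def target_ipc(self) -> float:
        # Target speed = requested fraction x stand-alone (maximum) speed.
        return self.target_fraction * self.ipc_alone

    def update(self, measured_ipc: float) -> int:
        """Give the job more of the resource if it is too slow, less if too fast."""
        if measured_ipc < self.target_ipc:
            self.share_pct = min(100, self.share_pct + self.step_pct)
        elif measured_ipc > self.target_ipc:
            self.share_pct = max(0, self.share_pct - self.step_pct)
        return self.share_pct


if __name__ == "__main__":
    # Request 80% of a stand-alone IPC of 2.0, i.e. a target IPC of 1.6.
    ctrl = ShareController(target_fraction=0.8, ipc_alone=2.0)
    for ipc in (1.2, 1.4, 1.7):
        print(ipc, ctrl.update(ipc))
```

In this sketch, whatever share the time-critical job does not need (100 minus `share_pct`) stays available to the other threads. This echoes the abstract's point that non-time-critical jobs can still make progress instead of being starved by a static, all-or-nothing allocation.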


Index Terms:
Multithreaded processors, simultaneous multithreading, ILP, thread-level parallelism, performance predictability, real time, operating systems.
Citation:
Francisco J. Cazorla, Peter M.W. Knijnenburg, Rizos Sakellariou, Enrique Fernández, Alex Ramirez, Mateo Valero, "Predictable Performance in SMT Processors: Synergy between the OS and SMTs," IEEE Transactions on Computers, vol. 55, no. 7, pp. 785-799, July 2006, doi:10.1109/TC.2006.108