SMT Layout Overhead and Scalability
February 2002 (vol. 13 no. 2)
pp. 142-155

Abstract—Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, the additional chip area it requires could instead be spent on other resources, such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead, and at what point does SMT stop paying off for maximum throughput compared with adding other architectural features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that affect SMT implementations and show that the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.


Index Terms:
SMT, layout area estimation, processor architecture, microarchitecture trade-off.
Citation:
James Burns, Jean-Luc Gaudiot, "SMT Layout Overhead and Scalability," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 2, pp. 142-155, Feb. 2002, doi:10.1109/71.983942