Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
September 1999 (vol. 10 no. 9)
pp. 922-933

Abstract—This paper proposes and evaluates software techniques that increase register file utilization for simultaneous multithreading (SMT) processors. SMT processors require large register files to hold multiple thread contexts that can issue instructions out of order every cycle. By supporting better interthread sharing and management of physical registers, an SMT processor can reduce the number of registers required and can improve performance for a given register file size. Our techniques specifically target register deallocation. While out-of-order processors with register renaming are effective at knowing when a new physical register must be allocated, they have limited knowledge of when physical registers can be deallocated. We propose architectural extensions that permit the compiler and operating system to: 1) free registers immediately upon their last use, and 2) free registers allocated to idle thread contexts. Our results, based on detailed instruction-level simulations of an SMT processor, show that these techniques can increase performance significantly for register-intensive, multithreaded programs.
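The difference between conventional deallocation and freeing a register at its last use can be illustrated with a toy model. The sketch below is not the paper's simulator or its proposed architectural extensions; it assumes a simplified in-order machine in which every instruction allocates a fresh physical register for its destination, and the names `peak_physical_registers` and `free_at_last_use` are hypothetical. Under the conventional rule, a physical register is released only when its architectural register is redefined; with last-use information (assumed here to be supplied by the compiler), it is released right after its final reader.

```python
def peak_physical_registers(program, free_at_last_use):
    """Return the peak number of physical registers allocated at once.

    program: list of (dest, sources) tuples naming architectural registers.
    Each defining instruction's index stands in for its physical register.
    """
    # Pass 1: find the last instruction that reads each defined value.
    last_reader = {}   # defining index -> index of the last reader
    current_def = {}   # architectural register -> defining index
    for i, (dest, srcs) in enumerate(program):
        for s in srcs:
            if s in current_def:
                last_reader[current_def[s]] = i
        current_def[dest] = i

    # Pass 2: walk the program, allocating and freeing registers.
    mapping = {}       # architectural register -> live defining index
    live = set()
    peak = 0
    for i, (dest, srcs) in enumerate(program):
        old = mapping.get(dest)
        live.add(i)                    # allocate a register for dest
        mapping[dest] = i
        peak = max(peak, len(live))
        if free_at_last_use:
            # Assumed compiler hint: free values whose last reader is i.
            for d in [d for d in live if last_reader.get(d) == i]:
                live.discard(d)
        if old is not None:
            live.discard(old)          # conventional: free on redefinition
    return peak


# A dependence chain in which each value is read exactly once.
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
conventional = peak_physical_registers(chain, free_at_last_use=False)
last_use = peak_physical_registers(chain, free_at_last_use=True)
```

For this chain, no architectural register is ever redefined, so the conventional rule keeps all four values allocated, while last-use freeing needs at most two registers at a time. A real out-of-order SMT pipeline must also account for speculation and commit ordering, which this model ignores.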

Index Terms:
Multithreaded architecture, simultaneous multithreading, register file, architecture.
Jack L. Lo, Sujay S. Parekh, Susan J. Eggers, Henry M. Levy, Dean M. Tullsen, "Software-Directed Register Deallocation for Simultaneous Multithreaded Processors," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 9, pp. 922-933, Sept. 1999, doi:10.1109/71.798316