Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor
August 2005 (vol. 54 no. 8)
pp. 998-1012
Low-power embedded processors utilize compact instruction encodings to achieve small code size. Such encodings place tight restrictions on the number of bits available to encode operand specifiers and, thus, on the number of architected registers. As a result, performance and power are often sacrificed as the burden of operand supply is shifted from the register file to the memory due to the limited number of registers. In this paper, we investigate the use of a windowed register file to address this problem by providing more registers than allowed in the encoding. The registers are organized as a set of identical register windows where, at each point in the execution, there is a single active window. Special window management instructions are used to change the active window and to transfer values between windows. This design gives the appearance of a large register file without compromising the instruction encoding. To support the windowed register file, we designed and implemented a graph partitioning-based compiler algorithm that partitions program variables and temporaries referenced within a procedure across multiple windows. On a 16-bit embedded processor, an average of 11 percent improvement in application performance and 25 percent reduction in system power was achieved as an 8-register design was scaled from one to two windows.
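As a rough illustration of the partitioning idea, the toy sketch below assigns variables to two windows so that variables used together tend to share a window. It is not the paper's algorithm: the abstract does not give the graph construction or cost model, so the edge weights (here, a hypothetical count of how often two variables are referenced together), the `max_per_window` capacity limit, and the function names are all assumptions made for this example. It also uses exhaustive search, which is only practical for a handful of variables, where a real compiler would use a partitioning heuristic.

```python
from itertools import combinations

def cut_weight(edges, window_of):
    """Sum the weights of edges whose endpoints sit in different windows."""
    return sum(w for (a, b), w in edges.items() if window_of[a] != window_of[b])

def partition_two_windows(variables, edges, max_per_window):
    """Try every two-window assignment and keep the one with the smallest cut.

    Hypothetical model: each window holds at most max_per_window variables.
    Returns (None, None) if the variables cannot fit in two windows.
    """
    best_cost, best_assignment = None, None
    n = len(variables)
    # Window 0 must hold enough variables that window 1 does not overflow.
    for size in range(max(0, n - max_per_window), min(n, max_per_window) + 1):
        for window0 in combinations(variables, size):
            chosen = set(window0)
            window_of = {v: (0 if v in chosen else 1) for v in variables}
            cost = cut_weight(edges, window_of)
            if best_cost is None or cost < best_cost:
                best_cost, best_assignment = cost, window_of
    return best_cost, best_assignment

# Made-up affinity graph: two tightly coupled groups joined by one weak edge.
variables = ["a", "b", "c", "d", "e", "f"]
edges = {
    ("a", "b"): 10, ("b", "c"): 8, ("a", "c"): 6,
    ("d", "e"): 9, ("e", "f"): 7, ("d", "f"): 5,
    ("c", "d"): 1,
}
cost, assignment = partition_two_windows(variables, edges, max_per_window=4)
print(cost, assignment)
```

In this made-up graph the cheapest split separates the two groups and cuts only the weak `c`-`d` edge, which in this toy model stands in for the overhead of moving values between windows or switching the active window.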


Index Terms: Code generation, embedded processor, graph partitioning, instruction encoding, low-power design, optimization, retargetable compilers, register window, spill code.
Rajiv A. Ravindran, Robert M. Senger, Eric D. Marsman, Ganesh S. Dasika, Matthew R. Guthaus, Scott A. Mahlke, Richard B. Brown, "Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor," IEEE Transactions on Computers, vol. 54, no. 8, pp. 998-1012, Aug. 2005, doi:10.1109/TC.2005.132