Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
January 2003 (vol. 52 no. 1)
pp. 4-20

Abstract: In this paper, we address the problem of generating an optimal instruction sequence S for a Directed Acyclic Graph (DAG), where S is optimal in terms of the number of registers used. We call this the Minimum Register Instruction Sequence (MRIS) problem. The motivation for revisiting the MRIS problem stems from several modern architecture innovations and requirements that have put the instruction sequencing problem in a new context. We develop an efficient heuristic solution for the MRIS problem. This solution is based on the notion of an instruction lineage: a set of instructions that can definitely share a single register. The formation of lineages exploits the structure of the dependence graph to facilitate the sharing of registers not only among instructions within a lineage, but also across lineages. Our efficient heuristics to “fuse” lineages further reduce the register requirement. The reduced register requirement yields a code sequence with fewer register spills. We have implemented our solution in the MIPSpro production compiler and measured its performance on the SPEC95 floating point benchmark suite. Our experimental results demonstrate that the proposed instruction sequencing method reduces the number of spill loads and stores inserted in the code by more than 50 percent in each of the benchmarks. Our approach reduces the average number of dynamic loads and stores executed by 10.4 percent and 3.7 percent, respectively. Further, our approach improves the execution time of the benchmarks by 3.2 percent on average. To evaluate how efficiently our heuristics find a near-optimal solution to the MRIS problem, we develop an elegant integer linear programming formulation for the MRIS problem. Using a commercial integer linear programming solver, we obtain the optimal solution for the MRIS problem. Comparing the optimal solution from the integer linear programming tool with our heuristic solution reveals that, in a very large majority (99.2 percent) of the cases, our heuristic solution is optimal. For this experiment, we used a set of 675 dependence graphs representing basic blocks extracted from scientific benchmark programs.
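
To make the MRIS problem statement concrete, the following is a minimal Python sketch that counts the registers a given instruction sequence needs and then finds a minimum-register sequence by brute force over all valid orderings. It is not the lineage-based heuristic or the integer linear programming formulation described in the abstract; it only illustrates the objective on a tiny DAG. The function names, the example DAG, and the liveness convention (a value holds a register from its definition until its last consumer is issued, and the consumer's result may reuse that register) are assumptions made for illustration.

```python
from itertools import permutations


def registers_needed(order, operands):
    """Peak number of simultaneously live values for one instruction order.

    Assumed convention: a value is live from the step that defines it
    until the step of its last consumer, whose result may reuse the register.
    """
    pos = {v: i for i, v in enumerate(order)}
    # A value with no consumers is treated as live only at its own step.
    last_use = {v: pos[v] for v in order}
    for v in order:
        for u in operands[v]:
            last_use[u] = max(last_use[u], pos[v])
    peak = 0
    for t, v in enumerate(order):
        # Live after step t: the value just defined, plus earlier values
        # that still have a consumer later in the sequence.
        live = sum(1 for u in order[: t + 1] if last_use[u] > t or u == v)
        peak = max(peak, live)
    return peak


def is_valid(order, operands):
    """True if every instruction appears after all of its operands."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for v in order for u in operands[v])


def minimum_register_sequence(operands):
    """Exhaustive search over topological orders (exponential; tiny DAGs only)."""
    best = None
    for order in permutations(operands):
        if is_valid(order, operands):
            need = registers_needed(order, operands)
            if best is None or need < best[0]:
                best = (need, order)
    return best


# Hypothetical DAG: e = (a + b) * d, where a, b, d are loads and c = a + b.
dag = {"a": [], "b": [], "c": ["a", "b"], "d": [], "e": ["c", "d"]}
print(registers_needed(("a", "b", "d", "c", "e"), dag))  # all loads issued first
print(minimum_register_sequence(dag))
```

In this example, issuing all three loads before the addition keeps three values live at once, whereas computing c before loading d keeps at most two. The exhaustive search is only practical for very small graphs, which is why a heuristic is of interest for real basic blocks.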

Index Terms:
Compiler optimization, code sequence optimization, register allocation, instruction scheduling, code generation, superscalar architectures, instruction level parallelism.
Citation:
R. Govindarajan, Hongbo Yang, José Nelson Amaral, Chihong Zhang, Guang R. Gao, "Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures," IEEE Transactions on Computers, vol. 52, no. 1, pp. 4-20, Jan. 2003, doi:10.1109/TC.2003.1159750