| | This Article | |
| |
| |
| | Share | |
| |
| |
| | Bibliographic References | |
| |
| |
| | Add to: | |
| |
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
| |
| | Search | |
| |
| |
| | |
Register Constrained Modulo Scheduling
May 2004 (vol. 15 no. 5)
pp. 417-430
Abstract—Software pipelining is an instruction scheduling technique that exploits the instruction level parallelism (ILP) available in loops by overlapping operations from various successive loop iterations. The main drawback of aggressive software pipelining techniques is their high register requirements. If the requirements exceed the number of registers available in the target architecture, some steps need to be applied to reduce the register pressure (incurring some performance degradation): reduce iteration overlapping or spilling some lifetimes to memory. In the first part of this paper, we propose a set of heuristics to improve the spilling process and to better decide between adding spill code or directly decreasing the execution rate of iterations. The experimental evaluation, over a large number of representative loops and for a processor configuration, reports an increase in performance by a factor of 1.29 and a reduction of memory traffic by a factor of 1.36. In the second part of this paper, we analyze the use of backtracking and propose a novel approach for simultaneous instruction scheduling and register spilling in modulo scheduling: MIRS (Modulo Scheduling with Integrated Register Spilling). The experimental evaluation reports an increase in performance by a factor of 1.46 and a reduction of the memory traffic by a factor of 1.66 (or an additional 1.13 and 1.22 with regard to the proposal in the first part of the paper). These improvements are achieved at the expense of a reasonable increase in the compilation time.
[1] V. Allan, R. Jones, R. Lee, and S. Allan, Software Pipelining ACM Computing Surveys, vol. 27, no. 3, pp. 367-432, Sept. 1995.
[2] J. Allen, K. Kennedy, and J. Warren, Conversion of Control Dependence to Data Dependence Proc. 10th Ann. Symp. Principles of Programming Languages, Jan. 1983.
[3] D. Bernstein, D. Goldin, M. Golumbic, H. Krawczyk, Y. Mansour, I. Nahshon, and R. Pinter, Spill Code Minimization Techniques for Optimizing Compilers Proc. ACM SIGPLAN Conf. Programming Languages Design and Implementation, pp. 258-263, July 1989.
[4] M. Berry, D. Chen, P. Koss, and D. Kuck, The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers Technical Report 827, Center for Supercomputing Research and Development, Nov. 1988.
[5] P. Briggs, K. Cooper, K. Kennedy, and L. Torczon, Coloring Heuristics for Register Allocation Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 275-284, June 1989.
[6] P. Briggs, K. Cooper, and L. Torczon, Improvements to Graph Coloring Register Allocation ACM Trans. Programming Languages and Systems, vol. 16, no. 3, pp. 428-455, May 1994.
[7] D. Callahan and B. Koblenz, Register Allocation via Hierarchical Graph Coloring Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 192-203, June 1991.
[8] G. Chaitin, Register Allocation and Spilling via Graph Coloring Proc. ACM SIGPLAN Symp. Compiler Construction, pp. 98-105, June 1982.
[9] A. Charlesworth, An Approach to Scientific Array Processing: The Architectural Design of the AP120B/FPS-164 Family Computer, vol. 14, no. 9, pp. 18-27, 1981.
[10] J. Codina, J. Llosa, and A. González, A Comparative Study of Modulo Scheduling Techniques Proc. Int'l Conf. Supercomputing, pp. 97-106, June 2002.
[11] A.K. Dani, V. Janaki, and R. Govindarajan, “Register-Sensitive Software Pipelining,” Proc. Merged 12th Int'l Parallel Processing Symp. and Ninth Int'l Symp. Parallel and Distributed Processing, Mar. 1998.
[12] J. Dehnert, P. Hsu, and J. Bratt, Overlapped Loop Support in the Cydra 5 Proc. Third Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 26-38, Apr. 1989.
[13] J. Dehnert and R. Towle, Compiling for the Cydra 5 J. Supercomputing, vol. 7, nos. 1/2, pp. 181-228, May 1993.
[14] A.E. Eichenberger and E.S. Davidson, “Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule,” Proc. 28th Int'l Ann. Symp. Microarchitecture, pp. 338-349, Nov. 1995.
[15] C. Eisenbeis, S. Lelait, and B. Marmol, The Meeting Graph: A New Model for Loop Cyclic Register Allocation Proc. Fifth Workshop Compilers for Parallel Computers, pp. 503-516, June 1995.
[16] L. Hendren, G. Gao, E. Altman, and C. Mukerji, Register Allocation Using Cyclic Interval Graphs: A New Approach to an Old Problem ACAPS Technical Memo 33, Advanced Computer Architecture and Program Structures Group, McGill Univ., 1992.
[17] R. Huff, Lifetime-Sensitive Modulo Scheduling Proc. Sixth Conf. Programming Language, Design and Implementation, pp. 258-267, 1993.
[18] S. Jain, Circular Scheduling: A New Technique to Perform Software Pipelining Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 219-228, June 1991.
[19] M. Lam, Software Pipelining: An Effective Scheduling Technique for VLIW Machines Proc. SIGPLAN Conf. Programming Language Design and Implementation, pp. 318-328, June 1988.
[20] M. Lam, A Systolic Array Optimizing Compiler. Kluwer Academic Publishers, 1989.
[21] J. Llosa, Reducing the Impact of Register Pressure on Software Pipelining PhD thesis, UPC. Universitat Politècnica de Catalunya, Jan. 1996.
[22] J. Llosa, A. González, E. Ayguadé, M. Valero, and J. Eckhardt, Lifetime-Sensitive Modulo Scheduling in a Production Environment IEEE Trans. Computers, vol. 50, no. 3, Mar. 2001.
[23] J. Llosa, M. Valero, and E. Ayguadé, “Heuristics for Register-Constrained Software Pipelining,” Proc. 29th Int'l Symp. Microarchitecture (MICRO-29), pp. 250-261, Dec. 1996.
[24] J. Llosa, M. Valero, E. Ayguade, and A. Gonzalez, “Hypernode Reduction Modulo Scheduling,” Proc. 28th Int'l Ann. Symp. Microarchitecture, pp. 350-360, Nov. 1995.
[25] W. Mangione-Smith, S. Abraham, and E. Davidson, Register Requirements of Pipelined Processors Proc. Int'l Conf. Supercomputing, pp. 260-246, July 1992.
[26] S. Ramakrishnan, Software Pipelining in PA-RISC Compilers Hewlett-Packard J., pp. 39-45, July 1992.
[27] B. Rau and C. Glaeser, Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing Proc. 14th Ann. Microprogramming Workshop, pp. 183-197, Oct. 1981.
[28] B. Rau, M. Lee, P. Tirumalai, and P. Schlansker, Register Allocation for Software Pipelined Loops Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 283-299, June 1992.
[29] B.R. Rau, "Iterative Modulo Scheduling: An Algorithm for Software Pipelined Loops," Proc. 27th Ann. Int'l Symp. Microarchitecture,San Jose, Calif., Dec. 1994.
[30] J. Ruttenberg, G. Gao, A. Stoutchinin, and W. Lichtenstein, Software Pipelining Showdown: Optimal vs. Heuristic Methods in a Production Compiler Proc. ACM SIGPLAN Conf. Programming Languages Design and Implementation, pp. 1-11, May 1996.
[31] F. Sánchez, Loop Pipelining with Resource and Timing Constraints PhD thesis, UPC, Universitat Politècnica de Catalunya, Oct. 1995.
[32] P. Tirumalai, M. Lee, and M.S. Schlansker, “Parallelisation of Loops with Exits on Pipelined Architectures,” Proc. Supercomputing '90, pp. 100-212, Nov. 1990.
[33] J. Wang, A. Krall, M.A. Ertl, and C. Eisenbeis, Software Pipelining with Register Allocation and Spilling Proc. 27th Int'l Symp. Microarchitecture, pp. 95-99, Nov. 1994.
[34] N. Warter and N. Partamian, Modulo Scheduling with Multiple Initiation Intervals Proc. 28th Int'l Symp. Microarchitecture, pp. 111-118, Nov. 1995.
[35] J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero, Improved Spill Code Generation for Software Pipelined Loops Proc. Conf. Programming Languages Design and Implementation, pp. 134-144, June 2000.
[36] J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero, MIRS: Modulo Scheduling with Integrated Register Spilling Proc. 14th Workshop Languages and Compilers for Parallel Computing, Aug. 2001.
Index Terms:
Instruction level parallelism, instruction scheduling, modulo scheduling, register allocation, spill code.
Citation:
Javier Zalamea, Josep Llosa, Eduard Ayguad?, Mateo Valero, "Register Constrained Modulo Scheduling," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 5, pp. 417-430, May 2004, doi:10.1109/TPDS.2004.1278099