Modulo Scheduling with Reduced Register Pressure
June 1998 (vol. 47 no. 6)
pp. 625-638

Abstract: Software pipelining is a scheduling technique used by some product compilers to expose more instruction-level parallelism in innermost loops. Modulo scheduling refers to a class of algorithms for software pipelining. Most previous research on modulo scheduling has focused on reducing the number of cycles between the initiation of consecutive iterations (termed the initiation interval, II) but has not considered the register pressure of the schedules produced. Register pressure increases as instruction-level parallelism increases. When the register requirements of a schedule exceed the number of available registers, the loop must be rescheduled, perhaps with a higher II. Register pressure therefore has an important impact on the performance of a schedule. This paper presents a novel heuristic modulo scheduling strategy that tries to generate schedules with the lowest II and, among all possible schedules with that II, tries to select the one with the lowest register requirements. The proposed method has been implemented in an experimental compiler and tested on the Perfect Club benchmarks. The results show that the proposed method achieves an optimal II for at least 97.5 percent of the loops and that its compilation time is comparable to that of a conventional top-down approach, whereas its register requirements are lower. In addition, the proposed method is compared with some other existing methods. The results indicate that it performs better than other heuristic methods and almost as well as linear programming methods, which obtain optimal solutions but are impractical for product compilers because their computing cost grows exponentially with the number of operations in the loop body.
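To make the link between II and register pressure concrete, the following minimal sketch counts how many values are live at once in the steady state of a modulo schedule. It is an illustration only, not the scheduling heuristic proposed in the paper. The function name `max_live`, the half-open lifetime convention (a value is live from its definition cycle up to, but not including, its last-use cycle), and the sample lifetimes are all assumptions introduced here.

```python
def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady-state kernel.

    lifetimes: list of (start, end) cycles for one iteration; each value is
               assumed live on the half-open range [start, end).
    ii:        initiation interval, i.e. cycles between consecutive iterations.
    """
    live = [0] * ii
    for start, end in lifetimes:
        # A new iteration starts every `ii` cycles, so a value that is live at
        # cycle c of its own iteration occupies kernel slot c mod ii.
        for cycle in range(start, end):
            live[cycle % ii] += 1
    return max(live)


# Hypothetical value lifetimes, invented purely for illustration.
lifetimes = [(0, 3), (1, 5), (2, 4)]

for ii in (1, 2, 4):
    print(f"II={ii}: peak live values = {max_live(lifetimes, ii)}")
```

With these made-up lifetimes the peak falls as II grows, which is the trade-off the abstract describes: a schedule that overlaps more iterations needs more registers, and rescheduling at a higher II relieves that pressure at the cost of throughput. In a real scheduler the lifetimes themselves would change when the loop is rescheduled; here they are held fixed to isolate the effect of II.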

Index Terms:
Instruction scheduling, loop scheduling, software pipelining, register allocation, register spilling.
Josep Llosa, Mateo Valero, Eduard Ayguadé, Antonio González, "Modulo Scheduling with Reduced Register Pressure," IEEE Transactions on Computers, vol. 47, no. 6, pp. 625-638, June 1998, doi:10.1109/12.689643