A Framework for Resource-Constrained Rate-Optimal Software Pipelining
November 1996 (vol. 7 no. 11)
pp. 1133-1149

Abstract: The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (i.e., a rate-optimal schedule) while minimizing the number of buffers, a close approximation to minimizing the number of registers.

The main contributions of this paper are as follows. First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Second, we show that a precise mathematical formulation and its solution make a significant performance difference. We evaluated the performance of our method against three leading contemporary heuristic methods. Experimental results show that the method described in this paper performed significantly better than these methods.

The techniques proposed in this paper are useful in two different ways: 1) As a compiler option which can be used to generate faster schedules for performance-critical loops (if the interested users are willing to trade longer compile time for faster runtime). 2) As a framework for compiler writers to evaluate and improve other heuristics-based approaches by providing quantitative information as to where and by how much their heuristic methods could be further improved.
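
As a rough illustration of the periodic linear scheduling idea named in the abstract, the sketch below assumes that operation i of iteration j starts at time j*T + start[i], where T is the period (initiation interval). It checks dependence and resource constraints for a candidate schedule and then finds the smallest feasible T by brute force. The brute-force search stands in for the integer linear programming formulation used in the paper and is only practical for toy loops. All names (`is_valid`, `smallest_period`, the dependence tuples) are hypothetical, function units are assumed to be fully pipelined with each operation occupying one issue slot, and buffer minimization is not modeled.

```python
from itertools import product


def is_valid(T, start, deps, unit_of, units):
    """Check a periodic linear schedule with period T.

    start   : dict op -> start offset within its own iteration
    deps    : list of (u, v, latency, distance) dependence edges
    unit_of : dict op -> function-unit type
    units   : dict function-unit type -> number of units available
    """
    # Dependence u -> v: v (dist iterations later) must not start
    # before u has finished, i.e. start[v] + dist*T >= start[u] + lat.
    for u, v, lat, dist in deps:
        if start[v] + dist * T < start[u] + lat:
            return False
    # Resource check (assumption: fully pipelined units, one slot per op):
    # ops that share a slot modulo T compete for the same unit type.
    usage = {}
    for op, s in start.items():
        key = (unit_of[op], s % T)
        usage[key] = usage.get(key, 0) + 1
        if usage[key] > units[unit_of[op]]:
            return False
    return True


def smallest_period(ops, deps, unit_of, units, max_T=16):
    """Brute-force search for the smallest feasible period (toy sizes only)."""
    total_latency = sum(lat for _, _, lat, _ in deps)
    for T in range(1, max_T + 1):
        horizon = total_latency + T  # hypothetical bound on start offsets
        for offsets in product(range(horizon), repeat=len(ops)):
            start = dict(zip(ops, offsets))
            if is_valid(T, start, deps, unit_of, units):
                return T, start
    return None


# Toy loop: a -> b -> c within an iteration, and c feeds a in the next one.
ops = ["a", "b", "c"]
deps = [("a", "b", 2, 0), ("b", "c", 1, 0), ("c", "a", 1, 1)]
unit_of = {"a": "alu", "b": "alu", "c": "mem"}
units = {"alu": 1, "mem": 1}
result = smallest_period(ops, deps, unit_of, units)
```

In this toy loop the dependence cycle a, b, c has total latency 4 and spans one iteration, so no period below 4 can satisfy the dependence constraints, and the search returns T = 4.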

Index Terms:
Instruction-level parallelism, instruction scheduling, integer linear programming, software pipelining, superscalar and VLIW architectures.
R. Govindarajan, Erik R. Altman, Guang R. Gao, "A Framework for Resource-Constrained Rate-Optimal Software Pipelining," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 11, pp. 1133-1149, Nov. 1996, doi:10.1109/71.544355