
R. Govindarajan, Erik R. Altman, Guang R. Gao, "A Framework for Resource-Constrained Rate-Optimal Software Pipelining," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 11, pp. 1133-1149, November 1996.
DOI: http://doi.ieeecomputersociety.org/10.1109/71.544355
ISSN: 1045-9219
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Index Terms: Instruction-level parallelism, instruction scheduling, integer linear programming, software pipelining, superscalar and VLIW architectures.
Abstract: The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software-pipelined loop schedules. In this paper, we develop a framework to construct a software-pipelined loop schedule that runs on a given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (i.e., the schedule is rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers.
The main contributions of this paper are as follows. First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture that permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Second, we show that a precise mathematical formulation and its solution make a significant performance difference. We evaluated the performance of our method against three leading contemporary heuristic methods. Experimental results show that the method described in this paper performed significantly better than these methods.
The techniques proposed in this paper are useful in two different ways: 1) as a compiler option that can be used to generate faster schedules for performance-critical loops (if users are willing to trade longer compile time for faster run time); 2) as a framework for compiler writers to evaluate and improve other heuristics-based approaches by providing quantitative information on where and by how much their heuristic methods could be further improved.
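As a rough illustration of the periodic linear scheduling idea named in the abstract, the Python sketch below checks whether a candidate schedule, in which iteration k issues operation i at time T*k + start[i], respects dependence and resource constraints. It is a generic modulo-scheduling feasibility check, not the integer linear programming formulation of the paper. It assumes fully pipelined function units, with each operation occupying its unit only in its issue cycle. The function name, the three-operation loop, and all latencies and unit counts are hypothetical.

```python
from collections import Counter


def is_valid_periodic_schedule(T, start, latency, deps, unit_of, units):
    """Return True if issuing op i of iteration k at T*k + start[i] is feasible.

    Assumption (not from the paper): units are fully pipelined, so an
    operation holds its unit only during its issue cycle.
    """
    # Dependence constraints: for an edge i -> j carried across `dist`
    # iterations, op j of iteration k + dist must start no earlier than
    # the finish time of op i of iteration k.
    for i, j, dist in deps:
        if start[j] - start[i] < latency[i] - T * dist:
            return False
    # Resource constraints: within any window of T cycles, the operations
    # issued in the same slot (start time modulo T) on the same unit type
    # must not exceed the number of units of that type.
    usage = Counter((unit_of[op], start[op] % T) for op in start)
    return all(count <= units[unit] for (unit, _slot), count in usage.items())


# Hypothetical loop body: load -> add -> store, with the add feeding
# itself in the next iteration (a distance-1 recurrence).
latency = {"load": 2, "add": 1, "store": 1}
unit_of = {"load": "mem", "add": "alu", "store": "mem"}
units = {"mem": 1, "alu": 1}
deps = [("load", "add", 0), ("add", "store", 0), ("add", "add", 1)]
start = {"load": 0, "add": 2, "store": 3}

feasible_at_2 = is_valid_periodic_schedule(2, start, latency, deps, unit_of, units)
feasible_at_1 = is_valid_periodic_schedule(1, start, latency, deps, unit_of, units)
```

In this toy loop, a period of T = 2 is feasible, while T = 1 fails because the load and the store would both need the single memory unit in the same slot. A rate-optimal scheduler in the sense of the abstract would search for the smallest T that admits such a schedule.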