Lifetime-Sensitive Modulo Scheduling in a Production Environment
March 2001 (vol. 50 no. 3)
pp. 234-249

Abstract: This paper presents a novel software pipelining approach called Swing Modulo Scheduling (SMS). SMS is a heuristic with low computational cost that generates schedules that are near optimal in terms of initiation interval, register requirements, and stage count. The paper first describes the technique and evaluates it on the Perfect Club benchmark suite for a generic VLIW architecture. SMS is then compared with other heuristic methods and is shown to outperform them in both the quality of the obtained schedules and compilation time. To further explore the effectiveness of SMS, the paper describes the experience of incorporating it into a production-quality compiler for the Equator MAP1000 processor, discussing implementation issues as well as modifications and improvements to the original algorithm. Finally, experimental results on a set of industrial multimedia applications are presented.

Index Terms:
Fine grain parallelism, instruction scheduling, loop scheduling, software pipelining, register requirements, VLIW, superscalar architectures.
Josep Llosa, Eduard Ayguadé, Antonio Gonzalez, Mateo Valero, Jason Eckhardt, "Lifetime-Sensitive Modulo Scheduling in a Production Environment," IEEE Transactions on Computers, vol. 50, no. 3, pp. 234-249, March 2001, doi:10.1109/12.910814