Valid Transformations: A New Class of Loop Transformations for High-Level Synthesis and Pipelined Scheduling Applications
April 1996 (vol. 7 no. 4)
pp. 399-410

Abstract: In this paper we present a new class of loop-optimizing transformations called valid transformations, which are suitable for fine-grain parallelization applications such as high-level synthesis of VLSI designs or compilers for super-scalar or VLIW machines. This class differs from existing ones in that a valid transformation can be illegal. Nevertheless, if a transformation is valid, the transformed loop has a feasible pipeline schedule. We present an example valid transformation called loop expansion, which can help produce cost-performance-efficient designs and explore a larger design space for a satisfactory design. Several examples are used to demonstrate the efficacy of the proposed technique.

Index Terms:
High-level synthesis, super-scalar, VLIW, loop compilation, loop optimization, loop transformations, pipeline scheduling.
Citation:
Minjoong Rim, Rajiv Jain, "Valid Transformations: A New Class of Loop Transformations for High-Level Synthesis and Pipelined Scheduling Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 4, pp. 399-410, April 1996, doi:10.1109/71.494634