Issue No. 12 - Dec. 2012 (vol. 61)
pp. 1800-1812
Qiang Liu, Imperial College London, London
Tim Todman, Imperial College London, London
Wayne Luk, Imperial College London, London
George A. Constantinides, Imperial College London, London
ABSTRACT
Utility-directed transformations involve changing a design to optimize for given constraints while preserving behavior. These changes are often achieved by techniques such as linear programming or geometric programming. We present a systematic approach composing multiple utility-directed transformations for optimizing and mapping a sequential design onto a customizable parallel computing platform such as a Field-Programmable Gate Array (FPGA). Our aim is to enable automatic design optimization at compile time. Design goals specified by users drive the design transformations. Each utility-directed transformation achieves part of the overall goal, and multiple utility-directed transformations, connected by pattern-directed transformations, are composed to fulfill the overall design requirements. The utility-directed transformations in this work produce performance-optimized designs by exploiting data reuse, MapReduce, and pipelining for the target parallel computing platform. Moreover, we show that applying the transformations in different orders lets users trade speed against resource usage, and design performance against compile time. Several applications are used to evaluate this approach on FPGAs. The system performance of a 64-bit matrix multiplication is shown to improve by up to 98 times over the original design on the target hardware platform.
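To make the idea of a utility-directed transformation concrete, here is a minimal sketch in Python. It is not the authors' method: the paper formulates such choices with techniques like geometric programming, whereas this toy uses exhaustive search. The cost model and the names `cycles`, `best_parallelism`, `unit_cost`, and `budget` are all assumptions made for illustration. The sketch picks how many parallel map units to instantiate for a MapReduce-style loop so that an estimated cycle count is minimized without exceeding a resource budget.

```python
from math import ceil, log2


def cycles(n_items, k):
    """Toy estimate: map n_items across k units, then tree-reduce k partial results."""
    map_cycles = ceil(n_items / k)
    reduce_cycles = ceil(log2(k)) if k > 1 else 0
    return map_cycles + reduce_cycles


def best_parallelism(n_items, unit_cost, budget):
    """Return the replication factor k that minimizes the estimated cycles,
    subject to k * unit_cost <= budget.

    Exhaustive search over k (an assumption for this sketch, not geometric
    programming). Ties are broken in favor of the smaller k, which uses
    fewer resources.
    """
    max_k = min(n_items, budget // unit_cost)
    if max_k < 1:
        raise ValueError("budget too small for a single unit")
    return min(range(1, max_k + 1), key=lambda k: (cycles(n_items, k), k))
```

With a hypothetical budget of 85 resource units and a cost of 10 per map unit, at most 8 units fit, so `best_parallelism(64, 10, 85)` selects among k = 1..8 using the toy cycle model. Changing the budget or the cost model changes the selected design, which is the sense in which the user's goals drive the transformation.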
INDEX TERMS
Energy efficiency, Pipeline processing, Energy management, Electricity supply industry, Design optimization, Optimization, Geometric programming, data reuse, MapReduce, pipelining
CITATION
Qiang Liu, Tim Todman, Wayne Luk, George A. Constantinides, "Optimizing Hardware Design by Composing Utility-Directed Transformations", IEEE Transactions on Computers, vol. 61, no. 12, pp. 1800-1812, Dec. 2012, doi:10.1109/TC.2011.205