This Article 
 Bibliographic References 
 Add to: 
Exploiting Program Branch Probabilities in Hardware Compilation
November 2004 (vol. 53 no. 11)
pp. 1408-1419
This paper explores using information about program branch probabilities to optimize the results of hardware compilation. The basic premise is to promote utilization by dedicating more resources to branches which execute more frequently. A new hardware compilation and flow control scheme are presented which enable the computation rate of different branches to be matched to the observed branch probabilities. We propose an analytical queuing network performance model to determine the optimal settings for basic block computation rates given a set of observed branch probabilities. An experimental hardware compilation system has been developed to evaluate this approach. The branch optimization design space is characterized in an experimental study for Xilinx Virtex FPGAs of two complex applications: video feature extraction and progressive refinement radiosity. For designs of equal performance, branch-optimized designs require 24 percent and 27.5 percent less area. For designs of equal area, branch optimized designs run up to three times faster. Our analytical performance model is shown to be highly accurate with relative error between 0.12 and 1.1\times10^{-4}.

[1] Elixent d-fabrix and rap, , 2003
[2] Celoxica Limited, Handel-C Language Reference Manual, version 3.1, document number RM-1003-3.0, 2002.
[3] A Look into Quicksilver's Acm Architecture , 2002
[4] M. Budiu and S.C. Goldstein, Compiling Application-Specific Hardware Field-Programmable Logic and Applications, pp. 853-863, Springer, 2002.
[5] W. Luk, N. Shirazi, and P.Y.K. Cheung, Modelling and Optimising Run-Time Reconfigurable Systems Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1996.
[6] D. Cottet, R. Enzler, T. Jeger, and G. Tröster, High-Level Area and Performance Estimation of Hardware Building Blocks on FPGAs Field Programmable Logic and Applications, pp. 525-534, Springer, 2000.
[7] P.J. Denning, The Working Set Model for Program Behavior Proc. ACM Symp. Operating System Principles, pp. 15.1-15.12, 1967.
[8] G. De Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill, 1994.
[9] A.S. Dhodapkar and J.E. Smith, Managing Multi-Configuration Hardware via Dynamic Working Set Analysis Proc. 29th IEEE/ACM Int'l Symp. Computer Architecture, pp. 233-244, 2002.
[10] A.J. Field and P.G. Harrison, A Methodology for the Performance Modelling of Distributed Cache Coherent Multiprocessors The State-of-the-Art in Performance Modeling and Simulation, K. Bagchi, J. Walrand, and G.W. Zobrist, eds., pp. 55-92, Gordon and Breach, 1998.
[11] M.J. Flynn, Computer Architecture: Pipelined and Parallel Processor Design. Sudbury, Mass.: Jones and Bartlett, 1995.
[12] T. Harriss, R. Walke, B. Kienhuis, and E. Deprettere, Compilation from Matlab to Process Networks Design Automation for Embedded Systems, vol. 7, pp. 385-403, 2002.
[13] P.A. Jackson, J.L. Tripp, and B.L. Hutchings, Sea Cucumber: A Synthesizing Compiler for FPGAs Field Programmable Logic and Applications, pp. 875-885, Springer, 2002.
[14] D. Cottet, T. Jeger, R. Enzler, and G. Tröster, The Performance Prediction Model technical report, Electronics Lab, Swiss Federal Inst. of Technology (ETH) Zurich, 2000.
[15] L. Kleinrock, Queueing Systems. Volume I: Theory. John Wiley&Sons, Inc., 1975.
[16] L. Kleinrock, Queueing Systems. Volume II: Computer Applications. John Wiley&Sons, Inc., 1976.
[17] M.S. Lam, R.P. Wilson, R.S. French, and J.L. Hennessy, Suif: An Infrastructure for Research on Parallelizing And Optimizing Compilers SIGPLAN Notices, vol. 29, no. 12, pp. 31-37, 1994.
[18] R. Leupers and P. Marwedel, Retargetable Compiler Technology for Embedded Systems: Tools and Applications, Kluwer Academic, 2001.
[19] W. Luk, T.K. Lee, J.R. Rice, N. Shirazi, and P.Y.K. Cheung, Reconfigurable Computing for Augmented Reality Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1999.
[20] X.-J. Zhang, K.-W. Ng, and W. Luk, A Combined Approach to High-Level Synthesis for Dynamically Reconfigurable Systems Field Programmable Logic and Applications, pp. 361-370, Springer, 2000.
[21] S. McMillan and S. Guccione, Partial Run-Time Reconfiguration Using jrtr Field Programmable Logic and Applications, Springer, 2000.
[22] K.N. McNall and A.E. Casavant, Automatic Operator Configuration in the Synthesis of Pipelined Architectures Proc. 27th ACM/IEEE Design Automation Conf., pp. 174-179, 1990.
[23] O. Mencer, H Hübert, M. Morf, and M.J. Flynn, StReAm: Object-Oriented Programming of Stream Architectures Using PAM-Blox Field-Programmable Logic: the Roadmap to Reconfigurable Systems, pp. 595-604, Springer, 2000.
[24] I. Mitrani, Probabalistic Modelling. Cambridge Univ. Press, 1998.
[25] T. Moller and B. Trumbore, Fast, Minimum Storage Ray-Triangle Intersection J. Graphics Tools, vol. 2, no. 1, pp. 21-28, 1997.
[26] S. Ong et al., Automatic Mapping of Multiple Applications to Multiple Adaptive Computing Systems Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 218-227, 2001.
[27] U. Prabhu and B.M. Pangrle, Global Mobility Based Scheduling Proc. IEEE Int'l Conf. Computer Design: VLSI in Computers and Processors, pp. 370-373, 1993.
[28] R.O. Onvural, Survey of Closed Queueing Networks with Blocking ACM Computing Surveys, vol. 22, no. 2, pp. 83-121, June 1990.
[29] J. Park, K.R. Shesha Shayee, and P.C. Diniz, Performance and Area Modeling of Complete FPGA Designs in the Presence of Loop Transformations Field Programmable Logic and Applications, Springer, 2003.
[30] H. Schmit, Incremental Reconfiguration for Pipeline Applications Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1997.
[31] S. Singh and P. James-Roxby, Lava and jbits: From Hdl to Bitstream in Seconds Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 2001.
[32] H. Styles and W. Luk, Accelerating Radiosity Calculations Using Reconfigurable Platforms Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 279-281, 2002.
[33] A.H. Veen, Dataflow Machine Architecture ACM Computing Surveys, vol. 18, no. 4, pp. 365-396, 1986.
[34] M. Weinhardt and W. Luk, Pipeline Vectorisation IEEE Trans. Computer-Aided Design, vol. 20, no. 2, pp. 234-248, Feb. 2001.
[35] J.M. Stone et al., Stream-Oriented Fpga Computing in the Stream-C High Level Language Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 2000.
[36] W. Wang, A. Raghunathan, N.K. Jha, and S. Dey, High-Level Synthesis of Multi-Process Behavioral Descriptions Proc. IEEE Int'l Conf. VLSI Design, pp. 467-473, 2003.
[37] H. Ziegler, B. So, M. Hall, and P.C. Diniz, Coarse-Grain Pipelining on Multiple FPGA Architectures Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 77-86, 2002.

Index Terms:
Automatic synthesis, dataflow architectures, queuing theory.
Henry Styles, Wayne Luk, "Exploiting Program Branch Probabilities in Hardware Compilation," IEEE Transactions on Computers, vol. 53, no. 11, pp. 1408-1419, Nov. 2004, doi:10.1109/TC.2004.96
Usage of this product signifies your acceptance of the Terms of Use.