Issue No.12 - December (2009 vol.58)
pp: 1
Jung Sub Kim , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Lanping Deng , Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ, USA
P. Mangalagiri , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
K. Irick , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
K. Sobti , Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ, USA
M. Kandemir , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
V. Narayanan , Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
C. Chakrabarti , Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ, USA
N. Pitsianis , Dept. of Comput. Sci., Duke Univ., Durham, NC, USA
Xiaobai Sun , Dept. of Comput. Sci., Duke Univ., Durham, NC, USA
ABSTRACT
This paper describes TANOR, an automated framework for designing hardware accelerators for numerical computation on reconfigurable platforms. Applications utilizing numerical algorithms on large-size data sets require high-throughput computation platforms. The focus is on N-body interaction problems which have a wide range of applications spanning from astrophysics to molecular dynamics. The TANOR design flow starts with a MATLAB description of a particular interaction function, its parameters, and certain architectural constraints specified through a graphical user interface. Subsequently, TANOR automatically generates a configuration bitstream for a target FPGA along with associated drivers and control software necessary to direct the application from a host PC. Architectural exploration is facilitated through support for fully custom fixed-point and floating-point representations in addition to standard number representations such as single-precision floating point. Moreover, TANOR enables joint exploration of algorithmic and architectural variations in realizing efficient hardware accelerators. TANOR's capabilities have been demonstrated for three different N-body interaction applications: the calculation of gravitational potential in astrophysics, the diffusion or convolution with Gaussian kernel common in image processing applications, and the force calculation with vector-valued kernel function in molecular dynamics simulation. Experimental results show that TANOR-generated hardware accelerators achieve lower resource utilization without compromising numerical accuracy, in comparison to other existing custom accelerators.
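As a rough illustration of the kind of computation the abstract refers to, the sketch below evaluates an N-body interaction by direct summation in software: each target point accumulates a weighted sum of a kernel function over all source points. The function names, the unscaled 1/r kernel (gravitational constant and sign omitted), and the `sigma` parameter are assumptions made for this example. They are not taken from TANOR, which starts from a MATLAB description and generates hardware.

```python
import math


def gravitational_kernel(target, source):
    """Unscaled 1/r kernel; coincident points contribute 0 (no self-interaction)."""
    r = math.dist(target, source)
    return 0.0 if r == 0.0 else 1.0 / r


def gaussian_kernel(target, source, sigma=1.0):
    """Gaussian kernel exp(-r^2 / (2 sigma^2)); sigma is a hypothetical parameter."""
    r2 = sum((t - s) ** 2 for t, s in zip(target, source))
    return math.exp(-r2 / (2.0 * sigma ** 2))


def direct_sum(targets, sources, weights, kernel):
    """For each target, sum weight * kernel(target, source) over all sources."""
    return [
        sum(w * kernel(t, s) for s, w in zip(sources, weights))
        for t in targets
    ]


# Three point masses interacting with each other.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
masses = [1.0, 2.0, 4.0]
potential = direct_sum(points, points, masses, gravitational_kernel)
smoothed = direct_sum(points, points, masses, gaussian_kernel)
```

Direct summation costs one kernel evaluation per target-source pair, which is why such workloads are natural candidates for high-throughput accelerators. Swapping the kernel argument changes the application while the summation structure stays the same.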
INDEX TERMS
Hardware, Algorithm design and analysis, MATLAB, Field programmable gate arrays, Accuracy, Numerical analysis, Numerical algorithms, Algorithms implemented in hardware, Reconfigurable hardware, Signal processing systems
CITATION
Jung Sub Kim, Lanping Deng, P. Mangalagiri, K. Irick, K. Sobti, M. Kandemir, V. Narayanan, C. Chakrabarti, N. Pitsianis, Xiaobai Sun, "An Automated Framework for Accelerating Numerical Algorithms on Reconfigurable Platforms Using Algorithmic/Architectural Optimization", IEEE Transactions on Computers, vol. 58, no. 12, pp. 1, December 2009, doi:10.1109/TC.2009.78