Issue No.05 - May (2008 vol.57)
pp: 686-701
ABSTRACT
This paper examines the hardware implementation tradeoffs when evaluating functions via piecewise polynomial approximations and interpolations for precisions up to 24 bits. In polynomial approximations, polynomials are evaluated using stored coefficients. Polynomial interpolations, however, require the coefficients to be computed on-the-fly using stored function values. Although it is known that interpolations require less memory than approximations at the expense of additional computation, the tradeoffs in memory, area, delay, and power consumption between the two approaches have not been examined in detail. This work quantitatively analyzes these tradeoffs for optimized approximations and interpolations across different functions and target precisions. Hardware architectures for degree-1 and degree-2 approximations and interpolations are described. The results show that the extent of memory savings realized by using interpolation is significantly lower than what is commonly believed. Furthermore, experimental results on a field-programmable gate array (FPGA) show that for high output precision, degree-1 interpolations offer considerable area and power savings over degree-1 approximations, but similar savings are not realized when degree-2 interpolations and approximations are compared. The availability of both interpolation-based and approximation-based designs offers a richer set of design tradeoffs than is available using either interpolation or approximation alone.
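The distinction the abstract draws between the two schemes can be illustrated with a minimal degree-1 sketch in software. This is not the paper's optimized hardware design. It assumes floating-point arithmetic rather than fixed-point, uniform segments on [0, 1], and a simple endpoint fit in place of optimized coefficients. The function names and the choice of log2(1 + x) as the test function are hypothetical, chosen only for illustration.

```python
import math

def build_approx_table(f, n_segments):
    """Approximation: store two coefficients (c0, c1) per segment.
    Assumption: coefficients come from a plain endpoint fit, not an
    optimized (e.g. minimax) fit."""
    table = []
    for i in range(n_segments):
        y0 = f(i / n_segments)
        y1 = f((i + 1) / n_segments)
        c1 = (y1 - y0) * n_segments  # slope over a segment of width 1/n
        table.append((y0, c1))
    return table

def eval_approx(table, x):
    """Look up the stored coefficients and evaluate c0 + c1 * dx."""
    n = len(table)
    i = min(int(x * n), n - 1)       # segment index
    c0, c1 = table[i]
    return c0 + c1 * (x - i / n)

def build_interp_table(f, n_segments):
    """Interpolation: store only function values at the segment boundaries."""
    return [f(i / n_segments) for i in range(n_segments + 1)]

def eval_interp(values, x):
    """Derive the slope on the fly from two adjacent stored values."""
    n = len(values) - 1
    i = min(int(x * n), n - 1)
    t = x * n - i                    # position within the segment, 0..1
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * t        # extra subtraction replaces a stored c1

def f(x):
    return math.log2(1.0 + x)

N = 64
approx_table = build_approx_table(f, N)
interp_table = build_interp_table(f, N)

approx_words = 2 * len(approx_table)  # 2N stored coefficients
interp_words = len(interp_table)      # N + 1 stored function values
```

In this sketch the interpolation table holds N + 1 entries against 2N for the approximation table, at the cost of one extra subtraction per evaluation. Word counts alone say nothing about the bit widths of the entries or the number of segments each scheme needs to reach a given precision, which is where the paper's quantitative comparison comes in.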
INDEX TERMS
Algorithms implemented in hardware, Approximation, Interpolation, VLSI Systems
CITATION
Dong-U Lee, Ray Cheung, Wayne Luk, John Villasenor, "Hardware Implementation Trade-Offs of Polynomial Approximations and Interpolations", IEEE Transactions on Computers, vol.57, no. 5, pp. 686-701, May 2008, doi:10.1109/TC.2007.70847