Issue No.05 - May (2013 vol.62)
pp: 858-872
Joshua Yung Lih Low, Nanyang Technological University, Singapore
Ching Chuen Jong, Nanyang Technological University, Singapore
ABSTRACT
Tables-and-additions methods for the accurate computation of elementary functions are fast but require large memory. This paper proposes a memory-efficient method named the integrated Add-Table Lookup-Add (iATA). In iATA, the mathematical formulation for computing the elementary functions is derived without the central difference formulation in order to save memory. Three additional techniques are integrated in iATA to further reduce the memory size: the carry select technique, exploitation of the symmetry property, and unequal partitioning of the input with the aid of error analysis. The experimental results show that the proposed method achieves higher memory efficiency than the best existing tables-and-additions methods. For the reciprocal and the natural logarithm functions, iATA saves 23.63 and 61.39 percent of memory, respectively, when compared with the best existing results, obtained by the unified Multipartite Table Method [39] and the Symmetric Table Addition Method [37].
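For readers unfamiliar with the tables-and-additions family, the following is a minimal sketch of a generic bipartite table scheme in the spirit of [35] and [36], applied to the reciprocal on [1, 2). It is not the iATA formulation, which the abstract does not spell out. The function names, the 4/4/4 bit split, and the use of floating-point table entries instead of fixed-width ROM words are all assumptions made for illustration.

```python
def build_bipartite_tables(f, df, n0, n1, n2):
    """Build the two tables of a generic bipartite scheme (illustrative only).

    The input is x = 1 + k * 2**-(n0+n1+n2), with k split into bit fields
    x0 (n0 bits), x1 (n1 bits) and x2 (n2 bits), most significant first.
    Entries are kept as floats; real hardware would round them to ROM words.
    """
    ulp = 2.0 ** -(n0 + n1 + n2)
    mid_x2 = (2 ** n2 - 1) / 2 * ulp            # midpoint of the x2 field
    mid_x1x2 = (2 ** (n1 + n2) - 1) / 2 * ulp   # midpoint of an x0 block

    # Table of initial values, indexed by (x0, x1): f at the x2 midpoint.
    tiv = [f(1.0 + (hi << n2) * ulp + mid_x2) for hi in range(2 ** (n0 + n1))]

    # Table of offsets, indexed by (x0, x2): one slope per x0 block,
    # multiplied by the signed distance of x2 from its midpoint.
    to = [
        [df(1.0 + (i0 << (n1 + n2)) * ulp + mid_x1x2) * (x2 * ulp - mid_x2)
         for x2 in range(2 ** n2)]
        for i0 in range(2 ** n0)
    ]
    return tiv, to


def bipartite_eval(tiv, to, n0, n1, n2, k):
    """Approximate f(1 + k * 2**-(n0+n1+n2)) with two lookups and one add."""
    hi = k >> n2                  # (x0, x1) fields
    i0 = k >> (n1 + n2)           # x0 field
    x2 = k & ((1 << n2) - 1)      # x2 field
    return tiv[hi] + to[i0][x2]


if __name__ == "__main__":
    n0 = n1 = n2 = 4
    tiv, to = build_bipartite_tables(
        lambda x: 1.0 / x, lambda x: -1.0 / (x * x), n0, n1, n2)
    n = n0 + n1 + n2
    worst = max(abs(bipartite_eval(tiv, to, n0, n1, n2, k)
                    - 1.0 / (1.0 + k * 2.0 ** -n))
                for k in range(2 ** n))
    entries = len(tiv) + sum(len(row) for row in to)
    print(f"table entries: {entries} (direct table: {2 ** n})")
    print(f"worst-case error below 2**-{n}: {worst < 2.0 ** -n}")
```

With this split, the two tables hold 512 entries in total where a direct lookup table would need 4096, at the cost of one addition per evaluation. Each row of the offset table is antisymmetric about the midpoint of the x2 field, which is the kind of symmetry that symmetric table methods exploit to store only half of it.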
INDEX TERMS
Taylor series, Chebyshev approximation, Memory management, Error analysis, Interpolation, Equations, VLSI, Computer arithmetic, Elementary function approximation
CITATION
Joshua Yung Lih Low, Ching Chuen Jong, "A Memory-Efficient Tables-and-Additions Method for Accurate Computation of Elementary Functions", IEEE Transactions on Computers, vol. 62, no. 5, pp. 858-872, May 2013, doi:10.1109/TC.2012.43
REFERENCES
[1] D.-U. Lee, R.C.C. Cheung, W. Luk, and J.D. Villasenor, "Hardware Implementation Trade-offs of Polynomial Approximations and Interpolations," IEEE Trans. Computers, vol. 57, no. 5, pp. 686-701, May 2008.
[2] A.G.M. Strollo, D. De Caro, and N. Petra, "A 630 MHz, 76 mW Direct Digital Frequency Synthesizer Using Enhanced ROM Compression Technique," IEEE J. Solid State Circuits, vol. 42, no. 2, pp. 350-360, Feb. 2007.
[3] A. Ashrafi, R. Adhami, and A. Milenkovic, "A Direct Digital Frequency Synthesizer Based on the Quasi-Linear Interpolation Method," IEEE Trans. Circuits and Systems-I: Regular Papers, vol. 57, no. 4, pp. 863-872, Apr. 2010.
[4] H.-C. Shin, J.-A. Lee, and L.-S. Kim, "A Hardware Cost Minimized Fast Phong Shader," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 9, no. 2, pp. 297-304, Apr. 2001.
[5] M.-H. Lai, M.-F. Yu, and S.-G. Chen, "An Efficient Modified Phong Shading Algorithm & Its Low-complexity Realization," Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS '02), vol. 4, pp. IV-201-IV-204, 2002.
[6] K. Karagianni, V. Paliouras, G. Diamantakos, and T. Stouraitis, "Operation-Saving VLSI Architectures for 3D Geometrical Transformations," IEEE Trans. Computers, vol. 50, no. 6, pp. 609-622, June 2001.
[7] K. Karagianni and T. Stouraitis, "A Vector Processor for 3-D Geometrical Transformations," Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS '01), vol. 4, pp. 482-485, 2001.
[8] P. Liu and S.N. Bhatt, "Experiences with Parallel N-body Simulation," IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 12, pp. 1306-1323, Dec. 2000.
[9] B.-G. Nam and H.-J. Yoo, "A 28.5mW 2.8GFLOPS Floating-Point Multifunction Unit for Handheld 3D Graphics Processors," Proc. IEEE Asian Solid State Circuits Conf. (ASSCC '07), pp. 376-379, 2007.
[10] J.-H. Woo, J.-H. Sohn, H. Kim, J. Jeong, E. Jeong, S.-J. Lee, and H.-J. Yoo, "A 195mW, 9.1Mvertices/s Fully Programmable 3D Graphics Processor for Low Power Mobile Devices," Proc. IEEE Asian Solid State Circuits Conf. (ASSCC '07), pp. 372-375, 2007.
[11] D. De Caro, N. Petra, and A.G.M. Strollo, "High-Performance Special Function Unit for Programmable 3-D Graphics Processors," IEEE Trans. Circuits and Systems-I: Regular Papers, vol. 56, no. 9, pp. 1968-1978, Sept. 2009.
[12] S.-F. Hsiao, P.-C. Wei, and C.-P. Lin, "An Automatic Hardware Generator for Special Arithmetic Functions Using Various ROM-Based Approximation Approaches," Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS '08), pp. 468-471, 2008.
[13] W.F. Wong and E. Goto, "Fast Evaluation of the Elementary Functions in Single Precision," IEEE Trans. Computers, vol. 44, no. 3, pp. 453-457, Mar. 1995.
[14] J.E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. Electronic Computers, vol. EC-8, pp. 330-334, 1959.
[15] T. Lang and E. Antelo, "High-Throughput CORDIC-Based Geometry Operations for 3D Computer Graphics," IEEE Trans. Computers, vol. 54, no. 3, pp. 347-361, Mar. 2005.
[16] M.G.B. Sumanasena, "A Scale Factor Correction Scheme for the CORDIC Algorithm," IEEE Trans. Computers, vol. 57, no. 8, pp. 1148-1152, Aug. 2008.
[17] D. De Caro, N. Petra, and A. Strollo, "Digital Synthesizer/Mixer with Hybrid CORDIC - Multiplier Architecture: Error Analysis and Optimization," IEEE Trans. Circuits and Systems-I: Regular Papers, vol. 56, no. 2, pp. 364-373, Feb. 2009.
[18] T.K. Rodrigues and E.E. Swartzlander, "Adaptive CORDIC: Using Parallel Angle Recoding to Accelerate Rotations," IEEE Trans. Computers, vol. 59, no. 4, pp. 522-531, Apr. 2010.
[19] J.-A. Pineiro, M.D. Ercegovac, and J.D. Bruguera, "Algorithm and Architecture for Logarithm, Exponential, and Powering Computation," IEEE Trans. Computers, vol. 53, no. 9, pp. 1085-1096, Sept. 2004.
[20] G.M. Phillips and P.J. Taylor, Theory and Applications of Numerical Analysis. Academic Press, 1996.
[21] N. Takagi, "Powering by a Table Look-up and a Multiplication with Operand Modification," IEEE Trans. Computers, vol. 47, no. 11, pp. 1216-1222, Nov. 1998.
[22] J. Detrey and F. de Dinechin, "Table-Based Polynomials for Fast Hardware Function Evaluation," Proc. IEEE Int'l Conf. Application-Specific Systems, Architecture Processors (ASAP '05), pp. 328-333, 2005.
[23] E.G. Walters III and M.J. Schulte, "Efficient Function Approximation Using Truncated Multipliers and Squarers," Proc. 17th IEEE Symp. Computer Arithmetic (ARITH '05), pp. 232-239, 2005.
[24] D.-U. Lee, A.A. Gaffar, O. Mencer, and W. Luk, "Optimizing Hardware Function Evaluation," IEEE Trans. Computers, vol. 54, no. 12, pp. 1520-1531, Dec. 2005.
[25] J.-A. Pineiro, S.F. Oberman, J.-M. Muller, and J.D. Bruguera, "High-Speed Function Approximation Using a Minimax Quadratic Interpolator," IEEE Trans. Computers, vol. 54, no. 3, pp. 304-318, Mar. 2005.
[26] D.-U. Lee and J.D. Villasenor, "A Bit-Width Optimization Methodology for Polynomial-Based Function Evaluation," IEEE Trans. Computers, vol. 56, no. 4, pp. 567-571, Apr. 2007.
[27] T. Sasao, S. Nagayama, and J.T. Butler, "Numerical Function Generators Using LUT Cascades," IEEE Trans. Computers, vol. 56, no. 6, pp. 826-838, June 2007.
[28] A. Alimohammad, S.F. Fard, and B.F. Cockburn, "A Unified Architecture for the Accurate and High-throughput Implementation of Six Key Elementary Functions," IEEE Trans. Computers, vol. 59, no. 4, pp. 449-456, Apr. 2010.
[29] A.G.M. Strollo, D. De Caro, and N. Petra, "Elementary Functions Hardware Implementation Using Constrained Piecewise-polynomial Approximations," IEEE Trans. Computers, vol. 60, no. 3, pp. 418-432, Mar. 2011.
[30] D. Lewis, "Interleaved Memory Function Interpolators with Application to an Accurate LNS Arithmetic Unit," IEEE Trans. Computers, vol. 43, no. 8, pp. 974-982, Aug. 1994.
[31] V. Paliouras, K. Karagianni, and T. Stouraitis, "A Floating-Point Processor for Fast and Accurate Sine/Cosine Evaluation," IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 441-451, May 2000.
[32] J. Cao, B.W.Y. Wei, and J. Cheng, "High-performance Architectures for Elementary Function Generation," Proc. 15th IEEE Symp. Computer Arithmetic (ARITH '01), pp. 136-144, 2001.
[33] P. Lamarche and Y. Savaria, "VHDL Source Code Generator and Analysis Tool to Design Linear Interpolators," Proc. First IEEE Northeast Workshop Circuits and Systems, pp. 67-72, 2003.
[34] J.M. McCollum, J.M. Lancaster, D.W. Bouldin, and G.D. Peterson, "Hardware Acceleration of Pseudo-Random Number Generation for Simulation Applications," Proc. 35th Southeastern Symp. System Theory (SSST '03), pp. 299-303, 2003.
[35] D. Das Sarma and D.W. Matula, "Faithful Bipartite ROM Reciprocal Tables," Proc. 12th Symp. Computer Arithmetic (ARITH '95), pp. 17-28, 1995.
[36] M.J. Schulte and J.E. Stine, "Approximating Elementary Functions with Symmetric Bipartite Tables," IEEE Trans. Computers, vol. 48, no. 8, pp. 842-847, Aug. 1999.
[37] J.E. Stine and M.J. Schulte, "The Symmetric Table Addition Method for Accurate Function Approximation," J. VLSI Signal Processing Systems for Signal, Image, and Video Technology, vol. 21, pp. 167-177, 1999.
[38] J.M. Muller, "A Few Results on Table-Based Methods," Reliable Computing, vol. 5, pp. 279-288, 1999.
[39] F. de Dinechin and A. Tisserand, "Multipartite Table Methods," IEEE Trans. Computers, vol. 54, no. 3, pp. 319-330, Mar. 2005.
[40] A. Strollo, D. De Caro, and N. Petra, "A 430 MHz, 280 mW Processor for the Conversion of Cartesian to Polar Coordinates in 0.25 um CMOS," IEEE J. Solid State Circuits, vol. 43, no. 11, pp. 2503-2513, Nov. 2008.
[41] J.M. Muller, Elementary Functions: Algorithms and Implementation. Birkhauser, 2006.
[42] M.J. Schulte and J.E. Stine, "Accurate Function Approximations by Symmetric Table Lookup and Addition," Proc. IEEE Int'l Conf. Application-Specific Systems, Architectures and Processors (ASAP '97), pp. 144-153, 1997.
[43] "FPGA Design Tips," http://www.xilinx.com/itp/xilinx7/books/data/docs/devdev0017_5.html, 2005.