High-Speed Double-Precision Computation of Reciprocal, Division, Square Root and Inverse Square Root
December 2002 (vol. 51 no. 12)
pp. 1377-1388

Abstract: This paper presents a new method for the high-speed computation of double-precision floating-point reciprocal, division, square root, and inverse square root. The method uses a second-degree minimax polynomial approximation to obtain an accurate initial estimate of the reciprocal or the inverse square root, and then performs a modified Goldschmidt iteration. Because the initial approximation is highly accurate, a single Goldschmidt iteration suffices to obtain a double-precision result, which significantly reduces the latency of the algorithm. Two unfolded architectures are proposed: the first computes only reciprocal and division, and the second also computes square root and inverse square root. The execution times and area costs of both architectures are estimated and compared with those of other multiplicative methods. The comparison shows that the proposed architectures achieve lower latency than these methods with similar hardware requirements.

Index Terms:
Computer arithmetic, Goldschmidt iteration, table-based methods, double-precision operations, division, square root, inverse square root.
José-Alejandro Piñeiro, Javier Díaz Bruguera, "High-Speed Double-Precision Computation of Reciprocal, Division, Square Root and Inverse Square Root," IEEE Transactions on Computers, vol. 51, no. 12, pp. 1377-1388, Dec. 2002, doi:10.1109/TC.2002.1146704