
José-Alejandro Piñeiro, Javier Díaz Bruguera, "High-Speed Double-Precision Computation of Reciprocal, Division, Square Root and Inverse Square Root," IEEE Transactions on Computers, vol. 51, no. 12, pp. 1377-1388, December 2002. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 0018-9340. DOI: http://doi.ieeecomputersociety.org/10.1109/TC.2002.1146704
Keywords: Computer arithmetic, Goldschmidt iteration, table-based methods, double-precision operations, division, square root, inverse square root.
Abstract—A new method for the high-speed computation of double-precision floating-point reciprocal, division, square root, and inverse square root operations is presented in this paper. This method employs a second-degree minimax polynomial approximation to obtain an accurate initial estimate of the reciprocal and the inverse square root values, and then performs a modified Goldschmidt iteration. The high accuracy of the initial approximation allows double-precision results to be obtained with a single Goldschmidt iteration, significantly reducing the latency of the algorithm. Two unfolded architectures are proposed: the first computes only reciprocal and division operations, and the second also computes square root and inverse square root. The execution times and area costs of both architectures are estimated, and a comparison with other multiplicative-based methods is presented. The comparison shows that the proposed method achieves lower latency than these methods with similar hardware requirements.
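The two-stage idea in the abstract (a table-driven quadratic initial estimate followed by one Goldschmidt iteration) can be illustrated with a rough software sketch. This is not the paper's hardware algorithm: it uses the textbook Goldschmidt step rather than the modified iteration, it interpolates at Chebyshev nodes as a stand-in for a true minimax fit, and it runs in ordinary Python floats. The table index width `M = 9` and all function names are assumptions made for illustration, and operands are assumed to be already normalized to [1, 2).

```python
import math

M = 9  # assumed number of leading fraction bits that select a table segment


def build_table(f, m=M):
    """Per-segment quadratic coefficients (c0, c1, c2) for f on [1, 2).

    Each quadratic interpolates f at three Chebyshev nodes of its segment,
    a near-minimax stand-in for the minimax polynomial named in the abstract.
    The local variable is t = x - (segment midpoint).
    """
    h = 2.0 ** -m
    nodes = [0.5 * h * math.cos((2 * k + 1) * math.pi / 6) for k in range(3)]
    t0, t1, t2 = nodes
    table = []
    for i in range(2 ** m):
        mid = 1.0 + (i + 0.5) * h
        y0, y1, y2 = (f(mid + t) for t in nodes)
        # Newton divided differences, then expansion to powers of t.
        d01 = (y1 - y0) / (t1 - t0)
        d12 = (y2 - y1) / (t2 - t1)
        d012 = (d12 - d01) / (t2 - t0)
        c2 = d012
        c1 = d01 - d012 * (t0 + t1)
        c0 = y0 - d01 * t0 + d012 * t0 * t1
        table.append((c0, c1, c2))
    return table


def initial_estimate(table, x, m=M):
    """Evaluate the segment quadratic for x in [1, 2) with Horner's rule."""
    h = 2.0 ** -m
    i = int((x - 1.0) / h)          # segment index from the leading bits
    t = x - (1.0 + (i + 0.5) * h)   # offset from the segment midpoint
    c0, c1, c2 = table[i]
    return c0 + t * (c1 + t * c2)


def goldschmidt_divide(n, d, recip_table):
    """n / d for d in [1, 2): initial reciprocal plus one Goldschmidt step."""
    r = initial_estimate(recip_table, d)
    q = n * r           # q0, already close to n / d
    e = 1.0 - d * r     # relative error of the reciprocal estimate
    return q + q * e    # q1 = q0 * (1 + e); remaining relative error ~ e^2


def goldschmidt_sqrt(x, rsqrt_table):
    """(sqrt(x), 1/sqrt(x)) for x in [1, 2) using one Goldschmidt step."""
    y = initial_estimate(rsqrt_table, x)
    g = x * y           # g0 ~ sqrt(x)
    hh = 0.5 * y        # h0 ~ 1 / (2 * sqrt(x))
    r = 0.5 - g * hh    # residual; shrinks quadratically per step
    return g + g * r, 2.0 * (hh + hh * r)


if __name__ == "__main__":
    recip = build_table(lambda v: 1.0 / v)
    rsqrt = build_table(lambda v: 1.0 / math.sqrt(v))
    d = 1.2345678
    print("initial reciprocal error:", abs(d * initial_estimate(recip, d) - 1.0))
    print("division error:", abs(goldschmidt_divide(1.0, d, recip) * d - 1.0))
    print("sqrt error:", abs(goldschmidt_sqrt(d, rsqrt)[0] - math.sqrt(d)))
```

Because one Goldschmidt step roughly squares the relative error of the initial estimate, an estimate accurate to about 2^-30 is enough to push the mathematical error below double-precision resolution, which is the latency argument the abstract makes. In this sketch the final accuracy is limited by floating-point rounding in the host arithmetic, not by the iteration itself.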