Issue No.11 - November (2011 vol.60)
pp: 1535-1546
Daniel Piso, University of Santiago de Compostela, Santiago de Compostela
Javier D. Bruguera, University of Santiago de Compostela, Santiago de Compostela
ABSTRACT
A new variable latency Goldschmidt algorithm is presented. The algorithm is based on a new rounding method for division, square root, and their reciprocals that avoids the conventional remainder calculation in most cases and improves on previous proposals. The rounding decision is made by checking the least significant bits of the output of the last Goldschmidt iteration, without any other transformation. This reduces the number of cases that need the remainder to be calculated. Additionally, we avoid calculating the remainder for most of those cases by using a remainder estimate that can be easily obtained from the Goldschmidt iteration. Calculating the estimate is much simpler and less time consuming than calculating the remainder, which further reduces the number of cases that need a large latency. The combination of both techniques allows us to define a variable latency algorithm that needs to compute the remainder in just 9 percent of the total number of cases for reciprocal and division, and in 12 percent for square root and square root reciprocal.
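For readers unfamiliar with the underlying iteration, the following is a minimal sketch of the classic Goldschmidt division recurrence only. It does not implement the rounding method or the remainder estimate proposed in the paper. The function name `goldschmidt_divide`, the constant seed 0.75, the iteration count, and the use of Python floats in place of fixed-point hardware arithmetic are all assumptions made for illustration.

```python
def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d by Goldschmidt iteration (illustrative sketch).

    Assumes d is a normalized significand in [1, 2), as for IEEE 754
    operands. Numerator and denominator are multiplied by the same
    factor each step, driving the denominator toward 1 and the
    numerator toward the quotient.
    """
    if not 1.0 <= d < 2.0:
        raise ValueError("d is assumed to be normalized to [1, 2)")

    # Hypothetical constant seed for 1/d. Real designs typically use a
    # lookup table; 0.75 merely guarantees |1 - r| <= 0.5 at the start.
    seed = 0.75
    q = n * seed  # running numerator, converges to n / d
    r = d * seed  # running denominator, converges to 1

    for _ in range(iterations):
        f = 2.0 - r  # correction factor for this step
        q *= f
        r *= f       # the error |1 - r| is squared at every step
    return q


if __name__ == "__main__":
    print(goldschmidt_divide(1.0, 1.5))
```

Because the error in the denominator is squared at each step, the number of correct bits roughly doubles per iteration. The result is still only an approximation, so a final rounding step is needed; conventionally that step computes a remainder, which is the cost the paper aims to avoid in most cases.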
INDEX TERMS
Goldschmidt algorithm, division, square root, reciprocal, square root reciprocal, rounding, variable latency.
CITATION
Daniel Piso, Javier D. Bruguera, "Variable Latency Goldschmidt Algorithm Based on a New Rounding Method and a Remainder Estimate", IEEE Transactions on Computers, vol.60, no. 11, pp. 1535-1546, November 2011, doi:10.1109/TC.2010.269