Issue No. 02 - February (2011 vol. 60)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2010.152
Claude-Pierre Jeannerod , INRIA Arénaire, Lyon
Hervé Knochel , STMicroelectronics' Compilation Expertise Center
Christophe Monat , STMicroelectronics' Compilation Expertise Center
Guillaume Revy , University of California at Berkeley, Berkeley
In this paper, we show how to reduce the computation of correctly rounded square roots of binary floating-point data to the fixed-point evaluation of some particular integer polynomials in two variables. By designing parallel and accurate evaluation schemes for such bivariate polynomials, we show further that this approach allows for high instruction-level parallelism (ILP) exposure, and thus, potentially low-latency implementations. Then, as an illustration, we detail a C implementation of our method in the case of IEEE 754-2008 binary32 floating-point data (formerly called single precision in the 1985 version of the IEEE 754 standard). This software implementation, which assumes 32-bit unsigned integer arithmetic only, is almost complete in the sense that it supports special operands, subnormal numbers, and all rounding-direction attributes, but not exception handling (that is, status flags are not set). Finally, we have carried out experiments with this implementation on the ST231, an integer processor from the STMicroelectronics' ST200 family, using the ST200 family VLIW compiler. The results obtained demonstrate the practical interest of our approach in that context: for all rounding-direction attributes, the generated assembly code is optimally scheduled and has indeed low latency (23 cycles).
Binary floating-point arithmetic, square root, correct rounding, IEEE 754, polynomial evaluation, instruction-level parallelism, rounding error analysis, C software implementation, VLIW integer processor.
C. Monat, G. Revy, H. Knochel and C. Jeannerod, "Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation," in IEEE Transactions on Computers, vol. 60, no. , pp. 214-227, 2010.