Issue No.02 - February (2011 vol.60)

pp: 214-227

Hervé Knochel, STMicroelectronics' Compilation Expertise Center

Christophe Monat, STMicroelectronics' Compilation Expertise Center

Claude-Pierre Jeannerod, INRIA Arénaire, Lyon

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2010.152

ABSTRACT

In this paper, we show how to reduce the computation of correctly rounded square roots of binary floating-point data to the fixed-point evaluation of some particular integer polynomials in two variables. By designing parallel and accurate evaluation schemes for such bivariate polynomials, we show further that this approach allows for high instruction-level parallelism (ILP) exposure, and thus potentially low-latency implementations. Then, as an illustration, we detail a C implementation of our method in the case of IEEE 754-2008 binary32 floating-point data (called single precision in the 1985 version of the IEEE 754 standard). This software implementation, which assumes 32-bit unsigned integer arithmetic only, is almost complete in the sense that it supports special operands, subnormal numbers, and all rounding-direction attributes, but not exception handling (that is, status flags are not set). Finally, we have carried out experiments with this implementation on the ST231, an integer processor from STMicroelectronics' ST200 family, using the ST200 family VLIW compiler. The results obtained demonstrate the practical interest of our approach in that context: for all rounding-direction attributes, the generated assembly code is optimally scheduled and indeed has low latency (23 cycles).
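To make the link between evaluation scheme and ILP concrete, here is a minimal sketch in C. It is not the paper's bivariate scheme: it uses a hypothetical degree-3 univariate polynomial with made-up coefficients and plain wrapping 32-bit unsigned arithmetic, and the function names are illustrative only. It contrasts Horner's rule, where every operation waits on the previous one, with an Estrin-style split whose sub-expressions are independent and could be issued in parallel on a VLIW core.

```c
#include <assert.h>
#include <stdint.h>

/* Horner's rule for a[0] + a[1]*x + a[2]*x^2 + a[3]*x^3.
 * Each step depends on the previous result, so the critical path
 * is 3 multiplications and 3 additions executed one after another. */
uint32_t eval_horner(const uint32_t a[4], uint32_t x)
{
    uint32_t r = a[3];
    r = r * x + a[2];
    r = r * x + a[1];
    r = r * x + a[0];
    return r;
}

/* Estrin-style evaluation of the same polynomial.
 * The three multiplications a[1]*x, a[3]*x and x*x do not depend on
 * each other, so the critical path shrinks to
 * multiply, add, multiply, add. */
uint32_t eval_estrin(const uint32_t a[4], uint32_t x)
{
    uint32_t lo = a[0] + a[1] * x;  /* independent of hi and x2 */
    uint32_t hi = a[2] + a[3] * x;  /* independent of lo and x2 */
    uint32_t x2 = x * x;            /* independent of lo and hi */
    return lo + x2 * hi;
}

/* Sanity check with hypothetical coefficients: in exact (wrapping)
 * integer arithmetic both schemes return the same value. */
void check_schemes_agree(void)
{
    const uint32_t a[4] = { 1u, 2u, 3u, 4u };
    assert(eval_horner(a, 5u) == eval_estrin(a, 5u));
}
```

With wrapping integer arithmetic the two schemes agree exactly. In a truncating fixed-point setting they would generally accumulate different rounding errors, which is presumably why the abstract pairs the parallel schemes with a rounding error analysis.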

INDEX TERMS

Binary floating-point arithmetic, square root, correct rounding, IEEE 754, polynomial evaluation, instruction-level parallelism, rounding error analysis, C software implementation, VLIW integer processor.

CITATION

Hervé Knochel, Christophe Monat, Claude-Pierre Jeannerod, "Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation," *IEEE Transactions on Computers*, vol. 60, no. 2, pp. 214-227, February 2011, doi:10.1109/TC.2010.152