2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors (2012)
Delft, Netherlands
July 9, 2012 to July 11, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ASAP.2012.12
Graphics and signal processing applications often require that sines and cosines be evaluated at the same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited in order to perform this task accurately for IEEE single precision (including subnormals). We describe software implementations for sinf, cosf, and sincosf over [-pi/4, pi/4] that have a proven 1-ulp accuracy and whose latency on STMicroelectronics' ST231 VLIW integer processor is 19, 18, and 19 cycles, respectively. This performance is obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes.
Approximation methods, Accuracy, Polynomials, Program processors, Registers, Computer architecture, VLIW, trigonometric function, VLIW integer processor, instruction level parallelism (ILP), C software implementation, floating-point arithmetic, IEEE 754, unit in the last place
Jingyan Jourdan-Lu, Claude-Pierre Jeannerod, "Simultaneous Floating-Point Sine and Cosine for VLIW Integer Processors", 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors, pp. 69-76, 2012, doi:10.1109/ASAP.2012.12