20th IEEE Symposium on Computer Arithmetic (ARITH 2011)
Tübingen, Germany
July 25, 2011 to July 27, 2011
ISSN: 1063-6889
ISBN: 978-1-4244-9457-6
pp: 149-158
ABSTRACT
We describe a high-performance digit-recurrence algorithm for computing exactly rounded reciprocals, square roots, and reciprocal square roots in hardware at a rate of three result bits (one radix-8 digit) per recurrence iteration. To achieve a single-cycle recurrence at a short cycle time, we adapted the digit-by-rounding algorithm, which is normally applied at much higher radices, for efficient operation at radix 8. This approach removes from the recurrence step the lookup table required by SRT, the algorithm usually used for hardware digit recurrences. The access latency of this table, whose size grows superlinearly in the radix, limits high-frequency SRT implementations to radix 4 or lower. We also developed a series of novel optimizations focused on further shortening the critical path through the recurrence. We propose, for example, narrowing data path widths to the point where erroneous results sometimes occur and then correcting these errors off the critical path. We present a specific implementation that computes any of these functions to 31 bits of precision in 13 cycles. Our implementation achieves a cycle time only 11% longer than that of the best reported SRT design for the same functions, yet delivers results in five fewer cycles. Finally, we show that even at lower radices, a digit-by-rounding design is likely to have a shorter critical path than one using SRT at the same radix.
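As a rough illustration of the idea behind digit selection by rounding, the following Python sketch computes a reciprocal one radix-8 digit per iteration using exact rational arithmetic. It is not the paper's hardware design: it assumes that "digit-by-rounding" means prescaling the operand so that it is close to 1 and then taking each digit as the rounded, scaled residual. The function names, the 8-bit seed, and the iteration count are all hypothetical choices made for this example, and the sketch does not model exact rounding, narrowed data paths, or off-critical-path correction.

```python
from fractions import Fraction
import math

RADIX = 8  # three result bits per iteration


def round_nearest(v):
    """Round a Fraction to the nearest integer (ties go upward)."""
    return math.floor(v + Fraction(1, 2))


def reciprocal_by_rounding(x, iterations=11, seed_bits=8):
    """Approximate 1/x for 1 <= x < 2, one radix-8 digit per iteration.

    Illustrative assumption: a low-precision seed m ~ 1/x prescales the
    operand so that z = x*m is close to 1. Each digit is then simply the
    rounded value of RADIX * residual, with no selection table.
    """
    x = Fraction(x)
    if not (1 <= x < 2):
        raise ValueError("x must lie in [1, 2)")

    # Seed: 1/x truncated to seed_bits fractional bits.
    scale = 1 << seed_bits
    m = Fraction(math.floor(scale / x), scale)
    z = x * m              # prescaled operand, slightly below or equal to 1

    y = Fraction(0)        # running result
    w = m                  # scaled residual: w_j = RADIX**j * (m - z*y_j)
    digits = []
    for j in range(1, iterations + 1):
        d = round_nearest(RADIX * w)   # digit selection by rounding
        y += Fraction(d, RADIX ** j)   # append the (signed) digit
        w = RADIX * w - d * z          # residual update
        digits.append(d)
    return y, digits


if __name__ == "__main__":
    y, digits = reciprocal_by_rounding(Fraction(3, 2))
    print(float(y), digits)
```

Because z stays within about 2^-7 of 1 in this sketch, the residual remains close to the rounding error of the previous step, so the digits after the first stay small and signed, and the error shrinks by a factor of 8 per iteration.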
INDEX TERMS
digital arithmetic, iterative methods, optimisation, table lookup
CITATION

J. A. Butts, P. T. Tang, R. O. Dror and D. E. Shaw, "Radix-8 Digit-by-Rounding: Achieving High-Performance Reciprocals, Square Roots, and Reciprocal Square Roots," 20th IEEE Symposium on Computer Arithmetic (ARITH 2011), Tübingen, Germany, 2011, pp. 149-158.
doi:10.1109/ARITH.2011.28