Hardware Designs for Exactly Rounded Elementary Functions
August 1994 (vol. 43 no. 8)
pp. 964-973

This paper presents hardware designs that produce exactly rounded results for the functions of reciprocal, square-root, 2^x, and log_2(x). These designs use polynomial approximation in which the terms in the approximation are generated in parallel, and then summed by using a multi-operand adder. To reduce the number of terms in the approximation, the input interval is partitioned into subintervals of equal size, and different coefficients are used for each subinterval. The coefficients used in the approximation are initially determined based on the Chebyshev series approximation. They are then adjusted to obtain exactly rounded results for all inputs. Hardware designs are presented, and delay and area comparisons are made based on the degree of the approximating polynomial and the accuracy of the final result. For single-precision floating point numbers, a design that produces exactly rounded results for all four functions has an estimated delay of 80 ns and a total chip area of 98 mm^2 in a 1.0-micron CMOS technology. Allowing the results to have a maximum error of one unit in the last place reduces the computational delay by 5% to 30% and the area requirements by 33% to 77%.
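
As a rough software illustration of the subinterval idea in the abstract (not the paper's hardware design), the Python sketch below splits an input interval into equal subintervals, fits a low-degree Chebyshev series on each one, and measures the worst-case error on a sample grid. The function names, the choice of [1, 2) as the reciprocal's input interval, the degree, and the subinterval counts are all assumptions made for the example. The coefficients come from interpolation at Chebyshev nodes, and the later adjustment step that yields exactly rounded results is not attempted here.

```python
import math


def chebyshev_coeffs(f, a, b, degree):
    """Chebyshev series coefficients of f on [a, b], obtained by
    interpolating f at the degree+1 Chebyshev nodes of the interval."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (b - a) * t + 0.5 * (b + a)) for t in nodes]
    coeffs = []
    for j in range(n):
        s = sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                for k in range(n))
        coeffs.append((1.0 if j == 0 else 2.0) * s / n)
    return coeffs


def eval_chebyshev(coeffs, a, b, x):
    """Evaluate sum_j coeffs[j] * T_j(t) with the Clenshaw recurrence,
    where t is x mapped from [a, b] to [-1, 1]."""
    t = (2.0 * x - a - b) / (b - a)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]


def build_table(f, lo, hi, num_subintervals, degree):
    """One coefficient set per equal-width subinterval of [lo, hi)."""
    width = (hi - lo) / num_subintervals
    return [chebyshev_coeffs(f, lo + i * width, lo + (i + 1) * width, degree)
            for i in range(num_subintervals)]


def approximate(table, lo, hi, x):
    """Select the subinterval containing x, then evaluate its polynomial."""
    n = len(table)
    width = (hi - lo) / n
    i = min(int((x - lo) / width), n - 1)
    return eval_chebyshev(table[i], lo + i * width, lo + (i + 1) * width, x)


def max_error(f, table, lo, hi, samples=4096):
    """Largest absolute error over an evenly spaced sample grid."""
    return max(abs(f(x) - approximate(table, lo, hi, x))
               for x in (lo + (hi - lo) * k / samples for k in range(samples)))


if __name__ == "__main__":
    def reciprocal(x):
        return 1.0 / x

    # Illustrative settings only: degree 2, input interval [1, 2).
    for count in (1, 4, 16, 64):
        table = build_table(reciprocal, 1.0, 2.0, count, 2)
        print(count, max_error(reciprocal, table, 1.0, 2.0))
```

With the degree held fixed, the measured error shrinks as the number of subintervals grows, which is the trade-off the abstract describes: more stored coefficient sets in exchange for fewer polynomial terms. Swapping in `math.sqrt`, `lambda x: 2.0 ** x`, or `math.log2` exercises the other three functions in the same way.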

Index Terms:
CMOS integrated circuits; digital arithmetic; Chebyshev approximation; approximation theory; polynomials; summing circuits; multiplying circuits; hardware designs; exactly rounded elementary functions; reciprocal; square-root; polynomial approximation; multi-operand adder; Chebyshev series approximation; single-precision floating point numbers; chip area; 1.0-micron CMOS technology; computational delay; computer arithmetic; exact rounding; parallel multiplier; argument reduction; special-purpose hardware; 1 micron.
M.J. Schulte, E.E. Swartzlander, Jr., "Hardware Designs for Exactly Rounded Elementary Functions," IEEE Transactions on Computers, vol. 43, no. 8, pp. 964-973, Aug. 1994, doi:10.1109/12.295858