A Fast, Efficient Parallel-Acting Method of Generating Functions Defined by Power Series, Including Logarithm, Exponential, and Sine, Cosine
January 1996 (vol. 7 no. 1)
pp. 33-45

Abstract: A fundamental parallel procedure for implementing certain algorithms is by means of trees and arrays [1]. A method of generating any function defined by a power series in a fast, efficient, parallel-acting manner using trees and arrays is described. The power series considered can be written as $f(Y) = a_0 + a_1 Y + a_2 Y^2 + \cdots$, where $Y = v_1 x + v_2 x^2 + \cdots + v_k x^k$, $v_i \in \{0, 1\}$, is a binary fraction when $x = 1/2$. The power series must be expanded into individual terms $c x^i$. These terms are then transformed into weighted binary terms. Two methods are given to obtain all the individual terms (including coefficients) associated with each power of $x$. The hardware required for implementation is a tree similar to a Wallace or Dadda tree used for parallel multiplication of two binary numbers. Despite the multiplicity of terms required, Boolean logic methods reduce the tree dimensions in many cases so that the total tree required is smaller than an existing multiplier tree. In that case, Schwarz and Flynn [13], [15] have shown that the required tree can be superimposed on the existing multiplier tree in a multiplexed manner with relatively little increase in hardware. The generation of the logarithmic function is described in detail. Comparisons with other methods are made for the case of 11-bit accuracy of the logarithm. Using a figure of merit of latency times area (number of transistors), estimates show that the superposition scheme gives the best (smallest) figure of merit. For 11-bit accuracy, the superposition scheme requires only about 480 additional gates to be superimposed upon a 41-bit or larger multiplier, and the speed of operation is that of the multiplier.
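The expansion step described in the abstract can be illustrated with a small software sketch. It is a brute-force illustration only, not either of the two methods given in the paper (the abstract does not describe them), and it does not cover the conversion to weighted binary terms or the tree hardware. The function names `expand_power`, `expand_series`, and `evaluate` are hypothetical, and the truncated series $\ln(1+Y) \approx Y - Y^2/2 + Y^3/3 - Y^4/4$ is chosen here as an assumed example. The sketch relies on one fact: because each $v_i$ is 0 or 1, $v_i^2 = v_i$, so every product of bits collapses to a Boolean AND over a set of distinct bits.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def expand_power(k, n):
    """Expand Y**n, where Y = v_1*x + ... + v_k*x**k, into individual terms.

    Returns a dict mapping (set of bit indices, power of x) -> integer count.
    The set of indices stands for the AND of those bits.
    """
    terms = defaultdict(int)
    for idx in product(range(1, k + 1), repeat=n):
        # v_i is 0 or 1, so v_i * v_i = v_i: repeated indices collapse.
        terms[(frozenset(idx), sum(idx))] += 1
    return dict(terms)


def expand_series(coeffs, k):
    """Expand sum_n coeffs[n] * Y**n into terms c * (AND of bits) * x**p."""
    terms = defaultdict(Fraction)
    for n, a in enumerate(coeffs):
        for key, count in expand_power(k, n).items():
            terms[key] += a * count
    return dict(terms)


def evaluate(terms, bits, x=Fraction(1, 2)):
    """Sum the expanded terms for a concrete bit pattern (v_1, ..., v_k)."""
    total = Fraction(0)
    for (bitset, power), c in terms.items():
        if all(bits[i - 1] for i in bitset):  # AND of the selected bits
            total += c * x ** power
    return total


# Assumed example: ln(1 + Y) truncated after the Y**4 term, with k = 3 bits.
log_coeffs = [Fraction(0), Fraction(1), Fraction(-1, 2),
              Fraction(1, 3), Fraction(-1, 4)]
log_terms = expand_series(log_coeffs, k=3)
approx = evaluate(log_terms, bits=(1, 0, 1))  # Y = 1/2 + 1/8
print(float(approx))
```

For $k = 2$, `expand_power(2, 2)` yields the three terms $v_1 x^2 + 2 v_1 v_2 x^3 + v_2 x^4$, which is $Y^2$ with the squared bits collapsed. The enumeration costs $k^n$ steps per power, so it is meant only to make the term structure visible for small cases.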

[1] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. San Mateo, Calif.: Morgan Kaufmann, 1992.
[2] R. Stefanelli, "A suggestion for a high-speed parallel binary divider," IEEE Trans. Computers, vol. 21, no. 1, pp. 42-45, Jan. 1972.
[3] D.M. Mandelbaum, "A systematic method for division with high average bit skipping," IEEE Trans. Computers, vol. 39, pp. 127-130, Jan. 1990.
[4] E.M. Schwarz and M.J. Flynn, "Cost efficient high radix division," J. VLSI Signal Processing, pp. 293-305, Aug. 1991.
[5] D. Knuth, The Art of Computer Programming, Vol. 2, Addison-Wesley, Reading, Mass., 1998.
[6] J.N. Mitchell Jr., "Computer multiplication and division using binary logarithm," IRE Trans. Electronic Computers, vol. 11, pp. 512-517, Aug. 1962.
[7] H.-Y. Lo and Y. Aoki, "Generation of a precise binary logarithm with difference grouping programmable logic array," IEEE Trans. Computers, vol. 34, no. 8, pp. 681-692, Aug. 1985.
[8] I. Niven and H.S. Zuckerman, An Introduction to the Theory of Numbers. New York: J. Wiley, 1958.
[9] K. Hwang, Computer Arithmetic: Principles, Architecture, and Design. New York: John Wiley & Sons, 1979.
[10] G. Chrystal, Algebra, Part II. New York: Chelsea, 1952.
[11] D.M. Mandelbaum, "Some results on a SRT type division scheme," IEEE Trans. Computers, vol. 42, pp. 102-106, Jan. 1993.
[12] E.M. Schwarz and M.J. Flynn, "Approximating the Sine Function with Combinational Logic," Proc. 26th Asilomar Conf. Signals, Systems, and Computers, vol. 1, pp. 386-390, Oct. 1992.
[13] E.M. Schwarz and M.J. Flynn, "Hardware starting approximation for the square root operation," Proc. IEEE 11th Symp. Computer Arithmetic, pp. 103-11, 1993.
[14] A. Neumaier, Interval Methods for Systems of Equations. London: Cambridge Univ. Press, 1990.
[15] E.M. Schwarz, "High-radix algorithms for high-order arithmetic expressions," doctoral dissertation, Stanford Univ., Jan. 1993.
[16] E.M. Schwarz and M.J. Flynn, "Using a floating-point multiplier to sum signed Boolean elements," Technical Report CSL-TR-92-540, Stanford Univ., Aug. 1992.
[17] A. Nijenhuis and H. Wilf, Combinatorial Algorithms: For Computers and Calculators, 2nd ed., Academic Press, New York, 1978, p. 56.
[18] P.T.P. Tang, "Table-Driven Implementation of the Logarithm Function in IEEE Floating-Point Arithmetic," ACM Trans. Math. Software, vol. 16, no. 4, pp. 378-400, 1990.
[19] D. Wong and M. Flynn, "Fast division using accurate quotient approximations to reduce the number of iterations," IEEE Trans. Computers, vol. 41, pp. 981-995, Aug. 1992.
[20] T.R.N. Rao, Error Coding for Arithmetic Processors. New York: Academic Press, 1974.
[21] C. Hastings Jr., Approximations for Digital Computers. Princeton, N.J.: Princeton Univ. Press, 1955.
[22] P.M. Farmwald, "High bandwidth evaluation of elementary functions," Proc. Fifth Symp. Computer Arithmetic, pp. 139-142, 1981.
[23] J. Child, "RISC and Pentium drive demand for SRAMs that are the fastest of the fast," Computer Design, vol. 33, no. 4, pp. 47-54, 1994.

Index Terms:
Arrays, cosine, exponential, functions, logarithm, multinomials, multiplier tree, partitions, power series, sine.
Citation:
David M. Mandelbaum, Stefanie G. Mandelbaum, "A Fast, Efficient Parallel-Acting Method of Generating Functions Defined by Power Series, Including Logarithm, Exponential, and Sine, Cosine," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 1, pp. 33-45, Jan. 1996, doi:10.1109/71.481596