*IEEE Transactions on Parallel & Distributed Systems*, vol. 7, no. 1, pp. 33-45, January 1996, doi:10.1109/71.481596

Issue No. 01 - January (1996, vol. 7)

pp: 33-45

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.481596

ABSTRACT

<p><b>Abstract</b>: Trees and arrays are a fundamental means of implementing certain algorithms in parallel [<ref rid="bibl00331" type="bib">1</ref>]. This paper describes a method of generating any function defined by a power series in a fast, efficient, parallel-acting manner using trees and arrays. The power series considered can be written as f(Y) = a<sub>0</sub> + a<sub>1</sub>Y + a<sub>2</sub>Y<super>2</super> + ..., where Y = v<sub>1</sub>x + v<sub>2</sub>x<super>2</super> + ... + v<sub>k</sub>x<super>k</super>, with each v<sub>i</sub> equal to 0 or 1, is a binary fraction when x = ½. The power series must first be expanded into individual terms of the form cx<super>i</super>, which are then transformed into weighted binary terms. Two methods are given for obtaining all the individual terms (including coefficients) associated with each power of x. The hardware required for implementation is a tree similar to the Wallace or Dadda tree used for parallel multiplication of two binary numbers. Despite the multiplicity of terms required, Boolean logic methods reduce the tree dimensions in many cases so that the total tree required is smaller than an existing multiplier tree. In that case, Schwarz and Flynn [<ref rid="bibl003313" type="bib">13</ref>], [<ref rid="bibl003315" type="bib">15</ref>] have shown that the required tree can be superimposed on the existing multiplier tree in a multiplexed manner with relatively little increase in hardware. The generation of the logarithmic function is described in detail. Comparisons with other methods are made for the case of 11-bit accuracy of the logarithm. Using a figure of merit of latency times area (number of transistors), estimates show that the superposition scheme gives the best (smallest) figure of merit. For 11-bit accuracy, the superposition scheme requires only about 480 additional gates superimposed upon a 41-bit or larger multiplier, and its speed of operation is that of the multiplier.</p>
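
The expansion step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names `expand_power` and `evaluate` are hypothetical, neither of the paper's two methods is reproduced, and the only Boolean reduction applied is idempotence (v<sub>i</sub>·v<sub>i</sub> = v<sub>i</sub>, since each bit is 0 or 1). The sketch expands Y<super>n</super> into products of bits grouped by power of x, then checks the expanded form against direct evaluation of the series at x = ½.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def expand_power(k, n):
    """Expand Y**n, with Y = v_1*x + ... + v_k*x**k, grouped by power of x.

    Returns {power_of_x: {frozenset_of_bit_indices: integer_coefficient}}.
    Each v_i is 0 or 1, so v_i * v_i == v_i and a product of bits is fully
    identified by the set of indices appearing in it.
    """
    terms = defaultdict(lambda: defaultdict(int))
    # Each n-tuple of indices is one term of the multinomial expansion.
    for idx in product(range(1, k + 1), repeat=n):
        terms[sum(idx)][frozenset(idx)] += 1
    return terms


def evaluate(coeffs, bits, x=Fraction(1, 2)):
    """Evaluate f(Y) = sum(coeffs[n] * Y**n) from the expanded bit products."""
    k = len(bits)
    total = Fraction(0)
    for n, a in enumerate(coeffs):
        for power, monomials in expand_power(k, n).items():
            for idxs, c in monomials.items():
                # A bit product contributes only if all of its bits are 1.
                if all(bits[i - 1] for i in idxs):
                    total += a * c * x ** power
    return total


if __name__ == "__main__":
    # Assumed example: the series for ln(1 + Y) truncated after Y**3.
    coeffs = [Fraction(0), Fraction(1), Fraction(-1, 2), Fraction(1, 3)]
    bits = (1, 0, 1)  # Y = 1/2 + 1/8 = 5/8 when x = 1/2
    Y = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    direct = sum(a * Y ** n for n, a in enumerate(coeffs))
    print(evaluate(coeffs, bits) == direct)
```

For k = 2 and n = 2 the expansion gives Y<super>2</super> = v<sub>1</sub>x<super>2</super> + 2v<sub>1</sub>v<sub>2</sub>x<super>3</super> + v<sub>2</sub>x<super>4</super>. In a hardware tree, each surviving bit product would correspond to an AND of input bits entering the column for its weight. The sketch uses exact rational arithmetic only to make the check against direct evaluation unambiguous.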

INDEX TERMS

Arrays, cosine, exponential, functions, logarithm, multinomials, multiplier tree, partitions, power series, sine.

CITATION

David M. Mandelbaum, Stefanie G. Mandelbaum, "A Fast, Efficient Parallel-Acting Method of Generating Functions Defined by Power Series, Including Logarithm, Exponential, and Sine, Cosine", *IEEE Transactions on Parallel & Distributed Systems*, vol. 7, no. 1, pp. 33-45, January 1996, doi:10.1109/71.481596