Issue No. 08 - August (2000 vol. 11)
ISSN: 1045-9219
pp: 838-850
ABSTRACT
<p><b>Abstract</b>—In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a <tmath>$w^k\hbox{-}{\rm{bit}}$</tmath> sequence <tmath>$(k\geq 2)$</tmath>, using linear arrays of at most <tmath>$w^2$</tmath> shift switches as basic building blocks, where <tmath>$w$</tmath> is a small power of <tmath>$2$</tmath>. An immediate consequence of this feature is that, in our designs, broadcasts are limited to buses of length at most <tmath>$w^2$</tmath>. We adopt a VLSI delay model where the “length” of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a <tmath>$w^k\hbox{-}{\rm{bit}}$</tmath> binary sequence in the time of <tmath>$2k-2$</tmath> broadcasts, while the corresponding prefix sums can be computed in the time of <tmath>$3k-4$</tmath> broadcasts. Remarkably, even though our hardware-algorithm uses only linear arrays of size at most <tmath>$w^2$</tmath>, the total number of broadcasts involved is less than three times the number required by an “ideal” design. We then propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a <tmath>$kw^k\hbox{-}{\rm{bit}}$</tmath> binary sequence in the time of <tmath>$3k+\lceil\log_w k\rceil -3$</tmath> broadcasts. Using this design, the corresponding prefix sums can be computed in the time of <tmath>$4k+\lceil\log_w k\rceil -5$</tmath> broadcasts.</p>
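The broadcast counts quoted in the abstract are simple closed forms in w and k, so they can be tabulated directly. The sketch below is only an illustration under stated assumptions: the function names (`ceil_log`, `broadcast_counts`, `reference_prefix_sums`) are hypothetical and do not come from the paper, and the prefix-sum routine is an ordinary sequential software reference showing what is being computed, not the shift-switch hardware-algorithm itself.

```python
from itertools import accumulate


def ceil_log(base: int, value: int) -> int:
    """Return the smallest integer e >= 0 with base**e >= value.

    Integer-only arithmetic avoids floating-point rounding in log().
    """
    exponent, power = 0, 1
    while power < value:
        power *= base
        exponent += 1
    return exponent


def broadcast_counts(w: int, k: int) -> dict:
    """Evaluate the broadcast counts stated in the abstract (hypothetical helper).

    The first two entries apply to a w**k-bit input, the pipelined
    entries to a k * w**k-bit input.
    """
    if w < 2 or k < 2:
        raise ValueError("the abstract assumes w a small power of 2 and k >= 2")
    log_term = ceil_log(w, k)  # ceil(log_w k)
    return {
        "sum": 2 * k - 2,
        "prefix_sums": 3 * k - 4,
        "pipelined_sum": 3 * k + log_term - 3,
        "pipelined_prefix_sums": 4 * k + log_term - 5,
    }


def reference_prefix_sums(bits):
    """Sequential software reference: the i-th output is bits[0] + ... + bits[i]."""
    return list(accumulate(bits))


if __name__ == "__main__":
    print(broadcast_counts(w=4, k=3))
    print(reference_prefix_sums([1, 0, 1, 1]))
```

Calling `broadcast_counts(w, k)` for a few values of k shows how the counts grow linearly in k, while the bus length stays bounded by w^2 as described above.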
INDEX TERMS
Hardware-algorithms, shift switching, binary prefix sums, binary counting, scalable architectures, pipelining.
CITATION
A.Y. Zomaya, S. Olariu, K. Nakano, M.C. Pinotti, J.L. Schwing, R. Lin, "Scalable Hardware-Algorithms for Binary Prefix Sums", IEEE Transactions on Parallel & Distributed Systems, vol. 11, no. 8, pp. 838-850, August 2000, doi:10.1109/71.877941