Issue No.08 - August (2000 vol.11)

pp: 838-850

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.877941

ABSTRACT

In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k$-bit $(k\geq 2)$ sequence using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of $2$. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most $w^2$. We adopt a VLSI delay model where the “length” of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k$-bit binary sequence in the time of $2k-2$ broadcasts, while the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Notably, although our hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an “ideal” design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^k$-bit binary sequence in the time of $3k+\lceil\log_w k\rceil -3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k+\lceil\log_w k\rceil -5$ broadcasts.
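To make the broadcast counts stated in the abstract concrete, the following minimal Python sketch tabulates them for given $w$ and $k$. It is a software illustration only and does not model the shift-switch hardware. The function names `ceil_log`, `broadcast_counts`, and `binary_prefix_sums` are hypothetical helpers introduced here, not names from the paper, and `binary_prefix_sums` is an ordinary sequential reference for what "prefix sums of a binary sequence" means, not the authors' algorithm.

```python
from itertools import accumulate


def ceil_log(base, n):
    """Smallest integer j >= 0 with base**j >= n, using exact integer arithmetic."""
    j, power = 0, 1
    while power < n:
        power *= base
        j += 1
    return j


def broadcast_counts(w, k):
    """Broadcast counts quoted in the abstract, for array parameter w and k >= 2."""
    if k < 2:
        raise ValueError("the abstract assumes k >= 2")
    log_term = ceil_log(w, k)  # ceil(log_w k)
    return {
        # First design: input of w^k bits
        "input_bits": w ** k,
        "sum": 2 * k - 2,
        "prefix_sums": 3 * k - 4,
        # Second (pipelined) design: input of k * w^k bits
        "pipelined_input_bits": k * w ** k,
        "pipelined_sum": 3 * k + log_term - 3,
        "pipelined_prefix_sums": 4 * k + log_term - 5,
    }


def binary_prefix_sums(bits):
    """Sequential reference: the i-th output is the number of 1s in bits[0..i]."""
    return list(accumulate(bits))
```

For example, `broadcast_counts(4, 3)` describes a 64-bit input for the first design and a 192-bit input for the pipelined one; the sketch only evaluates the closed-form expressions and says nothing about the "ideal" design mentioned in the abstract.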

INDEX TERMS

Hardware-algorithms, shift switching, binary prefix sums, binary counting, scalable architectures, pipelining.

CITATION

R. Lin, K. Nakano, S. Olariu, M.C. Pinotti, J.L. Schwing, A.Y. Zomaya, "Scalable Hardware-Algorithms for Binary Prefix Sums," *IEEE Transactions on Parallel & Distributed Systems*, vol. 11, no. 8, pp. 838-850, August 2000, doi:10.1109/71.877941