Scalable Hardware-Algorithms for Binary Prefix Sums
August 2000 (vol. 11 no. 8)
pp. 838-850

Abstract—In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k$-bit sequence $(k\geq 2)$, using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of $2$. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most $w^2$. We adopt a VLSI delay model in which the “length” of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k$-bit binary sequence in the time of $2k-2$ broadcasts, while the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Remarkably, although our hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an “ideal” design. We then propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^k$-bit binary sequence in the time of $3k+\lceil\log_w k\rceil-3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k+\lceil\log_w k\rceil-5$ broadcasts.
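The abstract does not describe the shift-switch circuits themselves, so the following is only a software sketch under stated assumptions, not the authors' hardware-algorithm. `binary_prefix_sums` gives the reference definition of the output (inclusive prefix sums of a 0/1 sequence). `blocked_prefix_sums` is a generic hierarchical scan, a standard technique swapped in here for illustration, whose base case never scans more than `block` items at once, echoing the limit of $w^2$ devices per linear array. `broadcast_counts` merely evaluates the broadcast counts quoted in the abstract for given $w$ and $k$. All function names are hypothetical, and the scan does not model broadcast timing.

```python
from itertools import accumulate


def binary_prefix_sums(bits):
    """Reference definition: output[i] = bits[0] + ... + bits[i]."""
    if any(b not in (0, 1) for b in bits):
        raise ValueError("input must be a binary sequence")
    return list(accumulate(bits))


def blocked_prefix_sums(bits, block):
    """Generic hierarchical scan (illustrative assumption, not the paper's
    circuit): scan blocks of at most `block` items locally, recursively scan
    the block totals, then add each block's offset."""
    if block < 2:
        raise ValueError("block size must be at least 2 for the recursion to shrink")
    if len(bits) <= block:
        return list(accumulate(bits))
    blocks = [bits[i:i + block] for i in range(0, len(bits), block)]
    local = [list(accumulate(b)) for b in blocks]
    totals = [scan[-1] for scan in local]
    # Inclusive scan of block totals, shifted right to give exclusive offsets.
    inclusive = blocked_prefix_sums(totals, block)
    offsets = [0] + inclusive[:-1]
    out = []
    for off, scan in zip(offsets, local):
        out.extend(off + x for x in scan)
    return out


def ceil_log(w, k):
    """Smallest integer e with w**e >= k, i.e. ceil(log_w k) for k >= 1."""
    e, p = 0, 1
    while p < k:
        p *= w
        e += 1
    return e


def broadcast_counts(w, k):
    """Evaluate the broadcast counts stated in the abstract (k >= 2, w a power of 2)."""
    if k < 2:
        raise ValueError("the abstract assumes k >= 2")
    if w < 2 or w & (w - 1):
        raise ValueError("w must be a power of 2")
    lg = ceil_log(w, k)
    return {
        "sum_bits": w ** k,              # input size of the first design
        "sum": 2 * k - 2,                # broadcasts for the sum
        "prefix": 3 * k - 4,             # broadcasts for the prefix sums
        "pipelined_bits": k * w ** k,    # input size of the pipelined design
        "pipelined_sum": 3 * k + lg - 3,
        "pipelined_prefix": 4 * k + lg - 5,
    }


if __name__ == "__main__":
    import random

    w, k = 4, 3
    bits = [random.randint(0, 1) for _ in range(w ** k)]
    # Scanning in blocks of w*w items agrees with the reference definition.
    assert blocked_prefix_sums(bits, w * w) == binary_prefix_sums(bits)
    print(broadcast_counts(w, k))
    # {'sum_bits': 64, 'sum': 4, 'prefix': 5, 'pipelined_bits': 192, 'pipelined_sum': 7, 'pipelined_prefix': 8}
```

For $w=4$ and $k=3$, the formulas give a 64-bit sum in 4 broadcasts and its prefix sums in 5, while the pipelined design handles 192 bits, with $\lceil\log_4 3\rceil = 1$, in 7 and 8 broadcasts respectively.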

Index Terms:
Hardware-algorithms, shift switching, binary prefix sums, binary counting, scalable architectures, pipelining.
Citation:
R. Lin, K. Nakano, S. Olariu, M.C. Pinotti, J.L. Schwing, A.Y. Zomaya, "Scalable Hardware-Algorithms for Binary Prefix Sums," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 8, pp. 838-850, Aug. 2000, doi:10.1109/71.877941