Division Algorithms and Implementations
August 1997 (vol. 46 no. 8)
pp. 833-854

Abstract: Many algorithms have been developed for implementing division in hardware. They differ in many respects, including quotient convergence rate, fundamental hardware primitives, and mathematical formulation. This paper presents a taxonomy that classifies division algorithms by their hardware implementations and their impact on system design. Division algorithms fall into five classes: digit recurrence, functional iteration, very high radix, table look-up, and variable latency. Many practical division algorithms are hybrids of several of these classes. These algorithms are explained and compared in this work. The comparison shows that digit recurrence algorithms are suitable for low-cost implementations where chip area must be minimized, that division by functional iteration can provide the lowest latency for typical multiplier latencies, and that variable latency algorithms show promise for minimizing average latency and area at the same time.
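To make the difference in quotient convergence rate concrete, here is a minimal Python sketch (not taken from the paper) of two of the five classes. It shows a radix-2 restoring digit-recurrence divider, which retires one quotient bit per iteration, and a Newton-Raphson reciprocal iteration, one common form of functional iteration, in which the error is squared at every step. The function names `restoring_divide` and `newton_reciprocal` are hypothetical. The use of exact integer and `Fraction` arithmetic is an assumption made for clarity; hardware dividers use fixed-width, often redundant, datapaths.

```python
from fractions import Fraction


def restoring_divide(dividend: int, divisor: int, n_bits: int) -> int:
    """Radix-2 restoring digit recurrence on unsigned integers (illustrative).

    Assumes 0 <= dividend < 2**n_bits. One quotient bit is retired per
    iteration, so an n-bit quotient takes n iterations (linear convergence).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remainder, quotient = 0, 0
    for i in reversed(range(n_bits)):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Compare before subtracting; this is equivalent to subtracting
        # and then restoring the remainder when the result goes negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient


def newton_reciprocal(d: Fraction, x0: Fraction, iterations: int) -> list[Fraction]:
    """Newton-Raphson iteration x_{i+1} = x_i * (2 - d * x_i) toward 1/d.

    With exact arithmetic the error e_i = 1 - d * x_i satisfies
    e_{i+1} = e_i ** 2, so the number of correct bits doubles each step.
    """
    xs = [x0]
    for _ in range(iterations):
        x = xs[-1]
        xs.append(x * (2 - d * x))  # two dependent multiplications per step
    return xs


if __name__ == "__main__":
    print(restoring_divide(100, 7, 8))  # 14
    d = Fraction(3, 4)  # divisor assumed normalized to [1/2, 1)
    for i, x in enumerate(newton_reciprocal(d, Fraction(1), 4)):
        print(i, 1 - d * x)  # errors 1/4, 1/16, 1/256, 1/65536, 1/4294967296
```

The digit-recurrence loop needs n iterations for an n-bit quotient using only shifts, comparisons, and subtractions, while each Newton-Raphson step doubles the number of correct bits but costs two dependent multiplications. That is why the latency of functional iteration depends so directly on multiplier latency.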

Index Terms:
Computer arithmetic, division, floating point, functional iteration, SRT, table look-up, variable latency, very high radix.
Citation:
Stuart F. Oberman, Michael J. Flynn, "Division Algorithms and Implementations," IEEE Transactions on Computers, vol. 46, no. 8, pp. 833-854, Aug. 1997, doi:10.1109/12.609274