
Stuart F. Oberman, Michael J. Flynn, "Division Algorithms and Implementations," IEEE Transactions on Computers, vol. 46, no. 8, pp. 833-854, August 1997.  
BibTeX  
@article{10.1109/12.609274,
  author = {Stuart F. Oberman and Michael J. Flynn},
  title = {Division Algorithms and Implementations},
  journal = {IEEE Transactions on Computers},
  volume = {46},
  number = {8},
  issn = {0018-9340},
  year = {1997},
  pages = {833--854},
  doi = {http://doi.ieeecomputersociety.org/10.1109/12.609274},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}  
Index Terms: Computer arithmetic, division, floating point, functional iteration, SRT, table lookup, variable latency, very high radix.  
Abstract: Many algorithms have been developed for implementing division in hardware. These algorithms differ in many aspects, including quotient convergence rate, fundamental hardware primitives, and mathematical formulations. This paper presents a taxonomy of division algorithms that classifies the algorithms based upon their hardware implementations and impact on system design. Division algorithms can be divided into five classes: digit recurrence, functional iteration, very high radix, table lookup, and variable latency. Many practical division algorithms are hybrids of several of these classes. These algorithms are explained and compared in this work. It is found that, for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable. An implementation of division by functional iteration can provide the lowest latency for typical multiplier latencies. Variable latency algorithms show promise for minimizing average latency while also minimizing area.
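
To make the contrast between two of the classes named in the abstract concrete, the following is a minimal software sketch, not the paper's hardware designs. It assumes a radix-2 restoring scheme as the simplest instance of digit recurrence, and a Newton-Raphson reciprocal as one instance of functional iteration. The function names, and the linear starting estimate used in place of a lookup table, are illustrative assumptions.

```python
def restoring_divide(dividend: int, divisor: int, bits: int) -> tuple[int, int]:
    """Digit recurrence (radix-2, restoring): one quotient bit per step.

    Hypothetical helper for illustration; needs only shift, compare, subtract.
    """
    if divisor <= 0 or dividend < 0 or dividend >= (1 << bits):
        raise ValueError("expects 0 <= dividend < 2**bits and divisor > 0")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Select quotient digit 1 if the divisor fits, else digit 0.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def newton_reciprocal(d: float, iterations: int) -> float:
    """Functional iteration: refine x toward 1/d with x <- x * (2 - d * x).

    Assumes d is normalized to [0.5, 1). The linear starting estimate below
    is an assumption standing in for a table lookup.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("expects a divisor normalized to [0.5, 1)")
    x = 48.0 / 17.0 - (32.0 / 17.0) * d
    for _ in range(iterations):
        # Each step costs two multiplications and roughly squares the error.
        x = x * (2.0 - d * x)
    return x


if __name__ == "__main__":
    print(restoring_divide(100, 7, 8))        # (14, 2)
    print(0.3 * newton_reciprocal(0.75, 4))   # approximates 0.3 / 0.75
```

The linear loop retires a fixed number of bits per iteration using only an adder-like primitive, while the multiplicative loop converges quadratically but depends on a multiplier. This mirrors the area-versus-latency trade-off the abstract summarizes.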