Design Issues in Division and Other Floating-Point Operations
February 1997 (vol. 46 no. 2)
pp. 154-161

Abstract: Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, in the worst case, a high-latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency for varying instruction issue rates. It also examines the performance implications of shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Using a system-level study as a basis, it shows how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.

Index Terms:
Benchmarks, computer arithmetic, division, floating-point, multiplication, square root, system performance.
Stuart F. Oberman, Michael J. Flynn, "Design Issues in Division and Other Floating-Point Operations," IEEE Transactions on Computers, vol. 46, no. 2, pp. 154-161, Feb. 1997, doi:10.1109/12.565590