Design Issues in Division and Other Floating-Point Operations
February 1997 (vol. 46 no. 2)
pp. 154-161
Abstract: Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, in the worst case, a high-latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency for varying instruction issue rates. It also examines the performance implications of shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Using a system-level study as a basis, the paper shows how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.
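To make the link between divide latency and CPI concrete, the sketch below uses a simple first-order stall model: the extra CPI is the fraction of instructions that are divides multiplied by the divide cycles that cannot be hidden behind other work. The function name, the `overlapped_cycles` parameter, and every input value are illustrative assumptions; they are not measurements or methodology taken from the paper, which relies on a system-level study of the benchmarks.

```python
def division_cpi_penalty(div_fraction, div_latency, overlapped_cycles=0):
    """First-order estimate of the extra CPI caused by divide stalls.

    div_fraction      -- fraction of dynamic instructions that are divides (0..1)
    div_latency       -- divider latency in cycles
    overlapped_cycles -- cycles of each divide hidden by independent work
                         (hypothetical knob; 0 models the worst case)
    """
    if not 0.0 <= div_fraction <= 1.0:
        raise ValueError("div_fraction must be between 0 and 1")
    if div_latency < 0 or overlapped_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    # Only the cycles that are not overlapped stall the pipeline.
    exposed_cycles = max(0, div_latency - overlapped_cycles)
    return div_fraction * exposed_cycles


if __name__ == "__main__":
    # Hypothetical workload: 0.5% of instructions are divides.
    for latency in (10, 20, 40, 60):
        penalty = division_cpi_penalty(0.005, latency)
        print(f"latency={latency:2d} cycles -> extra CPI = {penalty:.3f}")
```

Even with a small divide frequency, the penalty in this model grows linearly with latency, which is the kind of effect the abstract describes. A real pipeline would need instruction-level dependence and issue-rate information to estimate how many cycles are actually overlapped.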
Index Terms:
Benchmarks, computer arithmetic, division, floating-point, multiplication, square root, system performance.
Citation:
Stuart F. Oberman, Michael J. Flynn, "Design Issues in Division and Other Floating-Point Operations," IEEE Transactions on Computers, vol. 46, no. 2, pp. 154-161, Feb. 1997, doi:10.1109/12.565590