Issue No.02 - February (2009 vol.58)
pp: 175-187
Carl E. Lemonds, Advanced Micro Devices Inc., Austin
Dimitri Tan, Advanced Micro Devices Inc., Austin
The demand for improved SIMD floating-point performance on general-purpose x86-compatible microprocessors is rising. At the same time, there is a conflicting demand in the low-power computing market for a reduction in power consumption. Along with this, there is the absolute necessity of backward compatibility for x86-compatible microprocessors, which includes the support of x87 scientific floating-point instructions. The combined effect is that there is a need for low-power, low-cost floating-point units that are still capable of delivering good SIMD performance while maintaining full x86 functionality. This paper presents the design of an x86-compatible floating-point multiplier (FPM) that is compliant with the IEEE-754 Standard for Binary Floating-Point Arithmetic [12] and is specifically tailored to provide good SIMD performance in a low-cost, low-power solution while maintaining full x87 backward compatibility. The FPM efficiently supports multiple precisions using an iterative rectangular multiplier. The FPM can perform two parallel single-precision multiplies every cycle with a latency of two cycles, one double-precision multiply every two cycles with a latency of four cycles, or one extended-double-precision multiply every three cycles with a latency of five cycles. The iterative FPM also supports division, square-root, and transcendental functions. Compared to a previous design with similar functionality, the proposed iterative FPM has 60 percent less area and 59 percent less dynamic power dissipation.
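The throughput and latency figures quoted above can be turned into rough cycle counts. The following is a minimal sketch, not the authors' model: it assumes a stream of independent multiplies issued back to back with no stalls, and the names MODES and cycles_for are hypothetical, introduced only for illustration. Under that assumption, a stream completes in latency + (issues - 1) * issue interval cycles.

```python
from math import ceil

# Per-precision figures taken from the abstract:
# (multiplies per issue, cycles between issues, latency in cycles).
# The dictionary name and keys are illustrative, not from the paper.
MODES = {
    "single": (2, 1, 2),    # two parallel single-precision multiplies every cycle
    "double": (1, 2, 4),    # one double-precision multiply every two cycles
    "extended": (1, 3, 5),  # one extended-double-precision multiply every three cycles
}


def cycles_for(mode: str, n_multiplies: int) -> int:
    """Estimate cycles to finish n independent multiplies in the given mode.

    Assumes an ideal, stall-free stream of back-to-back issues.
    """
    if n_multiplies <= 0:
        return 0
    ops_per_issue, interval, latency = MODES[mode]
    # Number of issue slots needed; single precision packs two multiplies per slot.
    issues = ceil(n_multiplies / ops_per_issue)
    # The last issue starts (issues - 1) * interval cycles after the first
    # and completes one full latency later.
    return latency + (issues - 1) * interval


if __name__ == "__main__":
    for mode in MODES:
        print(mode, cycles_for(mode, 8))
```

For eight independent multiplies, this idealized model gives 5 cycles in single precision, 18 in double precision, and 26 in extended double precision, which illustrates how the iterative design trades throughput at higher precisions for a smaller multiplier.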
Computer arithmetic, rectangular multiplier, floating-point arithmetic, low-power, multiplying circuits, multimedia, very-large-scale integration.
Carl E. Lemonds, Dimitri Tan, "Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support", IEEE Transactions on Computers, vol.58, no. 2, pp. 175-187, February 2009, doi:10.1109/TC.2008.203
[1] P. Ranganathan, S. Adve, and N. Jouppi, “Performance of Image and Video Processing with General-Purpose Processors and Media ISA Extensions,” Proc. 26th Ann. Int'l Symp. Computer Architecture (ISCA '99), vol. 27, pp. 124-135, May 1999.
[2] S.K. Raman, V. Pentkovski, and J. Keshava, “Implementing Streaming SIMD Extensions on the Pentium III Processor,” IEEE Micro, vol. 20, pp. 47-57, July 2000.
[3] M.-L. Li, R. Sasanka, S. Adve, Y.-K. Chen, and E. Debes, “The ALPBench Benchmark Suite for Complex Multimedia Applications,” Proc. IEEE Int'l Symp. Workload Characterization (IISWC '05), pp. 34-45, Oct. 2005.
[4] H. Nguyen and L.K. John, “Exploiting SIMD Parallelism in DSP and Multimedia Algorithms Using the AltiVec Technology,” Proc. 13th Int'l Conf. Supercomputing (ICS '99), pp. 11-20, June 1999.
[5] Advanced Micro Devices, AMD64 Architecture Programmer's Manual Volume 4: 128-Bit Media Instructions, rev. 3.07, Dec. 2005.
[6] Advanced Micro Devices, AMD64 Architecture Programmer's Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions, rev. 3.06, Dec. 2005.
[7] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, ch. 2, third ed. Morgan Kaufmann, p. 119, May 2002.
[8] S. Oberman, “Floating-Point Division and Square Root Algorithms and Implementation in the AMD-K7™ Microprocessor,” Proc. 14th IEEE Symp. Computer Arithmetic (ARITH '99), pp. 106-115, Apr. 1999.
[9] C. Keltcher, K. McGrath, A. Ahmed, and P. Conway, “The AMD Opteron Processor for Multiprocessor Servers,” IEEE Micro, vol. 23, pp. 66-76, Mar. 2003.
[10] W. Briggs and D. Matula, “A 17 × 69 Bit Multiply and Add Unit with Redundant Binary Feedback and Single Cycle Latency,” Proc. 11th IEEE Symp. Computer Arithmetic (ARITH '93), pp. 163-170, July 1993.
[11] M. Schulte, C. Lemonds, and D. Tan, “Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier,” Proc. IEEE Int'l Conf. Computer Design (ICCD '07), pp. 304-310, Oct. 2007.
[12] ANSI and IEEE, IEEE-754 Standard for Binary Floating-Point Arithmetic, 1985.
[13] G. Hinton, M. Upton, D. Sager, D. Boggs, D. Carmean, P. Roussel, T. Chappell, T. Fletcher, M. Milshtein, M. Sprague, S. Samaan, and R. Murray, “A 0.18-um CMOS IA-32 Processor with a 4-GHz Integer Execution Unit,” IEEE J. Solid-State Circuits, vol. 36, pp. 1617-1627, Nov. 2001.
[14] G. Even, S.M. Mueller, and P.-M. Seidel, “A Dual Mode IEEE Multiplier,” Proc. Second Ann. IEEE Int'l Conf. Innovative Systems in Silicon (ISIS '97), pp. 282-289, Oct. 1997.
[15] S. Vassiliadis, E. Schwarz, and B. Sung, “Hard-Wired Multipliers with Encoded Partial Products,” IEEE Trans. Computers, vol. 40, pp. 1181-1197, Nov. 1991.
[16] A. Weinberger, “4:2 Carry-Save Adder Module,” IBM Technical Disclosure Bull., vol. 23, pp. 3811-3814, Jan. 1981.
[17] S. Anderson, J. Earle, R. Goldschmidt, and D. Powers, “The IBM System/360 Model 91: Floating-Point Execution Unit,” IBM J. Research and Development, vol. 11, pp. 34-53, Jan. 1967.
[18] R.M. Jessani and M. Putrino, “Comparison of Single- and Dual-Pass Multiply-Add Fused Floating-Point Units,” IEEE Trans. Computers, vol. 47, pp. 927-937, Sept. 1998.
[19] M.R. Santoro, G. Bewick, and M. Horowitz, “Rounding Algorithms for IEEE Multipliers,” Proc. Ninth IEEE Symp. Computer Arithmetic (ARITH '89), pp. 176-183, Sept. 1989.
[20] G. Even and P.-M. Seidel, “A Comparison of Three Rounding Algorithms for IEEE Floating-Point Multiplication,” IEEE Trans. Computers, vol. 49, pp. 638-650, July 2000.
[21] N.T. Quach, N. Takagi, and M. Flynn, “Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 12, pp. 511-521, May 2004.
[22] A. Enriques and K. Jones, “Design of a Multi-Mode Pipelined Multiplier for Floating-Point Applications,” Proc. IEEE Nat'l Aerospace and Electronics Conf. (NAECON '91), vol. 1, pp. 77-81, May 1991.
[23] A. Akkas and M. Schulte, “A Quadruple Precision and Dual Double Precision Floating-Point Multiplier,” Proc. Euromicro Symp. Digital System Design (DSD '03), pp. 76-81, Sept. 2003.
[24] D. Tan, A. Danysh, and M. Liebelt, “Multiple-Precision Fixed-Point Vector Multiply-Accumulator Using Shared Segmentation,” Proc. 16th IEEE Symp. Computer Arithmetic (ARITH '03), pp. 12-19, June 2003.
[25] S. Krithivasan and M.J. Schulte, “Multiplier Architectures for Media Processing,” Proc. IEEE 37th Asilomar Conf. Signals, Systems, and Computers (ACSSC '03), vol. 2, pp. 2193-2197, Nov. 2003.
[26] L. Huang, L. Shen, K. Dai, and Z. Wang, “A New Architecture for Multiple-Precision Floating-Point Multiply-Add Fused Unit Design,” Proc. 18th IEEE Symp. Computer Arithmetic (ARITH '07), pp. 69-76, June 2007.