Floating-Point Multiply-Add-Fused with Reduced Latency
August 2004 (vol. 53 no. 8)
pp. 988-1003
Tomás Lang, IEEE Computer Society

Abstract—We propose an architecture for the computation of the double-precision floating-point multiply-add-fused (MAF) operation $A + (B \times C)$. This architecture is based on combining the addition and the rounding (using a dual adder) and on anticipating the normalization step so that it is performed before the addition. Because the normalization is performed before the addition, it is not possible to overlap the leading-zero anticipator (LZA) with the adder. Consequently, to avoid an increase in delay, we modify the design of the LZA so that the leading bits of its output are produced first and can be used to begin the normalization. Moreover, parts of the addition are also anticipated. We have estimated the delay of the resulting architecture, taking into account the load introduced by long connections, and we obtain a delay reduction of between 15 and 20 percent with respect to previous implementations.
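Two ideas in the abstract can be illustrated in software: the MAF operation rounds $A + (B \times C)$ only once, and normalization can begin as soon as the leading bits of the shift count are known. The Python sketch below is illustrative only and is not the authors' hardware design. `fused_mul_add` emulates single rounding with exact `fractions.Fraction` arithmetic, while `mul_then_add` is an unfused baseline that rounds twice. The hypothetical `normalize_msb_first` is a logarithmic normalization shifter in which each stage fixes one shift-count bit, most significant first, and shifts immediately. Unlike the paper's LZA, it detects leading zeros in a value that has already been computed, rather than anticipating them from the operands. The 64-bit default width is an assumption.

```python
from fractions import Fraction

def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused baseline: B*C is rounded to a double, then the sum is rounded again."""
    return a + b * c

def fused_mul_add(a: float, b: float, c: float) -> float:
    """MAF semantics: A + (B*C) is computed exactly and rounded only once.

    Fraction arithmetic on finite floats is exact, and float() of the result
    rounds to the nearest double (ties to even). Overflow/NaN handling is omitted.
    """
    return float(Fraction(a) + Fraction(b) * Fraction(c))

def normalize_msb_first(x: int, width: int = 64) -> tuple[int, int]:
    """Hypothetical log-shifter: left-normalize a nonzero width-bit value.

    Stage k determines one bit of the shift count, most significant first,
    and applies that shift at once, so each stage only needs the bits
    decided before it. Returns (normalized value, total shift).
    """
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"
    assert 0 < x < (1 << width)
    mask = (1 << width) - 1
    shift = 0
    step = width // 2
    while step >= 1:
        # The shift-count bit for this stage is 1 iff the top `step` bits are zero.
        if x >> (width - step) == 0:
            x = (x << step) & mask
            shift += step
        step //= 2
    return x, shift

if __name__ == "__main__":
    a, b, c = -1.0, 1.0 + 2.0**-30, 1.0 - 2.0**-30
    # The exact product B*C = 1 - 2**-60 rounds to 1.0 when rounded on its own.
    print(mul_then_add(a, b, c))                 # 0.0: the -2**-60 term is lost
    print(fused_mul_add(a, b, c) == -(2.0**-60))  # True: a single rounding keeps it
    print(normalize_msb_first(0b1011)[1])        # 60 leading zeros in 64 bits
```

Because each stage needs only its own shift-count bit plus the output of the previous stage, the first stages can start before the low-order count bits are available. This is the property that producing the leading LZA output bits first is meant to exploit.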

Index Terms:
Computer arithmetic, floating-point functional units, multiply-add-fused (MAF) operation, VLSI design.
Tomás Lang, Javier D. Bruguera, "Floating-Point Multiply-Add-Fused with Reduced Latency," IEEE Transactions on Computers, vol. 53, no. 8, pp. 988-1003, Aug. 2004, doi:10.1109/TC.2004.44