Issue No. 05 - May (2012 vol. 61)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2011.77
Nong Xiao, Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Zhiying Wang, Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Li Shen, Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Sheng Ma, Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Libo Huang, Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Binary64 arithmetic is rapidly becoming inadequate for today's large-scale computations because errors accumulate. Binary128 arithmetic is therefore required to increase the accuracy and reliability of these computations. At the same time, modern processors increasingly extend their instruction sets with single instruction multiple data (SIMD) execution, which can significantly accelerate data-parallel applications. To address both demands, this paper presents the architecture of a low-cost binary128 floating-point fused multiply-add (FMA) unit with SIMD support. The proposed FMA design can execute one binary128 FMA every other cycle with a latency of four cycles, two binary64 FMAs fully pipelined with a latency of three cycles, or four binary32 FMAs fully pipelined with a latency of three cycles. It uses two binary64 FMA units to support binary128 FMA, which requires much less hardware than a fully pipelined binary128 FMA. The design combines the segmentation and iteration hardware vectorization methods to trade off performance (throughput and latency) against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.
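The throughput and latency figures quoted in the abstract can be compared with a rough cycle-count estimate. The sketch below is illustrative only: the lane counts, latencies, and issue intervals come from the abstract, while the formula (latency plus one issue interval per additional issue slot), the `MODES` table, and the `cycles_for` helper are assumptions of an idealized pipeline with independent operations and no stalls. It is not the authors' performance model.

```python
from math import ceil

# (lanes, latency in cycles, issue interval in cycles), as stated in the abstract.
# The cycle formula in cycles_for is an assumed idealized model, not from the paper.
MODES = {
    "binary128": (1, 4, 2),  # one FMA every other cycle, latency 4
    "binary64": (2, 3, 1),   # two FMAs per issue, fully pipelined, latency 3
    "binary32": (4, 3, 1),   # four FMAs per issue, fully pipelined, latency 3
}


def cycles_for(mode: str, n_ops: int) -> int:
    """Estimate cycles to complete n_ops independent FMAs in the given mode."""
    if n_ops <= 0:
        return 0
    lanes, latency, interval = MODES[mode]
    issues = ceil(n_ops / lanes)  # issue slots needed to cover all operations
    return latency + (issues - 1) * interval


if __name__ == "__main__":
    for mode in MODES:
        print(mode, cycles_for(mode, 8))
```

Under these assumptions, eight independent FMAs take an estimated 18 cycles in binary128 mode, 6 in binary64 mode, and 4 in binary32 mode, which illustrates the throughput given up in exchange for the smaller area and power of the iterative binary128 path.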
pipeline arithmetic, parallel processing, dynamic power dissipation, low cost binary128 floating point FMA, unit design, SIMD support, binary64 arithmetic, single instruction multiple data execution, data parallel applications, binary32 FMAs, segmentation hardware, iteration hardware, vectorization methods, Computer architecture, Adders, Hardware, Multiplexing, Program processors, Pipelines, Compounds, computer arithmetic, Floating point, binary128, fused multiply add, SIMD, implementation
Nong Xiao, Zhiying Wang, Li Shen, Sheng Ma and Libo Huang, "Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support," in IEEE Transactions on Computers, vol. 61, no. 5, pp. 745-751, 2012.