Power Efficient Division and Square Root Unit
Aug. 2012 (vol. 61 no. 8)
pp. 1059-1070
Wei Liu, Politecnico di Torino, Torino
Alberto Nannarelli, Technical University of Denmark, Lyngby
Although division and square root are not frequent operations, most processors implement them in hardware so as not to compromise overall performance. Two classes of algorithms implement division or square root: digit-recurrence and multiplicative (e.g., Newton-Raphson) algorithms. Previous work shows that division and square root units based on the digit-recurrence algorithm offer the best delay-area-power tradeoff. Moreover, the two operations can be combined in a single unit. Here, we present a radix-16 combined division and square root unit obtained by overlapping two radix-4 stages. The proposed unit is compared both to similar solutions based on the digit-recurrence algorithm and to a unit based on the multiplicative Newton-Raphson algorithm.
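
To make the two algorithm classes concrete, the following Python sketch contrasts them in software. It is an illustration under stated assumptions, not the unit described in the article: the function names (`radix4_step`, `divide_radix16`, `reciprocal_newton`), the operand range [1/2, 1), the Newton-Raphson seed of 3/2, and the use of exact rational arithmetic are all choices made here. In particular, digit selection is idealized as exact rounding, whereas a hardware unit would select each digit from a truncated estimate of the residual, and "overlapping" the two radix-4 stages is modeled simply as two sequential radix-4 steps per iteration.

```python
from fractions import Fraction


def radix4_step(w, d):
    """One radix-4 digit-recurrence step: w_next = 4*w - q*d, q in {-2..2}."""
    shifted = 4 * w
    # Idealized digit selection (assumption): round to the nearest digit and
    # clamp to the redundant digit set. Hardware uses a truncated estimate.
    q = max(-2, min(2, round(shifted / d)))
    return q, shifted - q * d


def divide_radix16(x, d, iterations):
    """Divide x by d, retiring two radix-4 digits (4 bits) per iteration.

    Returns (quotient, remainder) such that x == quotient * d + remainder.
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(1, 2) <= x < 1 and Fraction(1, 2) <= d < 1):
        raise ValueError("operands are assumed to lie in [1/2, 1)")
    w = x / 4                 # pre-scaling keeps |w| <= (2/3) * d
    quotient = Fraction(0)
    weight = Fraction(1)      # weight of the most recently produced digit
    for _ in range(iterations):
        for _ in range(2):    # two radix-4 stages form one radix-16 iteration
            q, w = radix4_step(w, d)
            weight /= 4
            quotient += q * weight
    # Undo the initial scaling by 4 on both quotient and residual.
    return 4 * quotient, 4 * w * weight


def reciprocal_newton(d, steps, seed=Fraction(3, 2)):
    """Multiplicative alternative: r <- r * (2 - d*r) converges to 1/d."""
    d, r = Fraction(d), Fraction(seed)
    for _ in range(steps):
        r = r * (2 - d * r)   # relative error is squared at every step
    return r
```

The digit-recurrence routine gains a fixed number of quotient bits per iteration (four here), while the Newton-Raphson routine roughly doubles the number of correct bits per step at the cost of two multiplications each time. That difference in per-step work is the kind of tradeoff the comparison in the abstract refers to.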

Index Terms:
Floating point, division, square root, digit-recurrence.
Wei Liu, Alberto Nannarelli, "Power Efficient Division and Square Root Unit," IEEE Transactions on Computers, vol. 61, no. 8, pp. 1059-1070, Aug. 2012, doi:10.1109/TC.2012.82