Issue No.11 - November (2009 vol.58)
pp: 1539-1552
Ghassem Jaberipur, Shahid Beheshti University and Institute for Research in Fundamental Sciences, Tehran
Amir Kaivani, Shahid Beheshti University, Tehran
ABSTRACT
Hardware support for decimal computer arithmetic is regaining popularity. One reason is the recent growth of decimal computations in commercial, scientific, financial, and Internet-based computer applications. Newly commercialized decimal arithmetic hardware units use radix-10 sequential multipliers, which are rather slow for multiplication-intensive applications. Future processors of this kind are therefore likely to host fast parallel decimal multiplication circuits. The corresponding hardware algorithms normally consist of three steps: partial product generation (PPG), partial product reduction (PPR), and final carry-propagating addition. The state of the art is represented by two recent full solutions with alternative designs for all three steps. In addition, PPR by itself has been the focus of other recent studies. In this paper, we examine both full solutions and the impact of a PPR-only design on the full solution it applies to. To improve the speed of parallel decimal multiplication, we present a new PPG method and fine-tune the PPR method of one of the full solutions and the final addition scheme of the other, thereby assembling a new full solution. Logical Effort analysis and 0.13 μm synthesis show a speed advantage of at least 13 percent, at a cost of at most 36 percent additional area.
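The three-step structure named in the abstract (PPG, PPR, final carry-propagating addition) can be illustrated with a small software model. The sketch below is not the authors' hardware design: it is a plain Python analogue, assuming little-endian lists of decimal digits, and all function names are hypothetical. It only shows how the three steps divide the work.

```python
def to_digits(n):
    """Return the little-endian decimal digits of a non-negative integer."""
    digits = []
    while True:
        n, d = divmod(n, 10)
        digits.append(d)
        if n == 0:
            return digits


def generate_partial_products(x_digits, y_digits):
    """PPG: one row per multiplier digit, shifted by that digit's position.

    Each entry is a raw digit-by-digit product (0..81), not yet a single
    decimal digit; this is a simplification chosen for illustration.
    """
    return [[0] * i + [x * y for x in x_digits]
            for i, y in enumerate(y_digits)]


def reduce_partial_products(rows, width):
    """PPR: add the rows column by column, without carrying between columns."""
    columns = [0] * width
    for row in rows:
        for k, value in enumerate(row):
            columns[k] += value
    return columns


def carry_propagate(columns):
    """Final addition: resolve column sums into decimal digits via carries."""
    digits, carry = [], 0
    for total in columns:
        carry, d = divmod(total + carry, 10)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, 10)
        digits.append(d)
    return digits


def decimal_multiply(a, b):
    """Multiply two non-negative integers through the three modelled steps."""
    x, y = to_digits(a), to_digits(b)
    rows = generate_partial_products(x, y)
    columns = reduce_partial_products(rows, len(x) + len(y))
    digits = carry_propagate(columns)
    return int("".join(str(d) for d in reversed(digits)))
```

In this model the only step with a sequential dependency across digit positions is `carry_propagate`; the first two steps work on rows and columns independently, which loosely mirrors why hardware designs treat PPG, PPR, and the final adder as separate optimization targets.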
INDEX TERMS
Decimal computer arithmetic, parallel decimal multiplication, partial product generation and reduction, logic design.
CITATION
Ghassem Jaberipur, Amir Kaivani, "Improving the Speed of Parallel Decimal Multiplication", IEEE Transactions on Computers, vol. 58, no. 11, pp. 1539-1552, November 2009, doi:10.1109/TC.2009.110