Issue No. 12 - December (2011 vol. 60)
ISSN: 0018-9340
pp: 1692-1703
Patrick Schaumont , Virginia Polytechnic Institute and State University, Blacksburg
Zhimin Chen , Virginia Polytechnic Institute and State University, Blacksburg
ABSTRACT
Montgomery multiplication is one of the cornerstones of public-key cryptography, with important applications in the RSA algorithm, in Elliptic-Curve Cryptography, and in the Digital Signature Standard. An efficient implementation of this long-word-length modular multiplication is crucial for the performance of public-key cryptography. In line with the strong momentum of the shift from single-core to multicore systems, we present a parallel software implementation of Montgomery multiplication for multicore systems. Our analysis shows that the proposed scheme, pSHS, partitions the task in a balanced way so that each core receives the same amount of work. We also analyze the impact of intercore communication overhead on the performance of pSHS. The analysis reveals that pSHS delivers high performance, scales across different numbers of cores, and remains stable when the communication latency changes. The analysis also shows how to set the scheme's parameters to achieve optimal performance. We implemented pSHS on a prototype multicore architecture configured in a Field Programmable Gate Array (FPGA). Compared with the sequential implementation, pSHS accelerates 2,048-bit Montgomery multiplication by factors of 1.97, 3.68, and 6.13 on two-core, four-core, and eight-core architectures, respectively, with a communication latency of 100 clock cycles.
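For readers unfamiliar with the operation being parallelized, the following is a minimal sequential sketch of textbook Montgomery multiplication in Python. It is not the pSHS scheme from the paper and says nothing about how the work is partitioned across cores; the function names (`montgomery_setup`, `mont_mul`, `mod_mul`) and the choice of R = 2^k are assumptions made for illustration only.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n and R = 2**k > n.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                # R^2 mod n, used to enter the Montgomery domain
    return n_prime, r2


def mont_mul(a: int, b: int, n: int, k: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod n for 0 <= a, b < n, with R = 2**k."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # t + m*n is divisible by R, so this is exact
    return u - n if u >= n else u      # single conditional subtraction


def mod_mul(a: int, b: int, n: int, k: int) -> int:
    """Compute a * b mod n by a round trip through the Montgomery domain."""
    n_prime, r2 = montgomery_setup(n, k)
    a_m = mont_mul(a % n, r2, n, k, n_prime)  # a * R mod n
    b_m = mont_mul(b % n, r2, n, k, n_prime)  # b * R mod n
    c_m = mont_mul(a_m, b_m, n, k, n_prime)   # a * b * R mod n
    return mont_mul(c_m, 1, n, k, n_prime)    # leave the Montgomery domain
```

The point of the method is that the reduction uses only masking and shifting by k bits rather than a division by n. A practical implementation would operate word by word on multi-precision arrays instead of relying on Python's arbitrary-precision integers.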
INDEX TERMS
Montgomery multiplication, public-key cryptography, parallel programming, tiled processor.
CITATION
Patrick Schaumont, Zhimin Chen, "A Parallel Implementation of Montgomery Multiplication on Multicore Systems: Algorithm, Analysis, and Prototype", IEEE Transactions on Computers, vol. 60, no. 12, pp. 1692-1703, December 2011, doi:10.1109/TC.2010.256