Parallel Cryptographic Arithmetic Using a Redundant Montgomery Representation
November 2004 (vol. 53 no. 11)
pp. 1474-1482
We describe how using a redundant Montgomery representation allows for high-performance SIMD-based implementations of RSA and elliptic curve cryptography. This is in addition to the known benefit of immunity from timing attacks afforded by such a representation. We present some preliminary implementation timings using the SSE2 instruction set on a Pentium 4 processor and show that a SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code. This is especially useful given the larger 2,048-bit RSA keys that are now being proposed for standard security levels. Finally, we remark on other application areas that improve the security of our work in the context of side-channel analysis while maintaining high performance.
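As a rough illustration of what a redundant Montgomery representation means, the following sketch keeps intermediate values in the range [0, 2N) rather than [0, N) and chooses R > 4N, so that the conditional final subtraction can be dropped (the observation made in [28] and [15]). It is not the paper's SIMD implementation: the function names `mont_setup`, `mont_mul`, and `mont_pow` are hypothetical, Python integers stand in for multiprecision words, and the square-and-multiply loop still branches on exponent bits, so the sketch shows only the subtraction-free multiplication, not a side-channel-hardened exponentiation.

```python
def mont_setup(n):
    """Pick R = 2^r_bits with R > 4n and compute n' = -n^{-1} mod R."""
    assert n % 2 == 1, "Montgomery arithmetic needs an odd modulus"
    r_bits = n.bit_length() + 2          # guarantees 2^r_bits > 4n
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r       # modular inverse needs Python 3.8+
    return r_bits, n_prime


def mont_mul(a, b, n, r_bits, n_prime):
    """Return a*b*R^{-1} mod n in redundant form.

    If a and b lie in [0, 2n), the result also lies in [0, 2n),
    so no conditional subtraction of n is performed.
    """
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask    # m = t * n' mod R
    return (t + m * n) >> r_bits         # exact division by R


def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply on redundant Montgomery values."""
    r_bits, n_prime = mont_setup(n)
    r2 = pow(1 << r_bits, 2, n)                        # R^2 mod n
    x = mont_mul(base % n, r2, n, r_bits, n_prime)     # base into Montgomery form
    acc = mont_mul(1, r2, n, r_bits, n_prime)          # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)
        if bit == "1":                                 # not constant time
            acc = mont_mul(acc, x, n, r_bits, n_prime)
    out = mont_mul(acc, 1, n, r_bits, n_prime)         # leave Montgomery form
    return out % n                                     # single reduction at the very end
```

Calling `mont_pow(base, exp, n)` for an odd modulus `n` should agree with Python's built-in `pow(base, exp, n)`. The only reduction to the canonical range happens once, after the last multiplication, which is what makes every `mont_mul` call follow the same instruction sequence regardless of its operands.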

Index Terms:
Public key cryptosystems, algorithm design and analysis, parallel and vector implementations, performance measures.
Daniel Page, Nigel P. Smart, "Parallel Cryptographic Arithmetic Using a Redundant Montgomery Representation," IEEE Transactions on Computers, vol. 53, no. 11, pp. 1474-1482, Nov. 2004, doi:10.1109/TC.2004.100