Fast Modular Reduction

Fast Modular Reduction. Will Hasenplaugh, Gunnar Gaubatz, Vinodh Gopal. June 27, 2007. Modular multiplication is used in public key cryptography: Diffie-Hellman, RSA, and prime-field elliptic curve cryptography.

Presentation Transcript


  1. Fast Modular Reduction
     Will Hasenplaugh, Gunnar Gaubatz, Vinodh Gopal
     June 27, 2007

  2. Modular Multiplication
     • Modular Multiplication is used in Public Key Cryptography
       • Diffie-Hellman and RSA
       • Prime-field Elliptic Curve Cryptography
     • Compute $AB \bmod M$, where $A$, $B$ and $M$ are typically 100s to 1000s of bits
     • We present a variant of Barrett's Modular Reduction Algorithm which exploits Karatsuba Multiplication and Modular Folding
     • Analysis is software focused
       • We use an abstract processor to compare algorithms fairly
       • The native word size is $w$ bits (a power of 2)
       • 1-cycle add and an $m$-cycle multiply
     • We present example data on an 8-bit processor with a 2-cycle multiplier
       • Atmel AVR series: representative of embedded handheld devices
     • Our algorithm is also applicable to hardware acceleration
     Digital Enterprise Group

  3. Montgomery vs. Barrett
     Word-Serial Montgomery
     Pro:
       • Regularity
       • Interleaved Multiply and Reduce
       • Low-Complexity Quotient Estimation
       • Right-to-Left computation leads to convenient hardware pipelines
     Con:
       • Transformation Overhead
       • $n^2$ complexity
     Barrett
     Pro:
       • No Transformation Overhead
       • Large Digit Based Computation
       • Allows sub-$n^2$ multiplication techniques
       • Flexible 'Off the Shelf' hardware
     Con:
       • Quotient Estimation requires a 'large digit' multiplication
       • Left-to-Right computation is less convenient for hardware

  4. Barrett vs. Montgomery
     • Performance of $n^2$ Barrett approaches ~2/3 of Montgomery
     • Quotient Estimation for Montgomery is amortized as operands grow

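  5. Karatsuba Multiplication
     • Recursive multiplication algorithm with $O(n^{1.585})$ complexity.
     • 'Schoolbook' multiplication complexity scales as $O(n^2)$, but requires fewer additions per recursion.
     • $N = AB$
     • $A = a_1 2^n + a_0$
     • $B = b_1 2^n + b_0$
     • Schoolbook Multiplication:
       $N = a_1 b_1 2^{2n} + (a_1 b_0 + a_0 b_1) 2^n + a_0 b_0$
     • Karatsuba Multiplication:
       $N = a_1 b_1 2^{2n} + [(a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0] 2^n + a_0 b_0$
     [Diagram: $A$ and $B$ are split into halves $a_1, a_0$ and $b_1, b_0$; the products $a_1 b_1$, $a_0 b_0$ and $(a_1 + a_0)(b_1 + b_0)$ are formed, $a_0 b_0$ and $a_1 b_1$ are subtracted from the middle product, and the terms are summed into $N = AB$.]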
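The Karatsuba identity on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: the names `karatsuba`, `bits` and `cutoff` are hypothetical, Python's built-in `*` stands in for the native word multiply at the base case, and arbitrary-precision integers hide the carry handling that a real word-based implementation has to deal with.

```python
def karatsuba(a: int, b: int, bits: int, cutoff: int = 8) -> int:
    """Multiply non-negative a and b, each assumed to be below 2**bits.

    `cutoff` is a hypothetical stand-in for the native word size: at or
    below it, the built-in multiply plays the role of the hardware one.
    """
    assert cutoff >= 3, "smaller cutoffs would not shrink the middle product"
    if bits <= cutoff:
        return a * b

    n = bits // 2                      # split point: A = a1*2^n + a0
    mask = (1 << n) - 1
    a1, a0 = a >> n, a & mask
    b1, b0 = b >> n, b & mask

    hi = karatsuba(a1, b1, bits - n, cutoff)          # a1*b1
    lo = karatsuba(a0, b0, n, cutoff)                 # a0*b0
    # The sums a1+a0 and b1+b0 can be one bit wider than the halves.
    mid = karatsuba(a1 + a0, b1 + b0, bits - n + 1, cutoff) - hi - lo

    # N = a1*b1*2^(2n) + [(a1+a0)(b1+b0) - a1*b1 - a0*b0]*2^n + a0*b0
    return (hi << (2 * n)) + (mid << n) + lo
```

Each level replaces four half-size multiplies with three, at the price of a few extra additions and subtractions, which is where the $O(n^{1.585})$ exponent ($\log_2 3$) comes from.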

  6. Recursive Karatsuba Decomposition
     • For $k$ recursions: the 'extra' word is $\le \log_2 k$ bits
     • Just one extra word on an 8-bit machine is sufficient to handle multiplication of numbers up to $2^{258}$ bits. There are fewer particles in the universe than that. So, we probably won't need to rewrite this code.
     [Diagram: $A$ is split into $a_1$ and $a_0$ and the sum $a_1 + a_0$ is formed at each recursion; the 'extra' word is labeled $\le 1$, $\le 2$, $\le 3$ at successive recursion levels.]

  7. Carry Handling
     • There is considerable overhead in the naïve implementation of Karatsuba.
     • At a recursion depth of 4, ~20% of the multiplies are with sparsely populated 'extra' words.
     • We turn sparsely populated multiplies into branches and adds.
     • $N = AB$
     • $A = a_h 2^n + a_l$
     • $B = b_h 2^n + b_l$
     • $a_h$ and $b_h$ are booleans
     • $N = a_h b_h 2^{2n} + [a_h b_l + b_h a_l] 2^n + a_l b_l$
     • Each recursion is a conveniently-sized multiply -> no 'extra' words.
     [Diagram: compute $a_l b_l$; add $a_l$ if $b_h = 1$; add $b_l$ if $a_h = 1$; add 1 if $a_h \,\&\, b_h = 1$; the sum is $N$.]
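A minimal sketch of the branch-and-add idea on this slide, assuming each operand is an $n$-bit value plus a single boolean carry bit. The function name `mul_with_carry_bits` is hypothetical, and Python integers stand in for the word arrays a real implementation would use.

```python
def mul_with_carry_bits(a: int, b: int, n: int) -> int:
    """Multiply two (n+1)-bit operands using one n-bit multiply.

    A = ah*2^n + al and B = bh*2^n + bl, where ah and bh are booleans
    (the carry bits produced by the Karatsuba sums).
    """
    mask = (1 << n) - 1
    ah, al = a >> n, a & mask
    bh, bl = b >> n, b & mask
    assert ah in (0, 1) and bh in (0, 1), "operands must fit in n+1 bits"

    result = al * bl                 # the only real multiply: n x n bits
    if bh:
        result += al << n            # bh * al * 2^n
    if ah:
        result += bl << n            # ah * bl * 2^n
    if ah and bh:
        result += 1 << (2 * n)       # ah * bh * 2^(2n)
    return result
```

The three terms that involve a carry bit cost only a test and a shifted add each, so no multiply is ever issued on a mostly empty 'extra' word.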

  8. Karatsuba vs. Schoolbook Multiplication

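  9. Barrett's Algorithm
     • $A$, $B$ and $M$ are $n$-bit numbers. We seek to find $R = AB \bmod M$ using Barrett's Algorithm.
     • A total of 3 $n$-bit multiplies.
     [Diagram (data flow): $N = A \times B$ is split into $N / 2^n$ and $N \bmod 2^n$; the high part is multiplied by $\mu$ to give $\mu N / 2^n$, whose high part $\approx \mu N / 2^{2n}$ is multiplied by $M$; the product $\approx \mu N M / 2^{2n}$ is subtracted to leave $R$.]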
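The data flow above can be sketched as follows. The slide does not define $\mu$; this sketch assumes the usual Barrett constant $\mu = \lfloor 2^{2n} / M \rfloor$, precomputed once per modulus, and assumes $M$ has its top bit set so that it is a full $n$-bit number. The names `barrett_setup` and `barrett_modmul` are hypothetical.

```python
def barrett_setup(M: int, n: int) -> int:
    """Precompute mu = floor(2^(2n) / M) (assumed definition, done once per M)."""
    assert M.bit_length() == n, "sketch assumes M is exactly n bits"
    return (1 << (2 * n)) // M


def barrett_modmul(A: int, B: int, M: int, n: int, mu: int) -> int:
    """Return A*B mod M for A, B < M, using three large multiplies."""
    N = A * B                        # multiply 1: the 2n-bit product
    q = ((N >> n) * mu) >> n         # multiply 2: q ~ mu*N / 2^(2n), an
                                     # underestimate of floor(N / M)
    R = N - q * M                    # multiply 3: remove q multiples of M
    while R >= M:                    # q is slightly low, so a few
        R -= M                       # correcting subtractions may remain
    return R
```

Because both shifts discard low bits, the quotient estimate never exceeds the true quotient, so the correction step only ever subtracts.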

  10. Barrett vs. Montgomery

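  11. Folding
     • We accelerate the reduction process by partially reducing $N$ ($= AB$) with an inexpensive method called Folding:
     • $M' = 2^{3s} \bmod M$
     [Diagram (data flow): $N = A \times B$ is split into $N / 2^{3s}$ and $N \bmod 2^{3s}$; the high part is multiplied by $M'$, and the product $\approx N M' / 2^{3s}$ is added to the low part to give $N'$.]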
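A single fold can be sketched as below. The slide does not define $s$; this sketch assumes $s = n/2$, so that the fold point $2^{3s}$ matches the $2^{1.5n}$ used on the next slide. The function name `fold` is hypothetical, and `pow(2, k, M)` stands in for a constant that would be precomputed once per modulus.

```python
def fold(N: int, M: int, s: int) -> int:
    """Partially reduce N modulo M by folding at bit position 3s.

    Returns N' = (N mod 2^(3s)) + floor(N / 2^(3s)) * M', which is
    congruent to N modulo M but shorter than N.
    """
    k = 3 * s
    m_prime = pow(2, k, M)           # M' = 2^(3s) mod M (precomputable)
    low = N & ((1 << k) - 1)         # N mod 2^(3s)
    high = N >> k                    # floor(N / 2^(3s)), about s bits
    return low + high * m_prime      # one s-bit by n-bit multiply
```

The fold works because $2^{3s} \equiv M' \pmod{M}$, so replacing the high part's weight $2^{3s}$ with $M'$ leaves the residue unchanged. The result is not fully reduced; a final reduction (for example Barrett's) still has to finish the job on the shorter $N'$.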

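  12. Iterative Folding
     • We can play the same trick again. $F$ times, in fact.
     [Diagram (data flow): $N$ is split into $N / 2^{1.5n}$ and $N \bmod 2^{1.5n}$; the high part is multiplied by $M^{(1)}$ and added to the low part to give $N^{(1)}$. $N^{(1)}$ is then folded at $2^{1.25n}$ using $M^{(2)}$ to give $N^{(2)}$, which is split again at $2^{1.125n}$.]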
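Repeating the fold can be sketched as a loop. This assumes, by analogy with the single-fold constant, that the $i$-th constant is $M^{(i)} = 2^{n + n/2^i} \bmod M$, which matches the fold points $2^{1.5n}$ and $2^{1.25n}$ in the diagram. The name `iterative_fold` is hypothetical, and the constants would normally be precomputed per modulus rather than recomputed with `pow` on every call.

```python
def iterative_fold(N: int, M: int, n: int, F: int) -> int:
    """Apply F successive folds to a 2n-bit product N.

    Fold i splits at bit position k = n + n/2^i (1.5n, 1.25n, ...) and
    replaces the high part by its product with M(i) = 2^k mod M.
    The result is congruent to N mod M but only about n + n/2^F bits long.
    """
    for i in range(1, F + 1):
        k = n + (n >> i)                 # fold point for this iteration
        m_i = pow(2, k, M)               # assumed constant M(i), precomputable
        low = N & ((1 << k) - 1)         # N mod 2^k
        high = N >> k                    # roughly n/2^i bits
        N = low + high * m_i             # shorter, same residue mod M
    return N
```

Each fold uses a narrower multiply than the one before it, so the later folds are cheap. The output still needs a final reduction, which now operates on a much shorter number than the original $2n$-bit product.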

  13. Iterative Folding ( F = 2 )

  14. Summary
     • This Fast Modular Reduction technique is ~2x faster than Montgomery on RSA Encryption with 512- to 1024-bit keys.
     • As security requirements heighten, key sizes will grow to meet them and the asymptotic advantage of Karatsuba will continue to shine. We see a ~3x and ~4x advantage, respectively, for 2048- and 4096-bit keys.
     • The speedup of a multiplier-bound, $w$-bit architecture is [formula missing from transcript]
     • Strong encryption on low-power handheld devices is challenging
       • Ex: A 16 MHz 8-bit Atmel AVR computes a 4096-bit RSA in almost 4 minutes with Montgomery, but we can do it in 1.
