
Optimum Implementation of Elliptic Curve Cryptosystems on the SRC-6E Reconfigurable Computer

Nghi Nguyen (1), Kris Gaj (1), David Caliga (2), Tarek El-Ghazawi (3)
(1) George Mason University, (2) SRC Computers, (3) The George Washington University


Presentation Transcript


  1. Optimum Implementation of Elliptic Curve Cryptosystems on the SRC-6E Reconfigurable Computer
     Nghi Nguyen (1), Kris Gaj (1), David Caliga (2), Tarek El-Ghazawi (3)
     (1) George Mason University, (2) SRC Computers, (3) The George Washington University

  2. What is a reconfigurable computer?
     [Diagram: a microprocessor system (μP units, each with μP memory, plus interface and I/O) shown alongside a reconfigurable processor system (FPGA units, each with FPGA memory, plus interface and I/O).]

  3. Characteristic Features
     • close integration of the microprocessor system and the FPGA system
     • integrated programming environment
     • programming does not require hardware expertise
     • suitable for a wide range of applications
     • permits run-time reconfiguration of the FPGA system

  4. SRC Hardware & Software

  5. SRC Hardware Architecture

  6. SRC vs. FPGA Accelerator Boards
     [Diagram comparing how FPGA boards and SRC are programmed. Labels: Programming; Graphical Data Flow Diagram; HDL; HLL; FPGA Boards (Software, Hardware); SRC (Software, Hardware).]

  7. SRC Compilation Process

  8. Run Time Reconfiguration in SRC
     Program in C or Fortran:
     • Main program: ...; Function_1(a, d, e); ...; Function_2(d, e, f); ...
     • Function_1: Macro_1(a, b, c); Macro_2(b, d); Macro_2(c, e)
     • Function_2: Macro_3(s, t); Macro_1(n, b); Macro_4(t, k)
     FPGA contents after the Function_1 call: one instance of Macro_1 and two instances of Macro_2, connected through the signals a, b, c, d, e.

  9. Elliptic Curve Cryptosystems

  10. Elliptic Curve Cryptosystems
      • public key (asymmetric) cryptosystems
      • first true alternative to RSA
      • keys several times shorter
      • fast and compact implementations, in particular in hardware
      • a family of cryptosystems, instead of a single cryptosystem

  11. Three Classes of Elliptic Curves
      Elliptic curves built over:
      • $K = GF(p)$
      • $K = GF(2^m)$, normal basis representation
      • $K = GF(2^m)$, polynomial basis representation
      Secure m: m = 155 .. 512. Our m: m = 233.
      [Diagram annotations, in slide order: Arithmetic operations present in many libraries; Fast in hardware; Compact in hardware.]

  12. ECC Hierarchy
      • High-level functions: kP
      • Medium-level functions: 2P, P+Q
      • Low-level functions: MUL, INV, XOR

  13. Basic operations of Elliptic Curve Cryptosystems (1)
      Basic operations in the Galois Field $GF(2^m)$:
      • addition and subtraction (xor): $x + y$, $x - y$
      • multiplication: $x \cdot y$
      • inversion: $x^{-1}$
      Basic operations on points of an Elliptic Curve over the Galois Field $GF(2^m)$:
      • addition of points: $P + Q$
      • doubling a point: $2P$
      where $P = (x_P, y_P)$, $Q = (x_Q, y_Q)$

  14. Basic operations of Elliptic Curve Cryptosystems (2)
      Complex operations on points of an Elliptic Curve over the Galois Field $GF(2^m)$:
      • scalar multiplication: $k \cdot P = P + P + \dots + P$ (k times)
      • double scalar multiplication: $k \cdot P + l \cdot Q$

  15. Doubling, 2P; Addition, P+Q
      Addition, $R = P + Q$:
      • $P = (x_P, y_P)$, $Q = (x_Q, y_Q)$, $R = (x_R, y_R)$
      • $x_R = \lambda^2 + \lambda + x_P + x_Q + a_2$
      • $y_R = \lambda (x_P + x_R) + x_R + y_P$
      • where $\lambda = (y_P + y_Q)(x_P + x_Q)^{-1}$
      • Number of field operations: 3 multiplications, 1 inversion
      Doubling, $R = 2P$:
      • $P = (x_P, y_P)$, $R = (x_R, y_R)$
      • $x_R = a_6 (x_P^{-1})^2 + x_P^2$
      • $y_R = x_P^2 + (x_P + y_P x_P^{-1}) x_R + x_R$
      • Number of field operations: 5 multiplications, 1 inversion
      $a_2$, $a_6$: coefficients of the curve
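The addition and doubling formulas on this slide can be tried out on a toy field. The following is a minimal sketch, not the authors' implementation: it assumes the curve has the form y^2 + xy = x^3 + a2*x^2 + a6 (the form for which these affine formulas are standard), uses the small field GF(2^4) with the polynomial x^4 + x + 1 instead of m = 233, and picks a2 = a6 = 1 arbitrarily. All function names are hypothetical, and special cases (point at infinity, P = -Q, x_P = 0 in doubling) are deliberately left out.

```python
M = 4            # toy field size (the slides use m = 233)
POLY = 0b10011   # x^4 + x + 1, an assumed irreducible polynomial
A2, A6 = 1, 1    # assumed curve coefficients for illustration

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M), reduced modulo POLY."""
    c = 0
    for i in range(M - 1, -1, -1):
        c <<= 1
        if (b >> i) & 1:
            c ^= a
        if (c >> M) & 1:
            c ^= POLY
    return c

def gf_inv(a):
    """Inverse via a^(2^M - 2); fine for a toy field, too slow for m = 233."""
    result = 1
    for _ in range(2**M - 2):
        result = gf_mul(result, a)
    return result

def on_curve(p):
    """Check y^2 + xy = x^3 + a2*x^2 + a6."""
    x, y = p
    x_sq = gf_mul(x, x)
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(x_sq, x) ^ gf_mul(A2, x_sq) ^ A6
    return lhs == rhs

def point_add(p, q):
    """R = P + Q for affine points with x_P != x_Q."""
    (xp, yp), (xq, yq) = p, q
    lam = gf_mul(yp ^ yq, gf_inv(xp ^ xq))
    xr = gf_mul(lam, lam) ^ lam ^ xp ^ xq ^ A2
    yr = gf_mul(lam, xp ^ xr) ^ xr ^ yp
    return (xr, yr)

def point_double(p):
    """R = 2P for an affine point with x_P != 0."""
    xp, yp = p
    inv_x = gf_inv(xp)
    xp_sq = gf_mul(xp, xp)
    xr = gf_mul(A6, gf_mul(inv_x, inv_x)) ^ xp_sq
    yr = xp_sq ^ gf_mul(xp ^ gf_mul(yp, inv_x), xr) ^ xr
    return (xr, yr)

# Enumerate all affine points of the toy curve by brute force.
POINTS = [(x, y) for x in range(2**M) for y in range(2**M) if on_curve((x, y))]
```

Every sum or double produced from entries of `POINTS` (outside the excluded special cases) should again satisfy `on_curve`, which gives a quick sanity check of the formulas.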

  16. Scalar Multiplication - kP
      $R = kP = P + P + \dots + P$ (k times)
      $k = (k_{m-1}, k_{m-2}, \dots, k_1, k_0)_2$

      R = O
      S = P
      for ( i = 0 to m-1 )
          if ( k_i = 1 )
              R = R + S
          end if
          S = 2S
      end for
      return R

      The addition R = R + S and the doubling S = 2S can be performed in parallel.
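The right-to-left double-and-add loop above can be sketched independently of the curve arithmetic. In this illustrative version the group operations are passed in as functions (`add`, `double`, and `identity` are hypothetical parameter names, not part of the SRC code), so the loop can be checked with ordinary integers, where kP is just k times P.

```python
def scalar_mult(k, p, add, double, identity):
    """Right-to-left double-and-add: scans the bits of k from k_0 upward."""
    r = identity          # R = O
    s = p                 # S = P
    for i in range(k.bit_length()):
        if (k >> i) & 1:  # if k_i = 1
            r = add(r, s)     # R = R + S
        s = double(s)         # S = 2S (independent of the line above)
    return r

# Usage with the integers under addition standing in for the curve group:
result = scalar_mult(13, 5, lambda a, b: a + b, lambda a: 2 * a, 0)
print(result)  # 13 * 5 = 65
```

Within one iteration, `add(r, s)` and `double(s)` both read the same `s` and write different variables, which is why the slide notes that the two operations can run in parallel in hardware.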

  17. ECC Hierarchy
      • High-level functions: kP
      • Medium-level functions: 2P, P+Q
      • Low-level functions: MUL, INV, XOR

  18. Investigated Partitioning Schemes

  19. SRC Program Partitioning
      • C function for μP: runs on the μP system (HLL)
      • C function for MAP: runs on the FPGA system (HLL)
      • VHDL macro: runs on the FPGA system (HDL)

  20. H00 Partitioning (μP Software Only)
      • C function for μP: H (kP)
      • C function for MAP: 0
      • VHDL macro: 0

  21. 00H Partitioning (VHDL only)
      • C function for μP: 0
      • C function for MAP: 0
      • VHDL macro: H (kP)

  22. HML Partitioning
      • C function for μP: H (kP)
      • C function for MAP: M (2P, P+Q)
      • VHDL macro: L (INV, XOR, MUL)

  23. 0HL Partitioning
      • C function for μP: 0
      • C function for MAP: H (kP, P+Q, 2P)
      • VHDL macro: L (INV, XOR, MUL)

  24. 0HM Partitioning
      • C function for μP: 0
      • C function for MAP: H (kP)
      • VHDL macro: M (P+Q, 2P)

  25. GF(2^m) Multiplier
      • Input: $A, B \in GF(2^m)$
      • Output: $C = A \cdot B \bmod P$
      1. C = 0
      2. for i = m-1 to 0 do
      3.     C = (C << 1) + A*b_i
      4.     C = C + c_m*P
      5. end for
      6. return C
      [Diagram: datapath with Input A, Input B, and Constant P (m bits each), shift-left (<<1) registers, AND gates, and an m-bit Result.]
      m+1 clock cycles per multiplication
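The multiplier loop above translates almost line by line into software. This is a hedged sketch rather than the VHDL macro from the talk: field elements are held as Python integers whose bits are polynomial coefficients, and the function name and the `m`/`poly` parameters are assumptions made for illustration. The usage lines use two small, well-known fields (GF(2^4) with x^4 + x + 1 and the AES field GF(2^8) with x^8 + x^4 + x^3 + x + 1), neither of which comes from the slides.

```python
def gf2m_mul(a, b, m, poly):
    """MSB-first shift-and-add multiplication in GF(2^m).

    a, b : field elements as integers below 2**m
    poly : reduction polynomial P including the x^m term
    """
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1                 # C = C << 1
        if (b >> i) & 1:        # ... + A * b_i
            c ^= a
        if (c >> m) & 1:        # C = C + c_m * P (clears bit m)
            c ^= poly
    return c

# x * x^3 = x^4 = x + 1 in GF(2^4) with P = x^4 + x + 1
print(bin(gf2m_mul(0b0010, 0b1000, 4, 0b10011)))   # 0b11

# Worked example from the AES specification: {57} * {83} = {c1}
print(hex(gf2m_mul(0x57, 0x83, 8, 0x11B)))          # 0xc1
```

Each loop iteration corresponds to one clock cycle of the hardware datapath, which matches the slide's count of roughly m cycles per multiplication.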

  26. GF(2^m) Inverter (Modified Almost Inverse Algorithm)
      • Input: $A \in GF(2^m)$
      • Output: $C = A^{-1} \bmod P$
      1. Y = A, D = P, B = 0, Z = 1
      2. loop
      3.     while y_0 = 0 do
      4.         Y = Y >> 1
      5.         Z = (Z + z_0*P) >> 1
      6.     end while
      7.     if (Y = 1)
      8.         return Z
      9.     if (D > Y) then
      10.        D <=> Y, B <=> Z
      11.    Y = Y + D, Z = Z + B
      12. end loop
      [Diagram: datapath with Input A and Constant P (m bits each), registers Y, D, Z, B, right-shift (>>1) units used inside the while loop, swapping logic, and an m-bit Result.]
      Time of inversion is input-dependent: typically 3 to 4 times m, on average.
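The inversion steps above can be sketched in software as follows. This is an illustrative reading of the slide, not the authors' VHDL: the shift inside the while loop is applied to Z (the slide text names an otherwise undefined X there), the comparison D > Y is done on the integer encodings of the polynomials, and the function names and the `m`/`poly` parameters are assumptions. A small multiplier is included only so the result can be checked.

```python
def gf2m_mul(a, b, m, poly):
    """Shift-and-add multiplication in GF(2^m), used here only for checking."""
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1
        if (b >> i) & 1:
            c ^= a
        if (c >> m) & 1:
            c ^= poly
    return c

def gf2m_inv(a, poly):
    """Modified Almost Inverse Algorithm; poly must be irreducible, a nonzero.

    Loop invariants: Z * A = Y (mod P) and B * A = D (mod P).
    """
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    y, d, b, z = a, poly, 0, 1
    while True:
        while y & 1 == 0:       # while y_0 = 0
            y >>= 1             # Y = Y >> 1
            if z & 1:           # Z = (Z + z_0 * P) >> 1
                z ^= poly
            z >>= 1
        if y == 1:
            return z
        if d > y:               # keep the larger polynomial in Y
            y, d = d, y
            z, b = b, z
        y ^= d                  # Y = Y + D
        z ^= b                  # Z = Z + B

# Usage in GF(2^4) with P = x^4 + x + 1: the inverse of x is x^3 + 1
inv_x = gf2m_inv(0b0010, 0b10011)
print(bin(inv_x), gf2m_mul(0b0010, inv_x, 4, 0b10011))  # 0b1001 1
```

The number of passes through the inner while loop depends on the bit pattern of the input, which is the software counterpart of the slide's remark that inversion time is input-dependent.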

  27. Unrolled Implementation Approach Using Two FPGA Devices
      [Diagram: kP, 2P, P+Q, and I/O mapped onto FPGA1 and FPGA2; blocks shown: 8 × MUL, 2 × INV.]

  28. Iterative Implementation Approach Using Two FPGA Devices
      [Diagram: kP, 2P, P+Q, and I/O mapped onto FPGA1 and FPGA2; blocks shown: 3 × MUL, 2 × INV.]

  29. Iterative Implementation Approach Using One FPGA Device
      [Diagram: kP, P+Q, 2P, and I/O, with FPGA1 and FPGA2 labels; blocks shown: 3 × MUL, 2 × INV.]

  30. Results

  31. Timing Measurements
      [Diagram: a MAP function called from the .c file and defined in the .mc file. Phases labeled: MAP Alloc., FPGA Configure, DMA Data In, FPGA Computation, DMA Data Out, MAP Free. Measured intervals: End-to-End time (HW), End-to-End time (SW), MAP Allocation time, Configuration time, MAP Release time.]

  32. Timing measurements

  33. Resource Utilization

  34. Number of lines of code

  35. End-to-End Latency for Different Partitioning Approaches
      [Chart; one data label reads 101,145.]

  36. FPGA Resource Usage for Different Partitioning Approaches

  37. Conclusions
      • Elliptic Curve Cryptosystem implementation is challenging for reconfigurable computers because of
          • optimization for latency rather than throughput
          • limited amount of parallelism
      • A speed-up of 8 to 9 times over a highly optimized microprocessor implementation was demonstrated using four different algorithm partitioning schemes:
          • 0HL iterative 2-chip
          • 0HL unrolled 2-chip
          • 0HM 2-chip
          • 00H 1-chip

  38. Conclusions – cont.
      Clear trade-offs:
      • Resources
      • Timing
      • Ease of programming

  39. Conclusions – cont.
      Assuming focus on:
      • Resources
      • Timing
      • Ease of programming

  40. Conclusions – cont.
      The best implementation approach: 0HL partitioning scheme, 2-chip, unrolled
      • C function for μP: 0
      • C function for MAP: H (kP, P+Q, 2P)
      • VHDL macro: L (INV, XOR, MUL)
      Only 8% increase in the execution time compared to pure VHDL
