
Ekawat Homsirikamol & Kris Gaj George Mason University USA


Presentation Transcript


  1. Ekawat Homsirikamol & Kris Gaj, George Mason University, USA. Benchmarking of Cryptographic Algorithms in Hardware

  2. Co-Author: Ekawat Homsirikamol, a.k.a. “Ice”. Working on the PhD thesis entitled “A New Approach to the Development of Cryptographic Standards Based on the Use of High-Level Synthesis Tools”

  3. Cryptographic Standard Contests (timeline figure, time axis 1997 to 2016): AES, IX.1997 to X.2000, 15 block ciphers; NESSIE, I.2000 to XII.2002; CRYPTREC; eSTREAM, XI.2004 to V.2008, 34 stream ciphers; SHA-3, X.2007 to X.2012, 51 hash functions; CAESAR, 56 authenticated ciphers

  4. Difficulties of Hardware Benchmarking: growing number of candidates; long time necessary to develop and verify RTL (Register Transfer Level) VHDL or Verilog code; multiple variants of algorithms (e.g., 3 different key sizes in the AES Contest, 4 different output sizes in the SHA-3 Contest); multiple hardware architectures (based on folding, unrolling, pipelining, etc.); dependence on the skills of the designers

  5. Potential Solution: High-Level Synthesis (HLS). Flow: High-Level Language (e.g., C, C++, Matlab, Cryptol), then High-Level Synthesis, then Hardware Description Language (e.g., VHDL or Verilog)

  6. Short History of High-Level Synthesis. Generation 1 (1980s to early 1990s): research period. Generation 2 (mid 1990s to early 2000s): commercial tools from Synopsys, Cadence, Mentor Graphics, etc.; input languages: behavioral HDLs; target: ASIC; outcome: commercial failure. Generation 3 (from early 2000s): domain-oriented commercial tools, in particular for DSP; input languages: C, C++, C-like languages (Impulse C, Handel C, etc.), Matlab + Simulink, Bluespec; target: FPGA, ASIC, or both; outcome: first success stories

  7. Cinderella Story: AutoESL Design Technologies, Inc. (25 employees). Flagship product: AutoPilot, translating C/C++/SystemC to VHDL or Verilog. Acquired by the biggest FPGA company, Xilinx Inc., in 2011. AutoPilot integrated into the primary Xilinx toolset, Vivado, as Vivado HLS, released in 2012. “High-Level Synthesis for the Masses”

  8. Our Hypothesis: The ranking of candidate algorithms in cryptographic contests, in terms of their performance in modern FPGAs, will remain the same regardless of whether the HDL implementations are developed manually or generated automatically using High-Level Synthesis tools. The development time will be reduced by at least an order of magnitude

  9. Potential Additional Benefits: early feedback for designers of cryptographic algorithms. Typical design process based only on security analysis and software benchmarking; lack of immediate feedback on hardware performance; common unpleasant surprises, e.g., Mars in the AES Contest; BMW, ECHO, and SIMD in the SHA-3 Contest

  10. Traditional Development and Benchmarking Flow (flow diagram; blocks: Informal Specification, Test Vectors, Manual Design, Functional Verification, HDL Code, Post Place & Route Results, Manual Optimization, FPGA Tools, Timing Verification, Netlist)

  11. Extended Traditional Development and Benchmarking Flow (flow diagram; blocks: Informal Specification, Test Vectors, Manual Design, Functional Verification, HDL Code, Post Place & Route Results, Option Optimization with ATHENa, FPGA Tools, Timing Verification, Netlist)

  12. HLS-Based Development and Benchmarking Flow (flow diagram; blocks: Reference Implementation in C, Manual Modifications (pragmas, tweaks), Test Vectors, HLS-ready C code, High-Level Synthesis, Functional Verification, HDL Code, Post Place & Route Results, Option Optimization with ATHENa, FPGA Tools, Timing Verification, Netlist)

  13. Our Test Case: 5 final SHA-3 candidates; most efficient sequential architectures (/2h for BLAKE, x4 for Skein, x1 for others); GMU RTL VHDL codes developed during the SHA-3 contest; reference software implementations in C included in the submission packages. Hypotheses: the ranking of candidates will remain the same; performance ratios RTL/HLS will be similar across candidates

  14. Manual RTL vs. HLS-based Results: Altera Stratix III (chart; series: RTL, HLS)

  15. Manual RTL vs. HLS-based Results: Altera Stratix IV (chart; series: RTL, HLS)

  16. Ratios of Major Results RTL/HLS for Altera Stratix III

  17. Ratios of Major Results RTL/HLS for Altera Stratix IV

  18. Lack of Correlation for Xilinx Virtex 6 (chart; series: RTL, HLS)

  19. Datapath vs. Control Unit (block diagram). Datapath: Data Inputs, Data Outputs; determines area and clock frequency. Control Unit: Control Inputs, Control Outputs; determines the number of clock cycles. The two blocks are linked by Control Signals and Status Signals

  20. Encountered Problems. Datapath inferred correctly: frequency and area within 30% of manual designs. Control Unit suboptimal: difficulty in inferring an overlap between completing the last round and reading the next input block; one additional clock cycle used for initialization of the state at the beginning of each round. The formulas for throughput: RTL: Throughput = Block_size / (#Rounds * TCLK); HLS: Throughput = Block_size / ((#Rounds+2) * TCLK)
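The two throughput formulas on this slide differ only in the cycle count per block, so they can be folded into one helper. The following is a minimal sketch, not code from the presentation: the function names, the choice of nanoseconds for TCLK, and the `extra_cycles` parameter are assumptions made for illustration.

```c
/* Throughput in bits per second of an iterative core.
 *   block_bits   : Block_size from the slide, in bits
 *   rounds       : #Rounds from the slide
 *   extra_cycles : 0 for the manual RTL formula, 2 for the HLS formula
 *   t_clk_ns     : TCLK, the clock period, in nanoseconds (assumed unit)
 */
double throughput_bps(unsigned block_bits, unsigned rounds,
                      unsigned extra_cycles, double t_clk_ns)
{
    double cycles_per_block = (double)(rounds + extra_cycles);
    return (double)block_bits / (cycles_per_block * t_clk_ns * 1e-9);
}

/* RTL/HLS throughput ratio when both designs run at the same clock
 * period: the TCLK and Block_size terms cancel, leaving
 * (#Rounds + 2) / #Rounds. */
double rtl_hls_ratio_same_clock(unsigned rounds)
{
    return (double)(rounds + 2) / (double)rounds;
}
```

With equal clock periods the penalty depends only on the round count, so under these formulas algorithms with few rounds lose proportionally more throughput to the two extra HLS cycles than algorithms with many rounds.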

  21. Hypothesis Check. Hypothesis I: Ranking of candidates in terms of throughput, area, and throughput/area ratio will remain the same. TRUE for Altera Stratix III and Stratix IV; FALSE for Xilinx Virtex 5 and Virtex 6. Hypothesis II: Performance ratios RTL/HLS similar across candidates
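Both hypotheses can be checked mechanically once per-candidate results are available. The sketch below is one possible formulation, not the authors' procedure: the function names are hypothetical, Hypothesis I is tested as pairwise order agreement between the RTL and HLS values of one metric, and Hypothesis II is summarized as the largest RTL/HLS ratio divided by the smallest.

```c
#include <stddef.h>

/* Sign of (a - b): -1, 0, or +1. */
static int sign_of_diff(double a, double b)
{
    return (a > b) - (a < b);
}

/* Hypothesis I: returns 1 if every pair of candidates is ordered the
 * same way by the RTL values and by the HLS values of one metric
 * (e.g. throughput), and 0 otherwise. */
int ranking_preserved(const double *rtl, const double *hls, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (sign_of_diff(rtl[i], rtl[j]) != sign_of_diff(hls[i], hls[j]))
                return 0;
    return 1;
}

/* Hypothesis II: largest RTL/HLS ratio divided by the smallest one.
 * A value close to 1.0 means the ratios are similar across candidates.
 * Assumes n >= 1 and every hls[i] > 0. */
double ratio_spread(const double *rtl, const double *hls, size_t n)
{
    double lo = rtl[0] / hls[0];
    double hi = lo;
    for (size_t i = 1; i < n; i++) {
        double r = rtl[i] / hls[i];
        if (r < lo) lo = r;
        if (r > hi) hi = r;
    }
    return hi / lo;
}
```

The arrays would hold one entry per candidate in the same order, for example the five SHA-3 finalists; no measured values are assumed here.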

  22. Correlation Between Altera FPGA Results and ASICs (chart; series: Stratix III FPGA, ASIC)

  23. Proposed Interface for Authenticated Ciphers (block diagram of the Cipher Core, with clk and rst inputs). PDI, Public Data Input Ports: pdi (w bits), pdi_ready, pdi_read. SDI, Secret Data Input Ports: sdi (w bits), sdi_ready, sdi_read. DO, Data Output Ports: do (w bits), do_ready, do_write. Error Notification Ports: error, ecode (8 bits)

  24. Typical External Circuit (block diagram: the Cipher Core surrounded by three FIFOs, each with clk and rst). PDI FIFO: epdi, ipdi, pfifo_empty, pfifo_full, pfifoin_read, pfifo_write; core side: pdi (w bits), pdi_ready, pdi_read. SDI FIFO: esdi, isdi, sfifo_empty, sfifo_full, sfifo_read, sfifo_write; core side: sdi (w bits), sdi_ready, sdi_read. DO FIFO: ido, edo, ofifo_empty, ofifo_full, ofifo_read, ofifo_write; core side: do (w bits), do_ready, do_write. Error outputs of the core: error, ecode (8 bits)

  25. Format of Secret Data Input (sequence of w-bit words): instruction; seg_0_header; seg_0 = Key

  26. Format of Public Data Input: Encryption (sequence of w-bit words): instruction; seg_0_header; seg_0 = IV; seg_1_header; seg_1 = AD; seg_2_header; seg_2 = Message

  27. Format of Segment Header (one w-bit word, bits w-1 down to 0; field widths shown in the figure: 8, 4, 2, 1, 1, and w-16 bits). Fields: Input ID [0..255]; Segment Type; LS; Segment Length [0..2^(w-16)-1 bytes]. Segment Type codes: 0000 = Reserved, 0001 = Initialization Vector, 0010 = Associated Data, 0011 = Message, 0100 = Ciphertext, 0101 = Tag, 0110 = Key. LS = 1 if the last segment of input, 0 otherwise
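A header word of this kind is straightforward to pack and unpack in software, for example in a testbench that generates PDI and SDI streams. The sketch below assumes w = 32 and an MSB-first field order (Input ID in bits 31..24, Segment Type in bits 23..20, LS in bit 16, Segment Length in bits 15..0, the remaining bits left at zero). These bit positions are an assumption for illustration only; the transcript does not preserve the exact layout of the figure, and all function and macro names are hypothetical.

```c
#include <stdint.h>

/* Segment Type codes as listed on the slide. */
enum seg_type {
    SEG_RESERVED   = 0x0,
    SEG_IV         = 0x1,
    SEG_AD         = 0x2,
    SEG_MESSAGE    = 0x3,
    SEG_CIPHERTEXT = 0x4,
    SEG_TAG        = 0x5,
    SEG_KEY        = 0x6
};

/* ASSUMED bit positions for w = 32; not confirmed by the slide. */
#define SEG_ID_SHIFT   24
#define SEG_TYPE_SHIFT 20
#define SEG_LS_SHIFT   16
#define SEG_LEN_MASK   0xFFFFu   /* w - 16 = 16 bits of length in bytes */

/* Build a 32-bit segment header under the assumed layout. */
uint32_t seg_header_pack(uint8_t input_id, enum seg_type type,
                         int last_segment, uint16_t length_bytes)
{
    return ((uint32_t)input_id << SEG_ID_SHIFT)
         | (((uint32_t)type & 0xFu) << SEG_TYPE_SHIFT)
         | ((uint32_t)(last_segment != 0) << SEG_LS_SHIFT)
         | (uint32_t)length_bytes;
}

/* Field accessors for the same assumed layout. */
uint8_t seg_header_id(uint32_t h)
{
    return (uint8_t)(h >> SEG_ID_SHIFT);
}

enum seg_type seg_header_type(uint32_t h)
{
    return (enum seg_type)((h >> SEG_TYPE_SHIFT) & 0xFu);
}

int seg_header_is_last(uint32_t h)
{
    return (int)((h >> SEG_LS_SHIFT) & 1u);
}

uint16_t seg_header_length(uint32_t h)
{
    return (uint16_t)(h & SEG_LEN_MASK);
}
```

For w = 32 the length field is 16 bits wide, which matches the slide's range of 0 to 2^(w-16)-1 bytes per segment.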

  28. Manual RTL Designs Following Proposed Interface on Altera Stratix IV

  29. ATHENa Database of Results for Authenticated Ciphers: already available at http://cryptography.gmu.edu/athena. Similar to the database of results for hash functions, filled with ~1600 results during the SHA-3 contest. Results can be entered by designers themselves. If you would like to do that, please contact me regarding an account. The ATHENa Option Optimization Tool supports automatic generation of results suitable for uploading to the database

  30. Ordered Listing with a Single-Best (Unique) Result per Each Algorithm

  31. Implementation of CAESAR Round 1 Candidates: 30 Round 1 CAESAR candidates to be implemented manually in VHDL as a part of the graduate class taught at GMU in Fall 2014. One cipher per student. One PhD student, Ice, will implement the same 30 ciphers in parallel using HLS. Preliminary results in mid-December 2014, about a month before the announcement of Round 2 candidates. Deadline for second-round Verilog/VHDL: April 15, 2014

  32. Support for CAESAR Teams: Our team would be happy to work closely with the designer teams. About 50 candidates remaining vs. 30 students working on VHDL designs this Fall. If you would like your candidate cipher to be implemented in VHDL, please do not hesitate to contact me ASAP

  33. Conclusions: High-level synthesis offers a potential to allow hardware benchmarking during the design of cryptographic algorithms and in early stages of cryptographic contests. Case study based on 5 final SHA-3 candidates demonstrated correct ranking for Altera FPGAs for all major performance measures. More research needed to overcome remaining difficulties, such as limited correlation with manual RTL designs for Xilinx FPGAs and a suboptimal control unit

  34. Most Promising Methodology & Toolset (flow diagram; blocks: Reference Implementation in C, Manual Modifications, HLS-ready C code, High-Level Synthesis with Xilinx Vivado HLS, HDL Code, Option Optimization with GMU ATHENa, FPGA Tools with Altera Quartus II, Results). Frequency & Throughput decrease and Area increases by no more than 30% compared to manual RTL

  35. Expected by the end of 2014: 20-30 RTL results generated by 20-30 GMU students; 30 HLS results generated by “Ice” alone

  36. Thank you! Questions? Suggestions? ATHENa: http://cryptography.gmu.edu/athena CERG: http://cryptography.gmu.edu

  37. Back-up Slides

  38. Example of Source Code Modifications

      for (i = 0; i < 4; i ++)
      #pragma HLS UNROLL
          for (j = 0; j < 4; j ++)
      #pragma HLS UNROLL
              b[i][j] = s[i][j];

  39. Example of Source Code Modifications

      void AES_encrypt (word8 a[4][4], word8 k[4][4], word8 b[4][4])
      {
      #pragma HLS ARRAY_RESHAPE variable=a[0] complete dim=1 reshape
      #pragma HLS ARRAY_RESHAPE variable=a[1] complete dim=1 reshape
      #pragma HLS ARRAY_RESHAPE variable=a[2] complete dim=1 reshape
      #pragma HLS ARRAY_RESHAPE variable=a[3] complete dim=1 reshape
      #pragma HLS ARRAY_RESHAPE variable=a complete dim=1 reshape

  40. Example of Source Code Modifications

      Word32 Rcon[10] = {
          0x01, 0x02, 0x04, 0x08, 0x10,
          0x20, 0x40, 0x80, 0x1b, 0x36};
      #pragma HLS RESOURCE variable=Rcon core=ROM_1P_1S
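The ten constants in this table are the AES round constants, i.e. successive powers of x in GF(2^8) modulo the AES polynomial 0x11b, which is why a small ROM is a natural fit. As a sanity check on a hand-typed table, the values can be regenerated in plain C. This is a minimal sketch with hypothetical helper names (`xtime`, `rcon_generate`); it is not part of the HLS-ready code shown on the slide.

```c
#include <stdint.h>

/* Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11b):
 * shift left by one and, if the top bit was set, reduce with 0x1b. */
static uint8_t xtime(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80u) ? 0x1bu : 0x00u));
}

/* Fill out[0..n-1] with successive powers of x, starting at x^0 = 0x01. */
void rcon_generate(uint8_t *out, unsigned n)
{
    uint8_t v = 0x01;
    for (unsigned i = 0; i < n; i++) {
        out[i] = v;
        v = xtime(v);
    }
}
```

Comparing the generated values against the ROM contents in a C testbench catches transcription errors before synthesis.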

  41. Register Transfer Level (RTL) Design Description (diagram; blocks: Combinational Logic, Combinational Logic, Registers)

  42. Results for AES

  43. C/C++ vs. Cryptol

  44. Potential Additional Benefits: potential for formal verification. Logic equivalence check: HLL code vs. low-level hardware description (netlist). Unfortunately, no such support in the current generation of Vivado HLS

  45. Manual RTL Designs Following Proposed Interface on Xilinx Virtex 6

  46. Manual RTL Designs Following Proposed Interface on Xilinx Spartan 6
