
High Throughput AES

High Throughput AES, by Alireza Hodjat (IVGroup). Covers the AES algorithm (key addition, substitution, shift row, mix column, and the key schedule) and outer-round pipelining.


Presentation Transcript


  1. High Throughput AES. Alireza Hodjat, IVGroup.

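  2. The AES Algorithm. [Figure: block diagram. Data path: an initial Key Addition, then rounds of Substitution, Shift Row, Mix Column, and Key Addition, and a final round of Substitution, Shift Row, and Key Addition without Mix Column. Key schedule path, per round: Key Sch_Sub, Key Sch_rt, Key Sch_xor. The labels kn and ki mark key inputs.]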
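
The round structure named on this slide can be sketched in software. The following is a minimal, unoptimized Python sketch, assuming AES-128 with 10 rounds (the deck restricts itself to 128-bit keys) and a column-major 16-byte state. The function names mirror the slide's block labels but are otherwise illustrative, and the Sbox is computed by brute-force inversion only to keep the example self-contained; it is not meant to reflect the hardware design in the deck.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """Sbox = multiplicative inverse in GF(2^8), then the affine transform."""
    rotl = lambda x, n: ((x << n) | (x >> (8 - n))) & 0xFF
    table = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        table.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2)
                     ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63)
    return table


SBOX = build_sbox()


def substitution(state):
    return [SBOX[b] for b in state]


def shift_row(state):
    # Byte at row r, column c sits at index r + 4*c; row r rotates left by r.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_column(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def key_addition(state, key):
    return [s ^ k for s, k in zip(state, key)]


def next_round_key(key, rcon):
    t = key[13:16] + key[12:13]        # Key Sch_rt: rotate the last word
    t = [SBOX[b] for b in t]           # Key Sch_Sub: substitute its bytes
    t[0] ^= rcon
    out = []
    for i in range(16):                # Key Sch_xor: chain the XORs word by word
        out.append(key[i] ^ (t[i] if i < 4 else out[i - 4]))
    return out


def encrypt_block(plaintext, key):
    round_key = list(key)
    state = key_addition(list(plaintext), round_key)   # initial Key Addition
    rcon = 1
    for rnd in range(1, 11):
        round_key = next_round_key(round_key, rcon)
        rcon = xtime(rcon)
        state = shift_row(substitution(state))
        if rnd < 10:                                   # final round skips Mix Column
            state = mix_column(state)
        state = key_addition(state, round_key)
    return bytes(state)
```

With the FIPS-197 Appendix C.1 vector (key `000102030405060708090a0b0c0d0e0f`, plaintext `00112233445566778899aabbccddeeff`), `encrypt_block` should return `69c4e0d86a7b0430d8cdb78070b4c55a`.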

  3. Outer-round Pipelining

  4. Inner- and Outer-round Pipelining

  5. The Highest Possible Throughput
     • The choice of 128-bit key only
     • Completely unrolled loop
     • Pipelined
       • Between each round (Outer-round)
       • Inside each round (Inner-round)
     • This causes huge area consumption.

  6. Area Optimization
     • Area optimization inside each round
     • Two different techniques:
       • Resource sharing
       • Re-timing
     • Break the critical path and perform the algorithm in multiple clock cycles
     • Critical path: Substitution
     • Area-delay trade-off
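
Resource sharing can be illustrated with a small software model: the same substitution is computed either by 16 parallel units in one cycle or by fewer units reused over several cycles. This is only a sketch under stated assumptions. `sub_bytes_shared` and `toy_sbox` are hypothetical names, the cycle count is a simple software counter rather than a timing model, and `toy_sbox` is a stand-in substitution, not the AES Sbox.

```python
def sub_bytes_shared(state, sbox, units):
    """Substitute all state bytes using only `units` Sbox instances.

    Returns (new_state, cycles). In each modeled cycle, every unit
    substitutes exactly one byte.
    """
    if len(state) % units:
        raise ValueError("units must divide the state size")
    out = []
    cycles = 0
    for start in range(0, len(state), units):
        out.extend(sbox(b) for b in state[start:start + units])
        cycles += 1
    return out, cycles


def toy_sbox(b):
    """Stand-in byte substitution (NOT the AES Sbox), for the demo only."""
    return (b * 7 + 3) & 0xFF


state = list(range(16))
parallel, parallel_cycles = sub_bytes_shared(state, toy_sbox, 16)
shared, shared_cycles = sub_bytes_shared(state, toy_sbox, 4)

print(parallel == shared)              # same result either way
print(parallel_cycles, shared_cycles)  # fewer units, more cycles
```

In this model, using four units instead of sixteen cuts the number of Sbox instances by four while taking four cycles instead of one, which is the area-delay trade-off the slide refers to.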

  7. Sbox Area-Delay Trade-off
     • Direct Implementation: Look-up table
     • Indirect Implementation: GF(2^4)
     • Wolkerstorfer Design
     • Patrick’s codes

     Sbox area-delay trade-off for FPGA

     | Design Type | Critical path | Area | Re-timing |
     |---|---|---|---|
     | Direct, no pipeline | 4.05 ns | 136 LUTs | No |
     | Indirect, no pipeline | 10.41 ns | 94 LUTs | No |
     | Direct, one-stage pipeline | 3.91 ns | 136 LUTs | Yes, 2 pipe stages |
     | Indirect, three-stage pipeline | 5.95 ns | 90 LUTs | Yes, 3 pipe stages |
     | Direct, no pipeline, using Block RAM | 4.87 ns | 0 LUTs | No |

     Sbox area-delay trade-off for ASIC

     | Design Type | Critical path | Area | Re-timing |
     |---|---|---|---|
     | Direct, no pipeline | 1.19 ns | 2.086 Kgates | No |
     | Indirect, no pipeline | 3.67 ns | 1.167 Kgates | No |
     | Direct, one-stage pipeline | 0.78 ns | 3.51 Kgates | Yes, 2 pipe stages |
     | Indirect, three-stage pipeline | 1.11 ns | 1.65 Kgates | Yes, 3 pipe stages |
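
The difference between a direct and an indirect Sbox can be sketched as follows. The direct form reads a precomputed 256-entry table; the indirect form computes the value arithmetically. Note the assumption: the slide's indirect design works in the subfield GF(2^4), whereas this sketch inverts directly in GF(2^8) for brevity, so it shows the "compute instead of look up" idea rather than the composite-field circuit itself. All function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0, as AES defines."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):     # a^254 == a^-1, since a^255 == 1 for a != 0
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_indirect(x):
    """Compute the Sbox output: field inversion, then the affine transform."""
    inv = gf_inv(x)
    return (inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
            ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)


# Direct implementation: precompute once, then a single table read per byte.
SBOX_TABLE = [sbox_indirect(x) for x in range(256)]


def sbox_direct(x):
    return SBOX_TABLE[x]


print(hex(sbox_direct(0x00)), hex(sbox_direct(0x53)))  # 0x63 0xed
```

The table read corresponds to the larger but faster "direct" rows in the tables above, and the arithmetic path to the smaller but slower "indirect" rows; the software timings of this sketch say nothing about the hardware figures.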

  8. AES Encrypt Datapath. [Figure: datapath diagram with 16 Sbox units (S), 4 Mix Column units (M), and 16 key-addition XORs (+); data portions are labeled 1 to 4.]

  9. Key Scheduling Datapath. [Figure: datapath diagram with 4 Sbox units (S) and XOR units (+); data portions are labeled 1 to 4.]

  10. Design 1: Straight Forward. [Figure: 1 round with 16 Sbox units (S), 4 Mix Column units (M), and 16 XORs (+); each of the four stages is marked "1 Cycle".]

  11. Design 2: Use re-timing for Sbox. [Figure: 1 round with 32 blocks labeled S, 4 Mix Column units (M), and 16 XORs (+); each of the four stages is marked "1 Cycle".]

  12. Design 3: Use resource sharing. [Figure: 1 round with four Sbox blocks (S-A, S-B, S-C, S-D), one Mix Column unit (M), and four XORs (+); the stages are marked "4 Cycle".]

  13. Design 4: Use resource sharing and re-timing for Sbox. [Figure: 1 round with Sbox blocks S-A-1 to S-D-1 and S-A-2 to S-D-2, one Mix Column unit (M), and four XORs (+); the stages are marked "5 Cycle".]

  14. Design 5: Resource sharing and pipelining and re-timing for Sbox. [Figure: 1 round with four pipeline stages, each marked "1 Cycle": Sbox blocks S-A-1 to S-D-1, Sbox blocks S-A-2 to S-D-2, Mix Column, and four XORs (+).]

  15. Inner-Round Pipeline for Design 5. [Figure: timing diagram. Columns are the pipeline stages S1, S2, M, A; rows advance in time. Data portions 1 to 4 enter S1 one per cycle and move one stage per cycle, continuing from Round 1 into Round 2.]
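
The stage occupancy in this timing diagram can be reproduced with a small model. This is a simplified sketch: it assumes one data portion enters S1 every cycle, that portions cycle through 1 to 4, and that each portion advances one stage per cycle. It ignores data dependencies between rounds and does not attempt to derive the latency figure quoted elsewhere in the deck. The names `STAGES` and `occupancy` are illustrative.

```python
STAGES = ["S1", "S2", "M", "A"]   # stage labels taken from the slide
PORTIONS = 4                      # data portions numbered 1..4


def occupancy(cycle):
    """Return {stage: portion} for one clock cycle (cycle counts from 0)."""
    slots = {}
    for depth, stage in enumerate(STAGES):
        issued = cycle - depth    # cycle at which this stage's portion entered S1
        if issued >= 0:
            slots[stage] = issued % PORTIONS + 1
    return slots


# Print the first two rounds' worth of cycles; '-' marks an empty stage.
for t in range(8):
    row = occupancy(t)
    print(t, " ".join(f"{s}={row.get(s, '-')}" for s in STAGES))
```

Once the pipeline has filled, all four stages are busy in every cycle, and portion 1 of the next round enters S1 immediately after portion 4 of the current round.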

  16. Performance Estimation

     | Design | # 1 | # 2 | # 3 | # 4 | # 5 |
     |---|---|---|---|---|---|
     | Clock per Sample | 1 | 1 | 4 | 5 | 4 |
     | Pipe stages per round | 4 stages | 4 stages | 3 stages | 4 stages | 4 stages |
     | Total pipe stages | 4 × 10 stages | 4 × 10 stages | 3 × 10 stages | 4 × 10 stages | 4 × 10 stages |
     | Latency | 4 × 10 cycles | 4 × 10 cycles | 4 × 3 × 10 cycles | 5 × 3 × 10 cycles | (4 × 10) + 4 cycles |
     | FPGA Throughput (200 MHz) | 25.6 Gbit/s | 25.6 Gbit/s | 6.4 Gbit/s | 6.4 Gbit/s | 6.4 Gbit/s |
     | ASIC Critical path | 1.5 ns (650 MHz) | 1 ns (1 GHz) | 1.5 ns (650 MHz) | 1 ns (1 GHz) | 1 ns (1 GHz) |
     | Estimated Area | Less than 500 Kgates | Less than 900 Kgates | Less than 150 Kgates | Less than 300 Kgates | Less than 250 Kgates |
     | ASIC Throughput | (128*650) 83.2 Gbit/s | (128*1) 128 Gbit/s | (128*650/4) 20.8 Gbit/s | (128*1/5) 25.6 Gbit/s | (128*1/4) 32 Gbit/s |
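
The ASIC throughput entries follow the pattern block size × clock frequency ÷ clocks per sample. A short sketch of that arithmetic, assuming a 128-bit block and the clock figures listed on the slide; the function and dictionary names are illustrative:

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_gbps(clock_mhz, clocks_per_sample):
    """Throughput in Gbit/s for a pipeline accepting one block every
    `clocks_per_sample` cycles at `clock_mhz`."""
    return BLOCK_BITS * clock_mhz / clocks_per_sample / 1000.0


# (clock in MHz, clocks per sample) for the ASIC column of each design.
ASIC_DESIGNS = {1: (650, 1), 2: (1000, 1), 3: (650, 4), 4: (1000, 5), 5: (1000, 4)}

for design, (mhz, cps) in ASIC_DESIGNS.items():
    print(f"Design {design}: {throughput_gbps(mhz, cps):.1f} Gbit/s")
```

The same function gives 25.6 Gbit/s for one clock per sample at 200 MHz and 6.4 Gbit/s for four clocks per sample, matching the FPGA row for Designs 1, 2, 3, and 5. For Design 4 (five clocks per sample) it gives 5.12 Gbit/s, whereas the slide lists 6.4 Gbit/s.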
