
SHA-3 Candidate Evaluation


gafna

Presentation Transcript


  1. SHA-3 Candidate Evaluation

  2. FPGA Benchmarking - Phase 1 • 14 Round-2 SHA-3 candidates implemented by 33 graduate students following the same design methodology (the same function implemented independently by 2-3 students) • Uniform input/output interface • Uniform generic testbench • Optimization for maximum throughput-to-cost ratio • Benchmarking on multiple FPGA platforms from Xilinx and Altera using ATHENa • Comparison with optimized implementations of SHA-1 and SHA-2 • Compressing all results into a single ranking

  3. Division into Datapath and Controller • Datapath (Execution Unit): Data Inputs → Data Outputs • Controller (Control Unit): Control & Status Inputs → Control & Status Outputs • Controller → Datapath: Control Signals • Datapath → Controller: Status Signals

  4. Design Methodology • Specification → Interface • Execution Unit: block diagram → VHDL code • Control Unit: Algorithmic State Machine (ASM) chart → VHDL code

  5. Steps of the Design Process (1) Given: • Specification • Interface. Completed: • Pseudocode • Detailed block diagram of the Datapath • Interface with the division into the Datapath and the Controller • Timing and area analysis, architectural-level optimizations • RTL VHDL code of the Datapath and the corresponding testbenches

  6. Steps of the Design Process (2) Remaining to be done: • ASM chart of the Controller • RTL VHDL code of the Controller and the corresponding testbench • Integration of the Datapath and the Controller • Testing using the uniform generic testbench (developed by Ice) • Source code optimizations • Performance characterization using ATHENa • Documentation and final report

  7. FPGA Benchmarking - Phase 2 • Extending source codes to cover all hash function variants • Padding in hardware • Applying additional architectural optimizations • Extended benchmarking (Actel FPGAs, multiple tools, adaptive optimization strategies, etc.) • Reconciling differences with other available rankings • Preparing the codes for ASIC evaluation

  8. How to compress all results into a single ranking?

  9. Single Ranking (1) • Select several representative FPGA platforms with significantly different properties, e.g., vendor (Xilinx vs. Altera), process (90 nm vs. 65 nm), LUT size (4-input vs. 6-input), optimization target (low-cost vs. high-performance) • Use ATHENa to characterize all SHA-3 candidates and SHA-2 on these platforms in terms of the target performance metrics (e.g., throughput/area ratio)

  10. Single Ranking (2) • For each platform, calculate the ratio of SHA-3 candidate performance to SHA-2 performance (for the same security level) • Calculate the geometric mean of these ratios over multiple platforms
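The two steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' tool. The function names `geometric_mean` and `relative_score`, the platform labels and the metric values are all hypothetical. Each platform contributes the ratio of a candidate's metric (e.g., throughput/area) to that of SHA-2 at the same security level, and the ratios are combined with a geometric mean.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_score(candidate, sha2):
    """Geometric mean over platforms of candidate/SHA-2 performance ratios.

    candidate, sha2: dicts mapping a (hypothetical) platform name to a metric
    such as throughput/area; SHA-2 is taken at the candidate's security level.
    Only platforms present in both dicts are used.
    """
    platforms = sorted(candidate.keys() & sha2.keys())
    ratios = [candidate[p] / sha2[p] for p in platforms]
    return geometric_mean(ratios)

# Hypothetical throughput/area figures, for illustration only.
sha2 = {"platform_A": 2.0, "platform_B": 4.0}
cand = {"platform_A": 4.0, "platform_B": 2.0}
# Ratios are 2.0 and 0.5, so the geometric mean is (approximately) 1.0.
print(relative_score(cand, sha2))
```

A geometric mean suits ratios because a 2x gain on one platform exactly offsets a 2x loss on another, as in the example. It also keeps the relative ranking of two candidates unchanged if a different reference implementation is used for normalization.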

  11. FPGA and ASIC Performance Measures

  12. The common ground is vague: metrics reported in the literature vary widely • Hardware performance: cycles per block, cycles per byte, latency (cycles), latency (ns), throughput for long messages, throughput for short messages, throughput at 100 kHz, clock frequency, clock period, critical path delay, Modexp/s, PointMul/s • Hardware cost: slices, slices occupied, LUTs, 4-input LUTs, 6-input LUTs, FFs, gate equivalents (GE), size on ASIC, DSP blocks, BRAMs, number of cores, CLBs, MULs, XORs, NOTs, ANDs • Hardware efficiency: hardware performance / hardware cost

  13. Our Favorite Hardware Performance Metrics: Mbit/s for throughput and ns for latency. These units allow easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries)

  14. But how to define and measure throughput and latency for hash functions?

Time to hash N blocks of message:

$$\mathrm{Htime}(N, T_{CLK}) = \text{Initialization Time}(T_{CLK}) + N \cdot \text{Block Processing Time}(T_{CLK}) + \text{Finalization Time}(T_{CLK})$$

Latency = time to hash ONE block of message:

$$\text{Latency} = \mathrm{Htime}(1, T_{CLK}) = \text{Initialization Time}(T_{CLK}) + \text{Block Processing Time}(T_{CLK}) + \text{Finalization Time}(T_{CLK})$$

Throughput (for long messages):

$$\text{Throughput} = \frac{\text{Block size}}{\mathrm{Htime}(N+1, T_{CLK}) - \mathrm{Htime}(N, T_{CLK})} = \frac{\text{Block size}}{\text{Block Processing Time}(T_{CLK})}$$
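A small Python model of these definitions shows why the initialization and finalization terms cancel in the long-message throughput. The function names and the timing values (in ns) are hypothetical and chosen only for illustration.

```python
def htime(n_blocks, t_init, t_block, t_final):
    """Time to hash n_blocks blocks: Init + N * BlockProcessing + Final."""
    return t_init + n_blocks * t_block + t_final

def latency(t_init, t_block, t_final):
    """Latency = time to hash ONE block = Htime(1, T_CLK)."""
    return htime(1, t_init, t_block, t_final)

def long_message_throughput(block_size_bits, t_init, t_block, t_final, n=1000):
    """Block size / (Htime(N+1) - Htime(N)).

    The initialization and finalization times cancel in the difference,
    leaving Block size / Block Processing Time.
    """
    delta = (htime(n + 1, t_init, t_block, t_final)
             - htime(n, t_init, t_block, t_final))
    return block_size_bits / delta

# Hypothetical core: 10 ns init, 100 ns per 512-bit block, 50 ns final.
print(latency(10, 100, 50))                       # 160 (ns)
print(long_message_throughput(512, 10, 100, 50))  # 5.12 (bit/ns, i.e. Gbit/s)
```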

  15. But how to define and measure throughput and latency for hash functions?

$$\text{Initialization Time}(T_{CLK}) = \text{cycles}_I \cdot T_{CLK}$$

$$\text{Block Processing Time}(T_{CLK}) = \text{cycles}_P \cdot T_{CLK}$$

$$\text{Finalization Time}(T_{CLK}) = \text{cycles}_F \cdot T_{CLK}$$

Sources of the parameters: T_CLK from the place & route report (or experiment); block size from the specification; cycles_I, cycles_P and cycles_F from analysis of the block diagram and/or functional simulation.
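Combining these cycle counts with the previous definitions gives latency in ns and throughput in Mbit/s, the preferred units. The sketch below assumes the inputs come from the sources listed above. The function name and the example numbers (a 512-bit block, 1/16/1 cycles, a 5 ns clock period) are hypothetical.

```python
def hash_core_metrics(block_size_bits, cycles_i, cycles_p, cycles_f, t_clk_ns):
    """Return (latency_ns, throughput_mbit_s) for a hash core.

    t_clk_ns:        clock period, from the place & route report (or experiment)
    block_size_bits: from the algorithm specification
    cycles_i/p/f:    from block-diagram analysis and/or functional simulation
    """
    # Latency = (cycles_I + cycles_P + cycles_F) * T_CLK
    latency_ns = (cycles_i + cycles_p + cycles_f) * t_clk_ns
    # Throughput = block size / (cycles_P * T_CLK); bits/ns * 1000 = Mbit/s
    throughput_mbit_s = block_size_bits * 1000 / (cycles_p * t_clk_ns)
    return latency_ns, throughput_mbit_s

# Hypothetical core: 512-bit block, 16 cycles per block, 5 ns clock (200 MHz).
print(hash_core_metrics(512, 1, 16, 1, 5.0))  # (90.0, 6400.0)
```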

  16. How to compare hardware speed vs. software speed? EBASH reports (http://bench.cr.yp.to/results-hash.html) • In graphs: Time(n), the time in clock cycles to hash an n-byte message, vs. message size in bytes, for n = 0, 1, 2, 3, …, 2048, 4096 • In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, and long messages, where

$$\text{Performance for long message} = \frac{\text{Time}(4096) - \text{Time}(2048)}{2048}$$
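The table entries can be computed from the measured times as sketched below. This assumes that cycles/byte for an n-byte message means Time(n)/n. The function names and the measurements are hypothetical. Taking the difference Time(4096) - Time(2048) removes the fixed per-message overhead, which mirrors the Htime(N+1) - Htime(N) definition used for hardware.

```python
def cycles_per_byte(time_cycles, n):
    """Performance for an n-byte message: Time(n) / n, in cycles/byte."""
    return time_cycles[n] / n

def long_message_cycles_per_byte(time_cycles):
    """Performance for long messages: (Time(4096) - Time(2048)) / 2048."""
    return (time_cycles[4096] - time_cycles[2048]) / 2048

# Hypothetical measurements (clock cycles), for illustration only.
measured = {64: 2000, 2048: 30000, 4096: 55000}
print(cycles_per_byte(measured, 64))           # 31.25
print(long_message_cycles_per_byte(measured))  # 12.20703125
```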

  17. How to compare hardware speed vs. software speed?

$$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
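The conversion is a one-line function. The processor frequency and cycles/byte figure in the example are hypothetical. The result can be multiplied by 1000 to obtain Mbit/s, which is directly comparable to hardware throughput.

```python
def throughput_gbit_s(clock_ghz, cycles_per_byte):
    """Throughput [Gbit/s] = 8 bits/byte * f [GHz] / performance [cycles/byte]."""
    return 8 * clock_ghz / cycles_per_byte

# Hypothetical 2.0 GHz processor hashing long messages at 16 cycles/byte.
print(throughput_gbit_s(2.0, 16.0))         # 1.0 (Gbit/s)
print(throughput_gbit_s(2.0, 16.0) * 1000)  # 1000.0 (Mbit/s)
```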

  18. How to measure hardware cost in FPGAs? 1. Stand-alone cryptographic core on an FPGA: the cost of the smallest FPGA that can fit the core. Unit: USD [FPGA vendors would need to publish the MSRP (manufacturer's suggested retail price) of their chips, which is not very likely], or the size of the chip in mm² (easy to obtain). 2. Part of an FPGA system on chip: a vector of resources, i.e., (CLB slices, BRAMs, MULs, DSP units) for Xilinx and (LEs, memory bits, PLLs, MULs, DSP units) for Altera. 3. FPGA prototype of an ASIC implementation: force the implementation to use only reconfigurable logic (no DSPs or multipliers; distributed memory instead of BRAM) and use CLB slices as the metric [LEs for Altera].

  19. How to measure hardware cost in ASICs? 1. Stand-alone cryptographic core: Cost = f(die area, pin count); tables/formulas are available from semiconductor foundries. 2. Part of an ASIC system on chip: Cost ~ circuit area. Units: μm² or GE (gate equivalents), where one GE equals the area of a NAND2 cell.
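Converting a synthesized area into gate equivalents only requires the NAND2 cell area of the same standard-cell library. The sketch below uses a hypothetical NAND2 area of 1.25 μm² and a hypothetical core area. Real values must be taken from the library used for synthesis.

```python
def gate_equivalents(area_um2, nand2_area_um2):
    """Area in GE = circuit area / area of one NAND2 cell (same library)."""
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 cell area must be positive")
    return area_um2 / nand2_area_um2

# Hypothetical: 12,500 um^2 core in a library whose NAND2 cell is 1.25 um^2.
print(gate_equivalents(12500.0, 1.25))  # 10000.0 (GE)
```

Because GE normalizes by the library's own NAND2 cell, it allows rough comparison across different ASIC libraries, which raw μm² figures do not.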
