1 / 39

Modern Benchmarking: Natural Progression of Tools

Comprehensive environment for benchmarking using FPGAs: ATHENa - Automated Tool for Hardware EvaluatioN. Modern Benchmarking: Natural Progression of Tools. Software. FPGAs. ASICs. eBACS. ?. ?. D. Bernstein, T. Lange. http://cryptography.gmu.edu/athena.

bolanderl
Download Presentation

Modern Benchmarking: Natural Progression of Tools

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Comprehensive environment for benchmarking using FPGAs: ATHENa - Automated Tool for Hardware EvaluatioN

  2. Modern Benchmarking: Natural Progression of Tools Software FPGAs ASICs eBACS ? ? D. Bernstein, T. Lange

  3. http://cryptography.gmu.edu/athena ATHENa – Automated Tool for Hardware EvaluatioN • Set of scripts written in Perl aimed at an AUTOMATED generation of • OPTIMIZED results for • MULTIPLE hardware platforms Currently under development at George Mason University. Version 0.3.1

  4. Why Athena? "The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. 
Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest.” from "Athena, Greek Goddess of Wisdom and Craftsmanship"

  5. Designers of ATHENa Xin PhD ECEstudent Venkata “Vinny” MS CpEstudent Ekawat “Ice” MS CpEstudent Marcin PhD ECEstudent Michal PhD exchange student from Slovakia Rajesh PhD ECEstudent

  6. Basic Dataflow of ATHENa User FPGA Synthesis and Implementation 6 5 3 Ranking of designs 2 Database query HDL + scripts + configuration files Result Summary + Database Entries ATHENa Server 1 HDL + FPGA Tools Download scripts andconfiguration files8 4 Designer Database Entries Interfaces+ Testbenches 6 0

  7. testbench • configuration files • constraint files • synthesizable source files database entries (machine- friendly) result summary (user-friendly)

  8. synthesizable source files • configuration files result summary (user-friendly)

  9. ATHENa Major Features (1) • synthesis, implementation, and timing analysis in the batch mode • support for devices and tools of multiple FPGA vendors: • generation of results for multiple families of FPGAs of a given vendor • automated choice of a best-matching device within a given family

  10. ATHENa Major Features (2) • automated verification of the design through simulation in the batch mode • exhaustive search for optimum options of tools • heuristic adaptive optimization strategies aimed at maximizing selected performance measures (e.g., speed, area, speed/area ratio, power, cost, etc.) OR

  11. ATHENa Major Features (2) • automated verification of the design through simulation in the batch mode • exhaustive search for optimum options of tools • heuristic adaptive optimization strategies aimed at maximizing selected performance measures (e.g., speed, area, speed/area ratio, power, cost, etc.) OR

  12. Multi-Pass Place-and-Route Analysis GMU SHA-512, Xilinx Virtex 5 100 runs for different placement starting points ~ 20% The smaller the better worst best Minimum clock 12

  13. Dependence of Results on Requested Clock Frequency

  14. ATHENa Applications • single_run: • - one set of options • placement_search • - one set of options • - multiple starting points for placement • exhaustive_search • - multiple sets of options • - multiple starting points for placement • - multiple requested clock frequencies

  15. Throughput [Mbit/s] SHA-1 Results Virtex 5 Virtex 4 Spartan 3 Architectures

  16. ATHENA Results for SHA-1, SHA-256 & SHA-512

  17. Ideas (1) • Select several representative FPGA platforms with significantly different properties • e.g., different vendor – Xilinx vs. Altera • process - 90 nm vs. 65 nm • LUT size - 4-input vs. 6-input • optimization - low-cost vs. high-performance • Use ATHENa to characterize all SHA-3 candidates • and SHA-2 using these platforms in terms • of the target performance metrics • (e.g. throughput/area ratio)

  18. Ideas (2) • Calculate ratio • SHA-3 candidate performance vs. • SHA-2 performance (for the same security level) • Calculate geometrical average over multiple • platforms

  19. Xilinx FPGA Devices

  20. Xilinx FPGA Device Support by Tools

  21. Altera FPGA Devices

  22. Altera FPGA Device Support by Tools

  23. FPGA and ASIC Performance Measures

  24. The common ground is vague • Hardware Performance: cycles per block, cycles per byte, Latency (cycles), Latency (ns), Throughput for long messages, Throughput for short messages, Throughput at 100 KHz, Clock Frequency, Clock Period, Critical Path Delay, Modexp/s, PointMul/s • Hardware Cost: Slices, Slices Occupied, LUTs, 4-input LUTs, 6-input LUTs, FFs, Gate Equivalent GE, Size on ASIC, DSP Blocks, BRAMS, Number of Cores, CLB, MUL, XOR, NOT, AND • Hardware efficiency: Hardware performance/Hardware cost

  25. Our Favorite Hardware Performance Metrics: Mbit/s for Throughput ns for Latency Allows for easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), ASICs (various libraries)

  26. But how to define and measure throughput and latency for hash functions? Time to hash N blocks of message = Htime(N, TCLK) = Initialization Time(TCLK) + N * Block Processing Time(TCLK) + Finalization Time(TCLK) Latency = Time to hash ONE block of message = Htime(1, TCLK) = = Initialization Time + Block Processing Time + Finalization Time Block size Throughput (for long messages) = Htime(N+1, TCLK) - Htime(N, TCLK) Block size = Block Processing Time (TCLK)

  27. But how to define and measure throughput and latency for hash functions? Initialization Time(TCLK) = cyclesI ⋅ TCLK Block Processing Time(TCLK) = cyclesP ⋅ TCLK Finalization Time(TCLK) = cyclesF ⋅ TCLK Block size from place & route report (or experiment) from specification from analysis of block diagram and/or functional simulation

  28. How to compare hardware speed vs. software speed? EBASH reports (http://bench.cr.yp.to/results-hash.html) In graphs Time(n) = Time in clock cycles vs. message size in bytes for n-byte messages, with n=0,1, 2, 3, … 2048, 4096 In tables Performance in cycles/byte for n=8, 64, 576, 1536, 4096, long msg Time(4096) – Time(2048) Performance for long message = 2048

  29. How to compare hardware speed vs. software speed? 8 bits/byte ⋅ clock frequency [GHz] Throughput [Gbit/s] = Performance for long message [cycles/byte]

  30. How to measure hardware cost in FPGAs? 1. Stand-alone cryptographic core on FPGA Cost of a smallest FPGA that can fit the core. Unit: USD [FPGA vendors would need to publish MSRP (manufacturer’s suggested retail price) of their chips] – not very likely or size of the chip in mm2- easy to obtain 2. Part of an FPGA System On-Chip Vector: (CLB slices, BRAMs, MULs, DSP units) for Xilinx (LEs, memory bits, PLLs, MULs, DSP units) for Altera 3. FPGA prototype of an ASIC implementation Force the implementation using only reconfigurable logic (no DSPs or multipliers, distributed memory vs. BRAM): Use CLB slices as a metric. [LEs for Altera]

  31. How to measure hardware cost in ASICs? 1. Stand-alone cryptographic core Cost = f(die area, pin count) Tables/formulas available from semiconductor foundries 2. Part of an ASIC System On-Chip Cost ~ circuit area Units: μm2 or GE (gate equivalent) = size of a NAND2 cell

  32. Deliverables (1) 1. Detailed block diagram of the Datapath with names of all signals matching VHDL code [electronic version a bonus] 2. Interface with the division into the Datapath and the Controller [electronic version] • ASM charts of the Controller, and a block diagram of connections among FSMs (if more than one used) [electronic version a bonus] 4. RTL VHDL code of the Datapath, the Controller, and the Top-Level Circuit 5. Updated timing and area analysis formulas for timing confirmed through simulation

  33. Deliverables (2) 6. Report on verification − highest level entity verified for functional correctness • Functional simulation • Post-synthesis simulation • Timing simulation [bonus] − verification of lower-level entities • Name of entity • Testbench used for verification • Result of verification, incorrect behavior, possible source of error

  34. Deliverables (3) 7. Results of benchmarking using ATHENa • Entire core or the highest level entity verified for correct functionality • Xilinx Spartan 3, Virtex 4, Virtex 5 • Three methods of testing • Single_run • Placement_search [cost table = 1, 11, 21] • Exhaustive_search [cost_table = 31, 41, 51; speed or area; two sets of requested frequencies] • Results generated by ATHENa • Your own graphs and charts • Observations and conclusions

  35. Bonus Deliverables (4) 8. Pseudo-code [but not a C code] 9. Bugs and suspicious behavior of ATHENa 10. Additional results of benchmarking using ATHENa • Altera Cyclone II, Stratix II, Cyclone III, Arria I, Stratix III • Three methods of testing • Single_run • Placement_search [seed = 1, 1000, 2000] • Exhaustive_search [seed = 3000, 4000, 5000; speed or area; two sets of requested frequencies] • Results generated by ATHENa • Your own graphs and charts • Observations and conclusions

  36. Bonus Deliverables (5) 11. Report from the meeting with students working on the same SHA core • Summary of major differences • Advantages and disadvantages of your design 12. Bugs found in the • Padding script • Testbench • Class examples • Slides • Documentation • SHA-3 Packages • Etc.

  37. Bonus Deliverables (6) 13. Extending the design to cover all hash function variants • Hash value sizes: 512 [highest priority], 384, 224 • Other variant/parameter support specific to a given hash function • Support through generics or constants 14. Padding in hardware Assuming that message size before padding is already a multiple of the • word size • byte size • a single bit

  38. Composition of Students 4 GWU PhD candidates 14 international students 14 local students (with 3 former BSCpE graduates)

  39. After Grading • Summary of results published on the course web page • Selected students invited to develop articles/reports to be posted on the - ATHENa web page - SHA-3 Zoo Web Page • Unification, generalization and optimization of codes by Ice, myself, and other students • Presentation to NIST, conference submissions, presentation at the Second SHA-3 Conference in Santa Barbara in August 2010.

More Related