
Kris Gaj



Presentation Transcript


  1. Kris Gaj • Research and teaching interests: cryptography, computer arithmetic, VLSI design and testing • Contact: Engineering Bldg., room 3225; kgaj@gmu.edu; (703) 993-1575 • Office hours: Monday, 7:30-8:30 PM; Tuesday, 6:00-7:00 PM; and by appointment

  2. ECE 645 Part of: • MS in CpE: Digital Systems Design (pre-approved course); other concentration areas (elective course) • MS in EE • Certificate in VLSI Design/Manufacturing • PhD in ECE • PhD in IT

  3. DIGITAL SYSTEMS DESIGN 1. ECE 545 Digital System Design with VHDL: K. Gaj, project, FPGA design with VHDL, Aldec/Synplicity/Xilinx/Altera 2. ECE 645 Computer Arithmetic: K. Gaj, project, FPGA design with VHDL or Verilog, Aldec/Synplicity/Xilinx/Altera 3. ECE 586 Digital Integrated Circuits: D. Ioannou 4. ECE 681 VLSI Design for ASICs: N. Klimavicz, project/lab, front-end and back-end ASIC design with Synopsys tools 5. ECE 682 VLSI Test Concepts: T. Storey, homework

  4. Prerequisites: ECE 545 Digital System Design with VHDL, or permission of the instructor, granted assuming that you know • VHDL or Verilog • a high-level programming language (preferably C)

  5. Prerequisite knowledge: This class assumes proficiency with the FPGA CAD tools from ECE 545. You are expected to be proficient with: • Synthesizable VHDL coding • Advanced VHDL testbenches, including file input/output • Xilinx FPGA synthesis and post-synthesis simulation • Xilinx FPGA place-and-route and post-place-and-route simulation • Reading and interpreting all synthesis and implementation reports

  6. Course web page: ECE web page → Courses → Course web pages → ECE 645 http://ece.gmu.edu/coursewebpages/ECE/ECE645/S10/

  7. Computer Arithmetic • Lecture: Homework 10%, Midterm exam (in class) 15%, Final exam (in class) 25% • Project: Project 1 20%, Project 2 30%

  8. Advanced digital circuit design course covering efficient • addition and subtraction • multiplication • division and modular reduction • exponentiation for the following operand types: • Integers: unsigned and signed • Real numbers: fixed point; single- and double-precision floating point • Elements of the Galois field GF(2^n): polynomial base

  9. Course Objectives: At the end of this course you should be able to: • Understand mathematical and gate-level algorithms for computer addition, subtraction, multiplication, division, and exponentiation • Understand tradeoffs involved with different arithmetic architectures between performance, area, latency, scalability, etc. • Synthesize and implement computer arithmetic blocks on FPGAs • Be comfortable with different number systems, and have familiarity with floating-point and Galois field arithmetic for future study • Understand sources of error in computer arithmetic and basics of error analysis. This knowledge will come about through homework, projects, and practice exams.

  10. Lecture topics (1) INTRODUCTION 1. Applications of computer arithmetic algorithms 2. Number representation • Unsigned Integers • Signed Integers • Fixed-point real numbers • Floating-point real numbers • Elements of the Galois Field GF(2^n)

  11. ADDITION AND SUBTRACTION 1. Basic addition, subtraction, and counting 2. Carry-lookahead, carry-select, and hybrid adders 3. Adders based on Parallel Prefix Networks

  12. MULTIOPERAND ADDITION 1. Carry-save adders 2. Wallace and Dadda Trees 3. Adding multiple unsigned and signed numbers
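
As an illustration of the carry-save idea listed above, here is a minimal C sketch, not taken from the course materials. The function names `carry_save_add` and `add4` and the 32-bit word size are assumptions made for this example. Each carry-save level reduces three operands to two without propagating carries; only the last step uses a carry-propagate adder.

```c
#include <stdint.h>

/* One carry-save (3:2 compressor) step: reduces three operands to a
   sum word and a carry word without propagating carries.
   Invariant (mod 2^32): a + b + c == *sum + *carry. */
void carry_save_add(uint32_t a, uint32_t b, uint32_t c,
                    uint32_t *sum, uint32_t *carry)
{
    *sum   = a ^ b ^ c;                          /* bitwise sum, no carries  */
    *carry = ((a & b) | (a & c) | (b & c)) << 1; /* majority, weight doubled */
}

/* Adds four operands (hypothetical helper): two carry-save levels
   followed by a single carry-propagate addition. */
uint32_t add4(uint32_t w, uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t s, c;
    carry_save_add(w, x, y, &s, &c);
    carry_save_add(s, c, z, &s, &c);
    return s + c;   /* final carry-propagate adder */
}
```

In hardware, each carry-save level has a delay of one full adder regardless of operand width, which is why Wallace and Dadda trees build on it.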

  13. TECHNOLOGY 1. Internal Structure of Xilinx and Altera FPGAs 2. ASIC standard cell libraries and synthesis tools for ASICs 3. Two-operand and multi-operand addition in FPGAs

  14. MULTIPLICATION 1. Tree and array multipliers 2. Sequential multipliers 3. Multiplication of signed numbers and squaring

  15. TECHNOLOGY 1. Pipelining 2. Multi-cycle paths 3. Multiplication in Xilinx and Altera FPGAs - using distributed logic - using embedded multipliers - using DSP blocks

  16. LONG INTEGER ARITHMETIC 1. Modular Exponentiation 2. Montgomery Multipliers and Exponentiation Units
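
To make the Montgomery multiplication topic concrete, the following is a single-word C sketch under stated assumptions: R = 2^32, an odd modulus n < 2^31, and operands already reduced modulo n. The names `mont_nprime`, `mont_mul`, and `mod_mul` are hypothetical. Long-integer hardware units process multi-word operands, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Returns -n^(-1) mod 2^32 for odd n, using Newton iteration. */
uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = n;           /* correct to 3 bits, since n*n = 1 mod 8 */
    for (int i = 0; i < 4; i++)
        inv *= 2u - n * inv;    /* each step doubles the correct bits     */
    return 0u - inv;
}

/* Montgomery product a*b*R^(-1) mod n, with R = 2^32.
   Assumes n odd, n < 2^31, and a, b < n. */
uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t n, uint32_t nprime)
{
    assert((n & 1u) && n < 0x80000000u && a < n && b < n);
    uint64_t t = (uint64_t)a * b;
    uint32_t m = (uint32_t)t * nprime;          /* (t mod R) * n' mod R */
    uint64_t u = (t + (uint64_t)m * n) >> 32;   /* exact division by R  */
    return (uint32_t)(u >= n ? u - n : u);      /* final subtraction    */
}

/* Ordinary modular product computed through the Montgomery domain. */
uint32_t mod_mul(uint32_t a, uint32_t b, uint32_t n)
{
    uint32_t np = mont_nprime(n);
    uint64_t r  = ((uint64_t)1 << 32) % n;      /* R mod n              */
    uint32_t r2 = (uint32_t)((r * r) % n);      /* R^2 mod n            */
    uint32_t am = mont_mul(a, r2, n, np);       /* a*R mod n            */
    return mont_mul(am, b, n, np);              /* a*R*b*R^(-1) = a*b   */
}
```

The reduction step replaces division by n with a multiplication and a shift by the word size, which is the property that makes the method attractive in hardware.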

  17. DIVISION 1. Basic restoring and non-restoring sequential dividers 2. SRT and high-radix dividers 3. Array dividers
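
A basic restoring sequential divider can be modeled in a few lines of C. This is a sketch for illustration only; the function name `restoring_divide` and the 32-bit operand width are assumptions. One quotient bit is produced per loop iteration, mirroring one clock cycle of a sequential divider.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-serial restoring division of unsigned 32-bit integers. */
void restoring_divide(uint32_t dividend, uint32_t divisor,
                      uint32_t *quotient, uint32_t *remainder)
{
    assert(divisor != 0);
    uint64_t rem = 0;   /* partial remainder, wider than the operands */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        /* Shift the next dividend bit into the partial remainder. */
        rem = (rem << 1) | ((dividend >> i) & 1u);
        q <<= 1;
        if (rem >= divisor) {   /* trial subtraction succeeds: bit = 1 */
            rem -= divisor;
            q |= 1u;
        }
        /* Otherwise the partial remainder is kept unchanged, which
           corresponds to the "restore" step in hardware. */
    }
    *quotient = q;
    *remainder = (uint32_t)rem;
}
```

The comparison stands in for the subtract-then-restore sequence of the hardware algorithm; a non-restoring divider avoids the restore by allowing negative partial remainders.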

  18. FLOATING POINT AND GALOIS FIELD ARITHMETIC 1. Floating-point units 2. Galois Field GF(2^n) units
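
As a small example of polynomial-basis Galois field arithmetic, the sketch below multiplies two elements of GF(2^8). The choice of n = 8 and of the Rijndael reduction polynomial x^8 + x^4 + x^3 + x + 1 is an assumption made for this example, and `gf256_mul` is a hypothetical name.

```c
#include <stdint.h>

/* Shift-and-add multiplication in GF(2^8), polynomial basis,
   reduced modulo x^8 + x^4 + x^3 + x + 1 (0x11B). */
uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1u)
            p ^= a;                 /* addition in GF(2^n) is XOR        */
        uint8_t msb = a & 0x80u;
        a = (uint8_t)(a << 1);      /* multiply a by x                   */
        if (msb)
            a ^= 0x1Bu;             /* reduce when the degree reaches 8  */
        b >>= 1;
    }
    return p;
}
```

Because addition is carry-free, a hardware GF(2^n) multiplier needs only AND and XOR gates, unlike an integer multiplier of the same width.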

  19. Literature (1) Required textbook: Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, 2nd edition, Oxford University Press, 2010.

  20. Literature (2) Recommended textbooks: 1. Jean-Pierre Deschamps, Gery Jean Antoine Bioul, Gustavo D. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems, Wiley-Interscience, 2006. 2. Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004. 3. Israel Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA, 2002.

  21. Literature (2) VHDL books: 1. Pong P. Chu, RTL Hardware Design Using VHDL: Coding for Efficiency, Portability, and Scalability, Wiley-IEEE Press, 2006. 2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004. 3. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.

  22. Literature (3) Supplementary books: 1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990. 2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14, Efficient Implementation, CRC Press, Inc., 1998.

  23. Literature (3) • Proceedings of conferences: ARITH - International Symposium on Computer Arithmetic; ASIL - Asilomar Conference on Signals, Systems, and Computers; ICCD - International Conference on Computer Design; CHES - Workshop on Cryptographic Hardware and Embedded Systems • Journals and periodicals: IEEE Transactions on Computers, in particular special issues on computer arithmetic (8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94, 7/00, 3/05); IEEE Transactions on Circuits and Systems; IEEE Transactions on Very Large Scale Integration; IEE Proceedings: Computer and Digital Techniques; Journal of VLSI Signal Processing

  24. Homework • reading assignments • design of small hardware units using VHDL • analysis of computer arithmetic algorithms and implementations

  25. Exams • Midterm Exam: 2 hrs 30 minutes, in class; multiple choice + short problems • Final Exam: 2 hrs 45 minutes; comprehensive; conceptual questions, analysis and design of arithmetic units • Practice exams on the web • Tentative days of exams: Midterm Exam - Monday, March 23; Final Exam - Tuesday, May 11, 7:30-10:15 PM

  26. Project (1) Project I (individual, 20% of grade): Comprehensive analysis of basic operations of SHA-3 candidates • Optimization criteria: minimum latency; minimum area; minimum product latency · area; use of embedded FPGA resources (BRAMs, embedded multipliers, DSP units) • Different for all students • Done individually • Final report due Tuesday, March 16

  27. Limitations of the Current Approach • Time and effort: one designer = too much time to implement all candidates • Accuracy of comparison: multiple designers = significant inaccuracies associated with different skills and coding styles

  28. Problem How to predict ranking and relative performance of candidate algorithms without the actual time-consuming hardware implementation at the Register Transfer Level (RTL)? Applications: • Ranking of candidate algorithms submitted to the contests (large number of candidates, time limit) • Ranking of candidate algorithms during the design process by designers themselves (no experience in hardware design, short response time needed)

  29. Features of our Problem to Exploit • No need to obtain the functioning netlist or HDL description (performance numbers sufficient) • Limited accuracy required (less than 20% differences in performance considered insignificant) • Limited number of basic operations • Limited number of architectures used in practice

  30. The proposed approach

  31. Steps of Our Methodology (1) 1. Determine the minimum set of basic operations required to implement a given class of cryptographic transformations 2. Determine the required range of parameters of these operations (e.g., operand sizes in arithmetic operations) 3. Implement basic operations in RTL VHDL (or Verilog) in a parametric fashion (using constants and generics) 4. Characterize all operations, for all required parameter values, using Xilinx and/or Altera development environments • Area and latency • Low-cost FPGAs and high-performance FPGAs

  32. Major operations of AES finalists (table; finalists: Serpent, RC6, Twofish, Rijndael, Mars; operations: S-boxes, multiplication in GF(2^m), variable rotation, integer multiplication)

  33. Auxiliary operations of AES finalists (table; finalists: Serpent, RC6, Twofish, Rijndael, Mars; operations: Boolean, fixed rotation, addition/subtraction, permutation)

  34. Major cipher operations (1) - S-box (n x m) • Software: table of 2^n words; C: WORD S[1<<n] = { 0x23, 0x34, 0x56, ... }; ASM: S DW 23H, 34H, 56H, ... • Hardware: ROM with n-bit address and m-bit output (2^n x m bits), or direct logic computing outputs y1 ... ym from inputs x1 ... xn
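
The software side of the S-box slide amounts to a table lookup. The sketch below applies a 4x4 S-box to each nibble of a 32-bit word. The table contents are placeholder values chosen for this example (the permutation S[i] = (7i + 3) mod 16), and the names `SBOX4` and `sbox_layer` are hypothetical.

```c
#include <stdint.h>

/* Illustrative 4x4 S-box: 2^4 entries of 4 bits each.
   Placeholder values: S[i] = (7*i + 3) mod 16. */
static const uint8_t SBOX4[16] = {
    0x3, 0xA, 0x1, 0x8, 0xF, 0x6, 0xD, 0x4,
    0xB, 0x2, 0x9, 0x0, 0x7, 0xE, 0x5, 0xC
};

/* Applies the S-box to each of the eight nibbles of a 32-bit word. */
uint32_t sbox_layer(uint32_t x)
{
    uint32_t y = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t addr = (x >> (4 * i)) & 0xFu;    /* n-bit address */
        y |= (uint32_t)SBOX4[addr] << (4 * i);    /* m-bit output  */
    }
    return y;
}
```

In software the eight lookups run one after another, while in hardware eight copies of the ROM or direct logic can operate in parallel.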

  35. Major cipher operations (2) - Variable Rotation (A <<< B, 32-bit A) • Software: C: C = (A << B) | (A >> (32-B)); ASM: ROL A, B • Hardware, mux-based rotation: cascaded multiplexer stages controlled by B[4], B[3], B[2], B[1], B[0] (the B[4] stage selects between A<<<0 and A<<<16) • Hardware, high-speed clock: ROL32 unit driven by a fast clock CLK'; takes min(B, 32-B) CLK' cycles
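
The C expression on this slide shifts by 32 when B is 0, which is undefined behavior in C for a 32-bit operand. The sketch below shows one common way to avoid that, together with a software model of the mux-based hardware structure. The function names `rotl32` and `rotl32_mux` are assumptions made for this example.

```c
#include <stdint.h>

/* Software form: masking both shift counts keeps them in 0..31,
   so B == 0 no longer causes a shift by 32. */
uint32_t rotl32(uint32_t a, unsigned b)
{
    b &= 31u;
    return (a << b) | (a >> ((32u - b) & 31u));
}

/* Hardware-style model: five 2:1 mux stages, one per bit of B.
   Stage k either passes its input through or rotates it by the
   fixed amount 2^k (16, 8, 4, 2, 1). */
uint32_t rotl32_mux(uint32_t a, unsigned b)
{
    for (int k = 4; k >= 0; k--) {
        unsigned amount = 1u << k;
        if ((b >> k) & 1u)
            a = (a << amount) | (a >> (32u - amount));
    }
    return a;
}
```

Each fixed rotation inside a stage costs only wiring in hardware, so the delay of the mux-based rotator is roughly five multiplexer levels.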

  36. Auxiliary cipher operations (1) - Permutation • Software: C: complex sequence of instructions <<, |, &; ASM: complex sequence of instructions ROL, OR, AND • Hardware: permutation P of n inputs x1 ... xn to n outputs y1 ... yn, implemented as order of wires
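
The contrast on this slide can be seen in a generic software bit permutation, sketched below. The function name `permute_bits` and the table-driven formulation are assumptions for this example; optimized software would use cipher-specific shift and mask sequences instead.

```c
#include <stdint.h>

/* Generic 32-bit permutation: output bit i takes input bit perm[i].
   Software pays a shift, a mask, and an OR per bit; in hardware the
   same mapping is only a reordering of wires. */
uint32_t permute_bits(uint32_t x, const uint8_t perm[32])
{
    uint32_t y = 0;
    for (int i = 0; i < 32; i++)
        y |= ((x >> perm[i]) & 1u) << i;
    return y;
}
```

Calling it with a table where `perm[i] = 31 - i`, for instance, reverses the bit order of the word.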

  37. Auxiliary cipher operations (4) - Addition/subtraction • Software: C: unsigned long A, B, C; C = A+B; ASM: ADD • Hardware: n-bit adder/subtractor with inputs A, B and output C; C = A+B mod 2^n, n = 32, 16
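
Addition modulo 2^n can be sketched in C as follows. Fixed-width types are used here because `unsigned long` is wider than 32 bits on many current platforms. The names `add_mod_2n` and `ripple_carry_add` are hypothetical; the second function is a bit-level model of a ripple-carry adder, one full adder per bit.

```c
#include <stdint.h>

/* C = A + B mod 2^n for 1 <= n <= 32. Unsigned arithmetic on uint32_t
   wraps modulo 2^32, so narrower widths only need a mask. */
uint32_t add_mod_2n(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (n >= 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return (a + b) & mask;
}

/* Bit-level ripple-carry model: the carry out of bit i feeds bit i+1;
   the carry out of the top bit is discarded (reduction mod 2^n). */
uint32_t ripple_carry_add(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t sum = 0, carry = 0;
    for (unsigned i = 0; i < n && i < 32u; i++) {
        uint32_t ai = (a >> i) & 1u;
        uint32_t bi = (b >> i) & 1u;
        sum |= (ai ^ bi ^ carry) << i;                    /* full-adder sum   */
        carry = (ai & bi) | (ai & carry) | (bi & carry);  /* full-adder carry */
    }
    return sum;
}
```

The loop makes the linear carry chain explicit, which is the delay that carry-lookahead and parallel-prefix adders are designed to shorten.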

  38. Multiple designs for hardware adders (plot of delay vs. area): • Ripple carry adder (RC) • Carry-Skip adder (CS) • Carry-LookAhead adder (CLA) • Carry-Select adder • Parallel-Prefix Network adder (Kogge-Stone, Brent-Kung)

  39. Basic operations: delay and area in HARDWARE (plot of delay vs. area; operations shown: modular multiplication, modular inverse, addition (RC), variable rotation, GF(2^n) multiplication, S-box 9x32, addition (CLA), Boolean, S-box 8x8, S-box 4x4, permutation, fixed rotation)

  40. Basic operations: delay and area in SOFTWARE (plot of delay vs. memory; operations shown: modular inverse, permutation, GF(2^n) multiplication, variable rotation, fixed rotation, S-box 8x8, S-box 9x32, S-box 4x4, multiplication, addition, Boolean)

  41. Steps of Our Methodology (2) • Develop a simple and human-friendly notation to describe cryptographic algorithms (or their repetitive parts [rounds]), which reveals the parallelism present in the algorithm • Graphical representation more human friendly • Textual representation easier to process by computer programs • Possible Approach: • start from a textual description • adopt one of the existing graphical editors

  42. Steps of Our Methodology (2) • Develop a tool capable of estimating algorithm performance in terms of area and throughput using • High-level description • Library of basic components • Choice of architecture • Optimization criteria (minimum area, maximum throughput, maximum throughput to area ratio, etc.) • Other constraints, such as required clock frequency, etc. • Calibration of the developed tools using existing RTL designs for a limited subset of the algorithms

  43. Possible Problems • Routing (interconnect) delays • Optimizations on the boundary between two operations • Combining multiple operations into one (e.g., using look-up table approach) • Inter-round optimizations • Resource sharing techniques, in particular resource sharing between encryption and decryption circuits • Dependence of results on selected FPGA devices • Others…

  44. Summary Main project goals: • Provide the cryptographic community, and in particular standardization organizations/groups, with a reliable and fast way of comparing a large number of candidates for a cryptographic standard • Save designers of cryptographic algorithms from design blunders (such as that of the IBM team in the case of MARS) • Project in progress… • Feedback and collaboration are very welcome

  45. MARS - IBM team: delay and area in SOFTWARE (plot of delay vs. memory; operations shown: modular inverse, permutation, GF(2^n) multiplication, variable rotation, fixed rotation, S-box 8x8, S-box 9x32, S-box 4x4, multiplication, addition, Boolean)

  46. MARS - IBM team: delay and area in HARDWARE (plot of delay vs. area; operations shown: modular multiplication, modular inverse, addition (RC), variable rotation, GF(2^n) multiplication, S-box 9x32, addition (CLA), Boolean, S-box 8x8, S-box 4x4, permutation, fixed rotation)

  47. Project (2) Project II (30% of grade): New design in the area of Public Key Cryptography, Cryptanalysis, Digital Signal Processing, etc. • Real-life application • Requirements derived from the analysis of an application • Software implementation (typically public domain) used as a source of test vectors and to determine the HW/SW speed ratio • Several project topics proposed on the web • You can suggest a project topic yourself

  48. Project II (rules) • Can be done in a group of 1-3 students • Every team works on a slightly different problem • Project topics should be more complex for larger teams • Cooperation (but not exchange of code) between teams is encouraged • Oral presentation and written report: Tuesday, May 4

  49. Degrees of freedom and possible trade-offs (diagram): speed and area (ECE 645); power (ECE 586, 681); testability (ECE 682)
