
Design Exploration of 192-bit Elliptic Curve Adder on StarBridge HC-36 System



Presentation Transcript


  1. Design Exploration of 192-bit Elliptic Curve Adder on StarBridge HC-36 System • Gang Quan, Duncan A. Buell, James P. Davis, Siddaveerasharan Devarkal

  2. Elliptic Curve Cryptography • Emerging as a new generation of cryptosystems based on public-key cryptography • No known sub-exponential algorithm for solving the elliptic curve discrete logarithm problem • Smallest key size and highest strength per bit compared with other public-key cryptosystems • Smaller key sizes are well suited to hardware implementation

  3. NIST Standards • NIST has proposed a specific set of elliptic curves for cryptographic purposes • Elliptic curves are defined over prime fields GF(p) and binary polynomial fields GF(2^m) • Prime fields of 192, 224, 256, 384 and 521 bits • Binary fields of 163, 233, 283, 409 and 571 bits • Multi-precision arithmetic is required for such large bit widths

  4. Elliptic Curve Arithmetic • For a 192-bit operand, a naïve M * P operation (scalar multiplication) involves 191 elliptic curve doublings and, on average, about 96 elliptic curve additions (ECC Adder) • ECC Addition • Given P1=(x1, y1, z1) and P2=(x2, y2, z2), compute P3=(x3, y3, z3) such that P3 = P1 + P2 • 14 high bit-width modular multiplications • 42 high bit-width multi-precision multiplications if the Montgomery multiplication method is used
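The operation counts on this slide can be reproduced with a short sketch. It is an illustration only, not the authors' implementation: the function names, the left-to-right double-and-add formulation, and the textbook REDC form of Montgomery multiplication are assumptions. The REDC form uses three multi-precision multiplications per modular multiplication, which is consistent with 14 × 3 = 42.

```python
def double_and_add_counts(m):
    """Return (doublings, additions) for left-to-right double-and-add with scalar m > 0."""
    bits = bin(m)[2:]
    doublings = len(bits) - 1          # one doubling for every bit after the leading 1
    additions = bits.count("1") - 1    # one addition for every further set bit
    return doublings, additions


def montgomery_multiply(a_bar, b_bar, n, r_bits, n_prime):
    """Textbook Montgomery product a_bar * b_bar * R^-1 mod n, with R = 2**r_bits.

    n_prime must equal -n^-1 mod R (assumed precomputed by the caller).
    """
    r_mask = (1 << r_bits) - 1
    t = a_bar * b_bar                        # multi-precision multiplication 1
    m = ((t & r_mask) * n_prime) & r_mask    # multi-precision multiplication 2
    u = (t + m * n) >> r_bits                # multi-precision multiplication 3
    return u - n if u >= n else u


# Hypothetical parameters: the NIST P-192 prime and R = 2**192.
P192 = (1 << 192) - (1 << 64) - 1
R_BITS = 192
N_PRIME = (-pow(P192, -1, 1 << R_BITS)) % (1 << R_BITS)
```

A 192-bit scalar with its top bit set always needs 191 doublings; the number of additions is its number of set bits minus one, which averages 191 / 2 ≈ 96 for random scalars.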

  5. ECC Adder Data Flow Graph

  6. StarBridge High Performance Computing Platform

  7. StarBridge HC-36 System • 4 processing elements • Virtex II 6000 (processing elements) • 66 MHz PCI bus • PE-PE communication rate • 50 bits/cycle • Development environment • Viva

  8. Challenges Search for an optimal or near-optimal design that maximizes ECC Adder performance under the resource constraints (slices, number of built-in hardware multipliers, communication rate, etc.) of the target architecture (SBS HC-36). The size of the design space can easily exceed 2^120, even with a conservative estimate.

  9. Hierarchical Design Methodology

  10. Rapid and accurate performance/cost evaluation is the key to effective and efficient design space exploration, and the performance/cost of the multipliers is critical to the performance/cost of the ECC Adder.

  11. Evaluation of Timing and Resource Usage of a Multiplier • Different multiplier implementations • Shift-and-Add, Divide-and-Conquer (D&Q), “Broadcast” (BC), etc. • Performance/cost trade-off • Hybrid multiplier • A multiplier combining different implementation strategies

  12. Divide & Conquer Multiplier • Karatsuba-Ofman Algorithm (1962)
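As a software illustration of the Karatsuba-Ofman idea named on this slide, the sketch below replaces one N-bit multiplication with three multiplications of roughly N/2 bits. The function name, the recursion cutoff `base_bits`, and the use of Python integers in place of hardware datapaths are assumptions made for the example.

```python
def karatsuba(a, b, n_bits, base_bits=18):
    """Multiply non-negative integers a and b (each < 2**n_bits) with three half-width products."""
    if n_bits <= base_bits:
        return a * b                      # stands in for a base multiplier
    half = n_bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba(a_lo, b_lo, half, base_bits)
    hi = karatsuba(a_hi, b_hi, n_bits - half, base_bits)
    # The operand sums are one bit wider than the halves.
    cross = karatsuba(a_lo + a_hi, b_lo + b_hi, n_bits - half + 1, base_bits)
    mid = cross - lo - hi                 # equals a_lo*b_hi + a_hi*b_lo
    return (hi << (2 * half)) + (mid << half) + lo
```

The middle term costs two extra subtractions and wider adders, which is the kind of D&Q overhead the later cost model accounts for separately.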

  13. “Broadcast” Multiplier • Algorithm • Features • Partial products are shuffled to allow a fully pipelined implementation • Given k functional units, each “loop body” can be computed in parallel • Easy trade-off between resource usage and speed by selecting k • k = N: Shift-and-add (low degree of parallelism, low speed, low resource usage) • k = 1, 2, 3, … (small integer): Conventional “block” multiplications (high degree of parallelism, high speed, high resource usage) • Good scalability
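The algorithm figure for this slide did not survive in the transcript, so the following is only one plausible reading of the “broadcast” scheme, not the authors' algorithm. It assumes both operands are split into k blocks of N/k bits and that, in each of k loop iterations, one block of the second operand is broadcast to k block multipliers. The function name is hypothetical, and the partial-product shuffling used for pipelining is not modelled.

```python
def broadcast_multiply(a, b, n_bits, k):
    """Multiply a and b (each < 2**n_bits) using k block multipliers per loop iteration.

    Assumption: n_bits is divisible by k, so every block is n_bits // k bits wide.
    """
    if n_bits % k != 0:
        raise ValueError("n_bits must be divisible by k in this sketch")
    w = n_bits // k
    mask = (1 << w) - 1
    a_blocks = [(a >> (i * w)) & mask for i in range(k)]
    acc = 0
    for j in range(k):                                    # k loop iterations
        b_j = (b >> (j * w)) & mask                       # block broadcast to every unit
        partials = [a_i * b_j for a_i in a_blocks]        # k independent w-bit products
        row = sum(p << (i * w) for i, p in enumerate(partials))
        acc += row << (j * w)                             # shift and accumulate
    return acc
```

With k = n_bits every block is a single bit and the loop degenerates to shift-and-add; with a small k the blocks are wide and each iteration performs a few large block multiplications.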

  14. Example: 192-bit “Broadcast” Multiplier

  15. The Hybrid Multiplier • A hybrid multiplier is denoted by an integer string, M(N) = {m1, m2, …, mn} • mi: the multiplier scheme at the ith level • mi = 1: use the D&Q scheme • mi = k (k > 1): use the BC scheme with k sub-multipliers • For multiplications with a bit width of 18 bits or less, the built-in hardware multiplier (18x18) is used

  16. An Example of a Hybrid Multiplier • A 192-bit hybrid multiplier M(192) = {1, 1, 3} • At the first level, the D&Q scheme is adopted, which requires three 96-bit multipliers • For each of the 96-bit multipliers (the 2nd level), the D&Q scheme is adopted again • For each of the 48-bit multipliers (the 3rd level), the BC scheme with three 16-bit multipliers is used • The hardware multipliers (18-bit) built into the Virtex II 6000 are used for the 16-bit multiplications
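The scheme string can be expanded level by level to see which multipliers a hybrid design needs. The sketch below is an illustration under stated assumptions: the function name is hypothetical, every sub-multiplier is assumed to be instantiated separately (no time sharing), and the extra carry bit of the D&Q middle product is ignored so that the widths match the 96/48/16-bit figures quoted on the slide.

```python
def expand_scheme(n_bits, scheme):
    """Expand a hybrid multiplier string such as [1, 1, 3].

    Returns (levels, base_width, base_count), where each level is a tuple
    (scheme_name, operand_width, multipliers_at_this_level, sub_multiplier_width).
    """
    levels = []
    width, count = n_bits, 1
    for m in scheme:
        if m == 1:                       # D&Q: three half-width sub-multipliers
            name, fan_out, sub_width = "D&Q", 3, width // 2
        else:                            # BC: m sub-multipliers of width / m bits
            name, fan_out, sub_width = "BC", m, width // m
        levels.append((name, width, count, sub_width))
        count *= fan_out
        width = sub_width
    return levels, width, count


levels, base_width, base_count = expand_scheme(192, [1, 1, 3])
```

For M(192) = {1, 1, 3} this yields one 192-bit D&Q multiplier, three 96-bit D&Q multipliers, and nine 48-bit BC multipliers, for a total of 27 base multiplications of 16 bits under the no-sharing assumption.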

  17. The First Level of M(192)={1,1,3} • D&Q is used, which requires three 96-bit multipliers

  18. The Second Level of M(192)={1,1,3} • D&Q is used again, which requires three 48-bit multipliers

  19. The Third Level of M(192)={1,1,3} • The BC scheme with three 16-bit multipliers is used

  20. One “loop” for 48-bit BC

  21. Analytical Cost Estimation for the Hybrid Multiplier • Area Estimation • Si(N): area cost for N-bit multiplier • SOD&Q(N): area cost for the overhead in D&Q implementation of N-bit multiplier (for control and other units such as adders) • SOBC(N): area cost for the overhead in BC implementation of N-bit multiplier

  22. It is reasonable to assume that SOD&Q(N) and SOBC(N) are linear in N. Therefore, SOD&Q(N) = a × N + const1 and SOBC(N) = b × N + const2. Empirically, we have a = 15, b = 11, and const1 = const2 = 0.
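The linear overhead model can be turned into a small estimator. The recurrence itself is not preserved in the transcript, so the sketch assumes a natural form: a D&Q level adds SOD&Q(N) plus three half-width sub-multipliers, and a BC level with k sub-multipliers adds SOBC(N) plus k sub-multipliers of N/k bits. It also assumes the built-in hardware multipliers contribute no logic area, so only overhead is summed. The function name is hypothetical, and the area unit is whatever unit the empirical coefficients were fitted in, which the slides do not state.

```python
A_DQ = 15    # empirical slope a for the D&Q overhead (from the slide)
B_BC = 11    # empirical slope b for the BC overhead (from the slide)


def overhead_area(n_bits, scheme):
    """Estimate the total overhead area of a hybrid multiplier M(n_bits) = scheme.

    Assumed recurrence (not taken from the slides):
      D&Q level:  a * N + 3 * area(N / 2)
      BC level:   b * N + k * area(N / k)
    Base multipliers are assumed to map to built-in 18x18 blocks at zero cost.
    """
    if not scheme:
        return 0
    m, rest = scheme[0], scheme[1:]
    if m == 1:
        return A_DQ * n_bits + 3 * overhead_area(n_bits // 2, rest)
    return B_BC * n_bits + m * overhead_area(n_bits // m, rest)


estimate = overhead_area(192, [1, 1, 3])
# 15*192 + 3*(15*96) + 9*(11*48) = 2880 + 4320 + 4752 = 11952
```

Because the model is purely analytical, alternative scheme strings can be compared without synthesizing each candidate, which is the point of rapid cost evaluation during design space exploration.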

  23. Analytical Cost Estimation for the Hybrid Multiplier • Timing estimation • Ti(N): timing cost for N-bit multiplier • Tadd(N): timing cost for N-bit addition • TOD&Q(N): timing cost for the control in N-bit D&Q multiplier • TCBC(k): timing cost for the control with k base units in BC implementation • TBC(k): timing cost for “loop” overhead with k base units in BC implementation

  24. With the given Viva design library, we have, for N < 192 and k > 1: TOD&Q(N) = 3, TCBC(k) = 2, TBC(k) = k

  25. Comparison of Analytical and Actual Results

  26. Summary • Rapid estimation of the design cost for the hybrid multiplier architecture • With the given Viva library, we are able to estimate the cycle count of a hybrid multiplier accurately • The relative error of the area estimation is within 5% • Future work • Estimation of communication cost • Investigation of efficient hierarchical allocation/partition/mapping/scheduling techniques
