Outline

A 1.5 GHz AWP Elliptic Curve Crypto Chip

O. Hauck, S. A. Huss
ICSLAB, TU Darmstadt

A. Katoch
Philips Research

Current AWP projects

GATS-Chip

Elliptic Curve Chip

AWPs compared to sync wave pipes

SRCMOS circuits

Crypto background

Architecture and Implementation

Conclusion

- 2D-DCT: 0.6µm, being redesigned with self-resetting logic
- SRT: currently on schematics only
- 64b Giga-Hertz Adder Test Site: 0.6µm, almost complete, tape-out in May
- Crypto chip: 0.35µm, tape-out targeted for July

AMS 0.6µm 3M CMOS

64b Brent-Kung adder

~10k devices, ~1.3sqmm

latency ~2.5ns

cycle 1.0ns

on-chip test circuitry

[Figure: clocked pipeline stage with Logic between two Latch/Reg elements; signals Data and Clk]

Synchronous Wave Pipeline

[Figure: Wave Logic between two Latch/Reg elements; signals Data and Clk]

Discrete, distinct valid frequency ranges

Low high narrow frequency range

not suitable for system design

Promise: higher throughput at reduced latency, clock load, area and power

Drawback: difficult tuning of logic and delay elements

[Figure: clocked pipeline stage with Logic between two Latch/Reg elements; signals Data and Clk]

Throughput is determined by the longest logic path plus clock/register overhead

Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead

[Figure: Wave Logic between two Wave Latches; Data path, with req_in passing through a matched delay to req_out]

More than one data and request propagating coherently

One-sided cycle time constraint

Delay must track logic over PTV corners
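The adder test site figures quoted earlier (latency ~2.5 ns, cycle 1.0 ns) already imply several waves in flight. The sketch below only does that arithmetic. The function names are illustrative, and `wave_cycle_bound_ns` uses the simplified textbook bound for wave pipelining (delay spread plus latch overhead) with hypothetical delay values; it is not a timing model taken from the slides.

```python
def waves_in_flight(latency_ns: float, cycle_ns: float) -> float:
    """Average number of data waves travelling through the logic at once."""
    return latency_ns / cycle_ns


def throughput_ghz(cycle_ns: float) -> float:
    """One result per cycle, so throughput is the reciprocal of the cycle time."""
    return 1.0 / cycle_ns


def wave_cycle_bound_ns(t_max_ns: float, t_min_ns: float, overhead_ns: float) -> float:
    """Simplified lower bound on the cycle time of a wave pipe.

    Assumption: the cycle is limited by the spread between the slowest and
    fastest path plus latch overhead, not by the slowest path itself.
    """
    return (t_max_ns - t_min_ns) + overhead_ns


if __name__ == "__main__":
    # Figures from the adder test site: latency ~2.5 ns, cycle 1.0 ns.
    print("waves in flight:", waves_in_flight(2.5, 1.0))
    print("throughput (GHz):", throughput_ghz(1.0))
    # Hypothetical path delays, chosen only to show the shape of the bound.
    print("cycle bound (ns):", wave_cycle_bound_ns(2.5, 2.0, 0.3))
```

The smaller the spread between fastest and slowest path, the closer together waves can follow each other, which is why the matched delay has to track the logic over PTV corners.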

[Figure: adder prefix graph with columns 0 to 4 and cell types pg, PG, G and xor]

Buffers provide for the same depth on every logic path

All gates in the same column must have the same delay
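For readers unfamiliar with the 64b Brent-Kung adder named earlier, here is a behavioural sketch of a Brent-Kung parallel-prefix addition. It models only the logic function (a pg row, PG/G prefix cells, a final xor row), not the balanced-delay circuit or its buffers; the function name and the level counter are illustrative.

```python
def brent_kung_add(a: int, b: int, width: int = 64):
    """Return (sum mod 2**width, carry_out, prefix_levels) for a + b."""
    assert width >= 2 and width & (width - 1) == 0, "width must be a power of two"

    # pg row: bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    levels = 0

    def combine(i: int, j: int) -> None:
        # Prefix cell: merge the group ending at j into the group ending at i.
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    # Up-sweep: build group signals over spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            combine(i, i - d)
        levels += 1
        d *= 2

    # Down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            combine(i, i - d)
        levels += 1
        d //= 2

    # xor row: sum bit = propagate xor carry-in, carry-in of bit i is G[i-1].
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1], levels


if __name__ == "__main__":
    s, c, levels = brent_kung_add(2**64 - 1, 1)
    print(s, c, levels)
```

In this model every level is a set of independent cells, which corresponds to the columns in which all gates must share the same delay.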

- The logic style used has to minimize delay variation
- Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream
- Static CMOS is not well suited for wave piping; fixing the problem results in more power and slower speed
- Pass transistor logic gives sloppy edges, thereby introducing delay variation
- Dynamic logic is attractive as only the output high transition is data-dependent; output pulldown is done by precharge
- What is needed is a dynamic logic family without precharge overhead: SRCMOS

- Distinguishing property of our SRCMOS circuits: precharge feedback is fully local, and NMOS trees are delay balanced

[Figure: SRCMOS gate; labels: inputs, N, output]

[Cisco Systems]

- Security is based upon the DLP (discrete logarithm problem): in a finite Abelian group we can easily compute y = g^x given g and x
- However, x is hard to compute out of y and g
- The DLP is extraordinarily hard for the point group of an elliptic curve:
- The set of solutions of a cubic equation over any field is an abelian group
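The asymmetry behind the DLP can be illustrated in a toy setting. The sketch below uses the multiplicative group modulo a small prime rather than an elliptic curve point group, purely because it fits in a few lines; the parameters (p = 101, g = 2, exponent 77) and the function name `brute_force_dlog` are illustrative assumptions, not values from the chip.

```python
def brute_force_dlog(g: int, y: int, p: int) -> int:
    """Find the smallest x >= 0 with g**x == y (mod p) by exhaustive search."""
    value = 1
    for x in range(p):
        if value == y:
            return x
        value = (value * g) % p
    raise ValueError("y is not a power of g modulo p")


if __name__ == "__main__":
    p, g, secret = 101, 2, 77  # toy parameters; 2 generates the group mod 101

    # Easy direction: square-and-multiply needs only a handful of steps.
    y = pow(g, secret, p)

    # Hard direction: the search walks through the group element by element.
    recovered = brute_force_dlog(g, y, p)
    print(y, recovered)

    # The easy direction stays fast even for a 61-bit modulus, where the
    # exhaustive search above would need on the order of 2**61 steps.
    print(pow(3, 2**60 + 12345, 2**61 - 1))
```

Exponentiation cost grows with the bit length of the exponent, while the exhaustive search grows with the group size itself. Better generic attacks exist, but the gap remains.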

- Two types of curves: supersingular and non-supersingular
- Non-supersingular curves have the highest security
- EC equation:

[Figure: datapath schematic. Recoverable labels: Pseudo NMOS, SRCMOS, abx, 3_Xor, Wave latch, delay, request; node indices 1 to 783]

- Dual-rail cross-coupled SRCMOS circuit
- NMOS trees are designed such that there is only one conducting path to ground
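The single-conducting-path property can be checked on a simplified model of a dual-rail 3-input XOR. The model assumes fully decoded pull-down stacks (one series stack of three devices per input combination, odd-parity stacks on the true output rail, even-parity stacks on the false rail). This is an illustrative abstraction with made-up function names, not the transistor netlist of the chip.

```python
from itertools import product


def dual_rail(bit: int):
    """Encode a bit as (true_rail, false_rail); exactly one rail is high."""
    return (bit, 1 - bit)


def conducting_paths(inputs):
    """List the pull-down stacks that conduct for the given input bits.

    Each stack is named by the combination it decodes: device i is gated by
    the true rail of input i if combo[i] == 1, else by the false rail.
    """
    rails = [dual_rail(x) for x in inputs]
    paths = []
    for combo in product((0, 1), repeat=len(inputs)):
        if all(rails[i][0] if combo[i] else rails[i][1] for i in range(len(inputs))):
            paths.append(combo)
    return paths


def xor3_dual_rail(a: int, b: int, c: int):
    """Return (out_true, out_false, number_of_conducting_paths)."""
    paths = conducting_paths((a, b, c))
    out_true = int(any(sum(path) % 2 == 1 for path in paths))
    out_false = int(any(sum(path) % 2 == 0 for path in paths))
    return out_true, out_false, len(paths)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, xor3_dual_rail(a, b, c))
```

With exactly one stack conducting for every input, each evaluation discharges through the same number of series devices, which is one way to keep the delay independent of the data.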

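[Figure: architecture hierarchy and required throughput. Key k (bits 260 to 0) is left-shifted; x is the key bit shifted out; Hamming weight = 40]

- Double-and-Add: key generation rate R
- EC arithmetic: R * (261*7 + 40*13) = R * 2347 MUL/s
- EC double (7 MUL): always
- EC add (13 MUL): if x=1
- Finite field arithmetic: R * 2347 * 261 = R * 612567 bit/s
- ADD: 1, MUL: 261, LOAD/STORE: 1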
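The workload figures above follow from the double-and-add loop itself. The sketch below scans a 261-bit key from the most significant bit, always doubling and adding only when the key bit is 1. To stay short and checkable it uses integers modulo n under addition as a stand-in group, so "double" and "add" are ordinary modular operations; the real chip performs elliptic curve point operations instead. The costs of 7 and 13 field multiplications per double and add are taken from the slide; all names are illustrative.

```python
KEY_BITS = 261        # key width from the slide (bits 260 to 0)
MUL_PER_DOUBLE = 7    # field multiplications per EC double (from the slide)
MUL_PER_ADD = 13      # field multiplications per EC add (from the slide)


def double_and_add(k: int, point: int, modulus: int, key_bits: int = KEY_BITS):
    """Compute k * point in the stand-in group (integers mod modulus).

    Returns (result, number_of_doubles, number_of_adds).
    """
    acc = 0  # identity element of the stand-in group
    doubles = adds = 0
    for i in reversed(range(key_bits)):   # most significant bit first
        acc = (2 * acc) % modulus         # "EC double": always
        doubles += 1
        if (k >> i) & 1:                  # "EC add": only if the key bit is 1
            acc = (acc + point) % modulus
            adds += 1
    return acc, doubles, adds


def field_multiplications(doubles: int, adds: int) -> int:
    """Field multiplications needed for one key, using the slide's costs."""
    return doubles * MUL_PER_DOUBLE + adds * MUL_PER_ADD


if __name__ == "__main__":
    # Hypothetical key with Hamming weight 40: ones at bits 0, 6, 12, ..., 234.
    key = sum(1 << (6 * i) for i in range(40))
    result, doubles, adds = double_and_add(key, point=5, modulus=2**61 - 1)
    muls = field_multiplications(doubles, adds)
    print(doubles, adds, muls, muls * KEY_BITS)
```

Multiplying the last two figures by the key generation rate R gives the MUL/s and bit/s requirements listed above.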

- For static operation

- Request signals trigger the state transitions.
- Autonomous state transitions are triggered by signal X

[Figure: controller block diagram. Recoverable labels: X, REG (two registers), Logic, IN1, IN2, OUT, reset, req1 to reqn, Req_out, AWP]

[Figure: state diagram with states 1 to 8. Recoverable labels: Start/LoadX, ResetZ; LoadY; Shift K; If Stop=1/KP_Done; If K=0; If K=1; ShiftK, Double; K=0,DoubleDone; K=1,DoubleDone/Add; AddDone; transitions marked X=0 or X=1]

[Figure: state diagram with states 0 to 5 and 58 to 63, annotated "Level-based control" and "Pulse-based control". Recoverable labels: Start, OPAX, OPBZ, MULT, MD, OPAA, Shift, OPBA; transitions marked X=0 or X=1]