Clockless Logic in Asynchronous Pipeline Design for High Performance

Clockless Logic Montek Singh Tue, Mar 23, 2004

Outline
• Classic static logic pipeline: Sutherland
• Classic dynamic logic pipeline: Williams/Horowitz

A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
• Structure
• Operation
• Performance

A Classic Approach: PS0 Pipeline
[Figure: three-stage pipeline (Stage 1, Stage 2, Stage 3) between Data in and Data out; each stage has a Processing Block and a Completion Detector; labeled signals: data, ack.]
Williams/Horowitz (Stanford U.) [1986-91]:
• successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
Implemented using “dynamic logic”

PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector.
[Figure: Processing Block built from dynamic gates (PC input, “keeper”, pull-down network) between data inputs and data outputs; a Completion Detector generates ack.]

Dual-Rail Completion Detector
[Figure: the two rails of each bit (bit0, bit1, ..., bitn) feed an OR gate; the OR outputs feed a C-element (C) whose output is Done.]
• Combines dual-rail signals
• Indicates when all bits are valid (or reset)
• OR together 2 rails per bit
• Merge results using “C-element”
C-element:
• if all inputs = 1, output → 1
• if all inputs = 0, output → 0
• else, maintain output value
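The completion detector described above can be modeled in a few lines. The following is a minimal behavioral sketch in Python, not a circuit description; the names CElement and completion_detect are hypothetical, and each dual-rail bit is assumed to be a (true_rail, false_rail) pair of 0/1 values.

```python
class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # Otherwise the inputs disagree: hold the previous output.
        return self.out


def completion_detect(c_element, bits):
    """bits: list of (true_rail, false_rail) pairs, one pair per dual-rail bit."""
    per_bit_valid = [t | f for (t, f) in bits]  # OR together the 2 rails of each bit
    return c_element.update(per_bit_valid)      # merge the results with the C-element


# Usage: Done rises only when every bit is valid and falls only when every bit is reset.
c = CElement()
history = [
    completion_detect(c, [(0, 0), (0, 0)]),  # all reset
    completion_detect(c, [(1, 0), (0, 0)]),  # one bit valid: hold
    completion_detect(c, [(1, 0), (0, 1)]),  # all valid
    completion_detect(c, [(0, 0), (0, 1)]),  # one bit reset: hold
    completion_detect(c, [(0, 0), (0, 0)]),  # all reset
]
```

Here history traces Done through one evaluate/precharge cycle: it stays low until the last bit becomes valid and stays high until the last bit is reset.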

PS0 Protocol
• PRECHARGE N: when N+1 completes evaluation
  • delete data: after next stage has copied it
• EVALUATE N: when N+1 completes precharging
  • accept new data: after next stage is emptied
[Figure: stages N, N+1, N+2 with six numbered events (1-6); event labels: evaluates, precharges, indicates “done”.]
Complete cycle: 6 events
• Evaluate → Precharge: 3 events
• Precharge → Evaluate: another 3 events
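The precharge/evaluate rules above can be exercised with a small behavioral model. This is a sketch under simplifying assumptions, not the Williams/Horowitz circuit: all stages are updated in lock step from the previous state (a real PS0 pipeline is asynchronous), stages pass data through unchanged, None stands for a precharged (empty) stage, and the names ps0_step and run are hypothetical.

```python
def ps0_step(out, ack, source, sink_ready):
    """Advance every stage by one step. out[i] is None when stage i is precharged."""
    n = len(out)
    new = list(out)
    for i in range(n):
        # "Done" signal of the next stage (the sink's ack for the last stage).
        next_done = ack if i == n - 1 else out[i + 1] is not None
        if next_done:
            new[i] = None  # PRECHARGE N: stage N+1 has completed evaluation
        elif out[i] is None:
            # EVALUATE N: stage N+1 is empty, so accept data if any is present.
            new[i] = next(source, None) if i == 0 else out[i - 1]
        # Otherwise the stage has already evaluated and holds its value.
    new_ack = sink_ready and out[-1] is not None
    return new, new_ack


def run(n_stages, tokens, sink_ready, steps):
    """Run the model; return the final stage contents and the tokens the sink received."""
    out, ack = [None] * n_stages, False
    source = iter(tokens)
    received = []
    for _ in range(steps):
        new_out, new_ack = ps0_step(out, ack, source, sink_ready)
        if new_ack and not ack:
            received.append(out[-1])  # sink takes the token when its ack rises
        out, ack = new_out, new_ack
    return out, received


flowing = run(4, "ABC", sink_ready=True, steps=20)
stalled = run(4, "ABCDEF", sink_ready=False, steps=20)
```

In this model, the free-flowing run delivers A, B, C to the sink in order and drains the pipeline. With a stalled sink the four stages settle as [None, 'B', None, 'A']: the tokens remain separated by an empty stage.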

PS0 Performance
[Figure: the six numbered events (1-6) of one PS0 cycle.]
Cycle Time = [expression not preserved in this transcript]
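The cycle time can be estimated from the protocol rules with a timing recurrence. The sketch below assumes every stage has the same evaluation delay t_eval, precharge delay t_prech, and completion-detector delay t_cd (hypothetical parameter names and values), an ideal source that always has data ready, and an ideal sink that acknowledges instantly; the function name ps0_eval_times is also an assumption.

```python
def ps0_eval_times(n_stages, n_tokens, t_eval, t_prech, t_cd):
    """Return E, where E[k][i] is the time stage i finishes evaluating token k."""
    E = []
    P_prev = None  # precharge-complete times of each stage for the previous token
    for k in range(n_tokens):
        row = []
        for i in range(n_stages):
            # Input data is valid once the previous stage has evaluated.
            data_ready = row[i - 1] if i > 0 else 0  # ideal source (assumption)
            if k == 0:
                enabled = 0  # pipeline starts empty, every stage may evaluate
            elif i == n_stages - 1:
                enabled = P_prev[i]  # ideal sink releases immediately (assumption)
            else:
                # EVALUATE N: stage N+1's detector reports "precharge done".
                enabled = P_prev[i + 1] + t_cd
            row.append(max(data_ready, enabled) + t_eval)
        # PRECHARGE N: stage N+1's detector reports "evaluation done".
        P_prev = [
            (row[i + 1] + t_cd if i < n_stages - 1 else row[i]) + t_prech
            for i in range(n_stages)
        ]
        E.append(row)
    return E


E = ps0_eval_times(n_stages=4, n_tokens=5, t_eval=2, t_prech=2, t_cd=1)
cycle_time = E[2][0] - E[1][0]  # spacing between successive tokens at stage 0
print(cycle_time)  # prints 10
```

With these illustrative delays, successive tokens leave stage 0 10 time units apart, which equals 3·t_eval + 2·t_cd + t_prech: three evaluations, two completion detections, and one precharge, for six events in all. In this model the result holds for pipelines of three or more stages.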

Summary: PS0 Pipelining
Datapaths are latch-free:
• dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control:
• stage deletes data: only after next stage has copied it
• stage accepts new data: only if next stage is empty
• distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
• completion detector directly controls previous stage
+: chip area savings
+: low control overhead

Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
• Replace handshaking with “magic clocking”
  • each stage gets its own clock
  • successive clocks are slightly skewed
  • essentially, clocked simulation of asynchronous handshaking!
    – need multiple clock phases!
• Use a single clock, but insert latches between stages
  • latches are simple, level-sensitive
  • consecutive stages receive complementary clock signals
[Figure: pipeline with a latch between stages; consecutive latches are clocked by Ck and Ck’.]

Comparison … (contd.)
Cycle Times?

Drawbacks of PS0 Pipelining
• Poor throughput:
  • long cycle time: 6 events per cycle
  • data “tokens” are forced far apart in time
• Limited storage capacity:
  • max only 50% of stages can hold distinct tokens
  • data tokens must be separated by at least one spacer
Our Research Goals: address both issues
• still maintain very low latency
