
Clockless Logic



Presentation Transcript


  1. Clockless Logic
     • Recap: Lookahead Pipelines
     • High-Capacity Pipelines

  2. Recap: Lookahead Pipeline Styles
     2 Strategies:
     • Early Evaluation
     • Early Done

  3. Lookahead Pipelines: Strategy #1
     Use non-neighbor communication:
     • stage receives information from multiple later stages
     • allows “early evaluation”
     Benefit: stage gets head-start on next cycle

  4. Lookahead Pipelines: Strategy #2
     Use early completion detection:
     • completion detector moved before stage (not after)
     • stage indicates “early done” in parallel with computation
     [Figure label: early completion detector]
     Benefit: again, stage gets head-start on next cycle

  5. Single-Rail Styles
     [Figure labels: matched delay, request, done (request/done indicate valid data), bit 1, bit n, bit m, delay]
     Adapt dual-rail styles to single-rail:
     • replace dual-rail function blocks by single-rail blocks
     • replace completion detectors by matched delays
     Example: LPsr2/2

  6. Single-Rail Styles (contd.)
     [Figure label: delay]
     Example: LPsr2/1

  7. High-Capacity Pipelines
     Singh/Nowick: WVLSI-00, ISSCC-02, Async-02

  8. Recent Approaches
     3 novel styles for high-speed async pipelining:
     • “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
     • “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
     • MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
     Goal: significantly improve throughput of PS0
     Two Distinct Strategies:
     • LP: introduce protocol optimizations
        • “shave off” components from critical cycle
     • HC: fundamentally new protocol
        • greater concurrency: “loosely-coupled” stages

  9. High-Capacity Pipeline: HC
     [Figure labels: stage controller, pc, eval, ack, delay; stages N, N+1, N+2]
     Key Idea: Decouple control for pull-up and pull-down
     • increases pipeline concurrency: initiates next cycle early
     • once N+1 evaluates, can enter “isolate (hold) phase”
     • stage N allowed to complete entire next cycle!

  10. Inside an HC stage
      Decoupled control: pull-up and pull-down stacks are independently controllable
      [Figure labels: pc (precharge control), “keeper”, pull-down stack, data inputs, data outputs, eval (evaluation control)]
      • pc asserted: precharge
      • eval asserted: evaluate
      • both de-asserted: enter “isolate” (hold) phase
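The three phases listed on this slide can be sketched as a small decoding function. This is only an illustrative model: the names `Phase` and `hc_phase` are hypothetical, and the inputs are logical "asserted" flags rather than wire levels (the values on the next slide suggest pc is active-low, so "pc asserted" would correspond to pc=0). Rejecting the case where both controls are asserted is an assumption of this sketch; the slide does not list that combination.

```python
from enum import Enum


class Phase(Enum):
    PRECHARGE = "precharge"
    EVALUATE = "evaluate"
    ISOLATE = "isolate"


def hc_phase(pc_asserted: bool, eval_asserted: bool) -> Phase:
    """Map the two decoupled control inputs of an HC stage to its phase."""
    if pc_asserted and eval_asserted:
        # Assumption: pull-up and pull-down enabled together is treated as illegal,
        # since it could open a path from supply to ground.
        raise ValueError("pc and eval must not be asserted together")
    if pc_asserted:
        return Phase.PRECHARGE
    if eval_asserted:
        return Phase.EVALUATE
    # Both de-asserted: the stage holds its output (isolate phase).
    return Phase.ISOLATE
```

With both controls de-asserted the stage neither precharges nor evaluates, which is what lets it hold its data while the previous stage moves on.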

  11. Cycle of an LPHC Stage
      [Figure: phase cycle shown for Stage N and Stage N+1]
      Phases: Eval (pc=1, eval=1), Isolate (pc=1, eval=0), Precharge (pc=0, eval=0)
      • Only a single backward synchronization arc:
         • once stage N+1 has completed Eval, N can perform entire next cycle!
         • why safe?: N+1 enters isolate phase … key to greater concurrency
         • almost all existing approaches: require 2 arcs
      • One (natural) forward synchronization arc:
         • stage N+1 evaluates new data only after N has evaluated

  12. Formal Specification of Controller
      [Figure: signal transition graph. Labels: (Start evaluate) pc+ eval+; (Evaluate of N+1 complete) T+; (Evaluate complete) S+; eval- (Isolate); (Start precharge) pc-; (Precharge of N+1 complete) T-; (Precharge complete) S-]
      Problem: Specification too concurrent for direct synthesis
      • desired precharge condition: N and N+1 have evaluated same data
      • problem: this condition not uniquely captured by given signals!
      • N may evaluate next data item, while N+1 stuck on current item!

  13. Modified Specification of Controller
      [Figure: signal transition graph. Labels: pc+, eval+, (Evaluate of N+1 complete) T+, S+, eval-, T- (Precharge of N+1 complete), pc-, ok2pc+, S-, ok2pc-]
      Solution: Add a state variable ok2pc
      ok2pc records whether N+1 has “absorbed” N’s data item
      • ok2pc resets immediately when N deletes item (N precharges)
      • ok2pc is set when N+1 deletes item (N+1 precharges)
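A minimal behavioral sketch of the ok2pc rule, assuming an event-level model rather than the actual gate-level controller. The class `PrechargeGuard` and its method names are hypothetical; `s` and `t` stand for the S and T signals (stage N and stage N+1 have evaluated). The initial value of ok2pc (set, with both stages empty) is an assumption of this sketch. The naive check shows the ambiguity described on the previous slide: S and T alone cannot tell whether N+1 is holding the same item as N.

```python
class PrechargeGuard:
    """Event-level model of stage N's precharge condition (illustrative only)."""

    def __init__(self) -> None:
        self.s = False      # S: stage N has evaluated
        self.t = False      # T: stage N+1 has evaluated
        self.ok2pc = True   # assumed initial value: N+1 holds no stale item

    def naive_may_precharge(self) -> bool:
        # Without the state variable: looks only at S and T.
        return self.s and self.t

    def may_precharge(self) -> bool:
        # With the state variable: N+1 must have absorbed N's current item.
        return self.s and self.t and self.ok2pc

    def n_evaluates(self) -> None:
        self.s = True

    def n_precharges(self) -> None:
        if not self.may_precharge():
            raise RuntimeError("stage N may not precharge yet")
        self.s = False
        self.ok2pc = False  # reset immediately when N deletes its item

    def n1_evaluates(self) -> None:
        self.t = True

    def n1_precharges(self) -> None:
        self.t = False
        self.ok2pc = True   # set when N+1 deletes its item
```

Usage, following the problem case: after N and N+1 evaluate item 1 and N precharges, N can evaluate item 2 while N+1 is still stuck on item 1. At that point `naive_may_precharge()` is true but `may_precharge()` is false, and it stays false until N+1 has precharged and evaluated again.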

  14. Controller implementation
      Controller implementation is very simple:
      • each signal implemented using a single gate
      • ok2pc typically off the critical path
      [Figure labels: NAND3 (signals S, T, pc); aC with a “+” input (signals T, S, ok2pc); INV (signals S, eval)]

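  15. Performance
      [Figure: timing diagram for stages N, N+1, N+2 with numbered events 1, 2, 2, 3. Event labels: N evaluates, N+1 evaluates, N isolates, N precharges, N enables itself for next evaluation]
      Cycle Time = [expression missing in transcript]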

  16. Ripple-Carry Adder: One Stage
      [Figure labels: Full-Adder Stage; A (a1, a0), B (b1, b0), reqab; Carry-in (cin1, cin0), reqc; Carry-out (cout1, cout0); sum; done]
      Mixed Dual-Rail/Single-Rail Datapath:
      • single-rail: sum
      • dual-rail: A, B, Carry-in and Carry-out
      • must implement binate functions using unate dynamic logic
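A sketch of how a binate function such as sum can be built from AND/OR terms only when the inputs arrive on two rails. The function names `encode` and `full_adder_stage` are hypothetical, and the rail convention (rail 1 high means logic 1, rail 0 high means logic 0, both low means precharged) is an assumption based on the a1/a0 naming; the handshake signals reqab, reqc and done are not modeled.

```python
from typing import Tuple

DualRail = Tuple[int, int]  # (rail1, rail0); (0, 0) is the precharged state


def encode(bit: int) -> DualRail:
    """Encode a logic value on two rails (assumed convention)."""
    return (1, 0) if bit else (0, 1)


def full_adder_stage(a: DualRail, b: DualRail, cin: DualRail) -> Tuple[int, DualRail]:
    """One full-adder stage using only AND/OR of input rails (no inversions)."""
    a1, a0 = a
    b1, b0 = b
    c1, c0 = cin
    # Dual-rail carry-out: majority of the true rails, majority of the false rails.
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)
    # Single-rail sum: one term per input combination with an odd number of 1s.
    total = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    return total, (cout1, cout0)
```

Because every output is a monotonic function of the input rails, all-low (precharged) inputs give all-low outputs, which is the property dynamic logic needs.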

  17. Final Adder Architecture
      [Figure labels: shift-registers provide operand bits A, B; adder stage with carry-in and carry-out; least significant to most significant; sum; shift-registers accumulate sum bits]
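A functional sketch of the ripple-carry structure: operand bits are consumed from least significant to most significant, each stage passes its carry-out to the next stage's carry-in, and the sum bits are accumulated. The function name `ripple_carry_add` is hypothetical, and the sketch models only the arithmetic, not the shift-register timing or the asynchronous handshaking.

```python
from typing import Tuple


def ripple_carry_add(a: int, b: int, width: int = 32, carry_in: int = 0) -> Tuple[int, int]:
    """Add two unsigned integers bit by bit; return (sum modulo 2**width, carry-out)."""
    carry = carry_in
    total = 0
    for i in range(width):  # least significant bit first
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        sum_bit = abit ^ bbit ^ carry
        carry = (abit & bbit) | (abit & carry) | (bbit & carry)
        total |= sum_bit << i  # accumulate sum bits
    return total, carry
```

For example, adding 1 to the all-ones 32-bit value ripples a carry through every stage and leaves a zero sum with carry-out 1.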

  18. Results
      Designed/simulated adder in each pipeline style
      Experimental Setup:
      • design: 32-bit ripple-carry adder
      • technology: 0.6 µm HP CMOS, at 3.3 V and 300 K
      New LPHC style: 10% faster than LPSR2/1

  19. Conclusions
      Introduced 2 new asynchronous adders:
      • Use novel pipeline protocols:
         • observe events from multiple later stages
         • decouple control of pull-up/pull-down
      • Especially suitable for fine-grain (gate-level) pipelining
      • Very high throughputs obtained:
         • 0.93-1.02 GHz in 0.6 µm
         • expected to outperform the best (IPCMOS: 3.3-4.5 GHz in 0.18 µm)
      • LPHC doubles the typical storage capacity
      • Robustly handle arbitrary-speed environments
         • useful as IPs
      Future Work: Layout/fabrication, application to DSPs
