
Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002

Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002. Lecture #24 Adiabatic CMOS cont. Wed., Mar. 13. Administrivia & Overview. Don’t forget to keep up with homework! We are  8 out of 14 weeks into the course. You should have earned ~ 57 points by now.




  1. Physical Limits of Computing
     Dr. Mike Frank
     CIS 6930, Sec. #3753X, Spring 2002
     Lecture #24: Adiabatic CMOS cont.
     Wed., Mar. 13

  2. Administrivia & Overview
     • Don’t forget to keep up with homework!
       • We are 8 out of 14 weeks into the course.
       • You should have earned ~57 points by now.
     • Course outline:
       • Parts I & II, Background, Fundamental Limits - done
       • Part III, Future of Semiconductor Technology - done
       • Part IV, Potential Future Computing Technologies - done
       • Part V, Classical Reversible Computing
         • Fundamentals of Adiabatic Processes & logic - last Wed. & Fri.
           (----------------------- Spring Break ------------------------)
         • Adiabatic electronics & CMOS logic families - Mon. & TODAY
         • Limits of adiabatics: Leakage and clock/power supplies - TODAY
         • RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
         • RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
         • (plus ~7 more lectures…)
       • Part VI, Quantum Computing
       • Part VII, Cosmological Limits, Wrap-Up

  3. Adiabatic computing in CMOS
     • Monday: Adiabatic switching, split-level retractile & pipelined logic.
     • Today: 2-Level Adiabatic Logic, general adiabatic logic.

  4. Some Timing Terminology
     For sequential adiabatic circuits:
     • Tick: Time for a single ramp transition
       • adiabatic speed fraction f times the RC gate delay.
     • Phase: Latency for a data value to propagate forward by 1 pipeline stage.
     • Cycle: Minimum period for all timing information to return to its initial state.
     • Diadic: Two retractile levels per gate
       • permits inverting or non-inverting logic.
       • (Monadic: only 1 level.)
     • Dual rail: Two wires per logic value
       • permits universal logic with monadic gates.

  5. Some Figures of Demerit
     • Some quantities we may wish to minimize:
       • Ticks/phase:
         • proportional to logic propagation latency
       • Ticks/cycle:
         • reciprocal to rate of data throughput
       • Transistor-ticks/cycle:
         • reciprocal to HW cost-efficiency
       • Number of required clock/power input signals:
         • supplying these may be a significant component of system cost
       • Number of distinct voltage levels required:
         • may affect reliability/power tradeoff

  6. Some Interesting Questions
     • About pipelined, sequential, fully-adiabatic CMOS logic:
       • Q: Does it require an intermediate voltage level?
         • A: No, you can get by with only 2 different levels.
       • Q: What is the minimum number of externally provided timing signals you can get away with?
         • A: 4 (12 if split levels are used)
       • Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit?
         • A: Yes, but not statically, unless order N² hardware is used
           • where N is the number of stages per full sequential cycle
     • We now demonstrate these answers.

  7. Some Timing Examples
     See next slide for some detailed timing diagrams.
     • N-level retractile cascades:
       • 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
     • 3-phase fully-static diadic SCRL:
       • 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
     • 2-phase fully-static monadic SCRL:
       • 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
     • 2-phase fully-static diadic SCRL:
       • 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
     • 6 tick/cycle dynamic SCRL detailed previously:
       • 1 tick/phase × 6 phases/cycle = 6 ticks/cycle
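
The arithmetic on this slide can be checked with a short Python sketch. The dictionary and function names are illustrative only (they are not from the lecture); the tick and phase counts are the ones quoted on the slide.

```python
# (ticks per phase, phases per cycle), copied from the slide.
SCHEMES = {
    "3-phase fully-static diadic SCRL": (8, 3),
    "2-phase fully-static monadic SCRL": (5, 2),
    "2-phase fully-static diadic SCRL": (6, 2),
    "6-phase dynamic SCRL": (1, 6),
}


def ticks_per_cycle(ticks_per_phase, phases_per_cycle):
    """Cycle length in ticks: ticks per phase times phases per cycle."""
    return ticks_per_phase * phases_per_cycle


def retractile_ticks_per_cycle(n_levels):
    """An N-level retractile cascade: 2N ticks/phase, 1 phase/cycle."""
    return ticks_per_cycle(2 * n_levels, 1)


if __name__ == "__main__":
    for name, (tpp, ppc) in SCHEMES.items():
        print(f"{name}: {ticks_per_cycle(tpp, ppc)} ticks/cycle")
    print(f"4-level retractile: {retractile_ticks_per_cycle(4)} ticks/cycle")
```

Ticks/phase tracks latency and ticks/cycle tracks (inverse) throughput, so the two columns of the table are worth keeping separate rather than storing only their product.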

  8. Some SCRL timing diagrams

  9. 2LAL: 2-level Adiabatic Logic
     • Dual-rail T-gate symbol: (figure: T-gate with control P between terminals A and B)
     • Basic buffer element:
       • cross-coupled T-gates (figure: buffer with in and out)
     • Only 4 different timing signals, 4 ticks per cycle:
       • φi rises during tick i, falls during tick (i+2) mod 4 (figure: clock waveforms over ticks 0 to 3)
       • 1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
     • Optimizes latency & throughput per gate.
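
A minimal Python sketch of the four timing signals described on this slide: signal i rises during tick i and falls during tick (i+2) mod 4. The slide states only the rise and fall ticks; that each signal simply holds its level during the other two ticks is an assumption of this sketch, and the function names are hypothetical.

```python
N_CLOCKS = 4
# Activity of a clock in the ticks following its rise (hold levels assumed).
STATES = ("rising", "high", "falling", "low")


def clock_state(i, tick):
    """What timing signal i is doing during the given tick."""
    return STATES[(tick - i) % N_CLOCKS]


def level_after(i, tick):
    """Logic level (0 or 1) of timing signal i once the tick has finished."""
    return 1 if clock_state(i, tick) in ("rising", "high") else 0


if __name__ == "__main__":
    for i in range(N_CLOCKS):
        row = " ".join(f"{clock_state(i, t):>7}" for t in range(N_CLOCKS))
        print(f"signal {i}: {row}")
```

In this model exactly one signal is rising and exactly one is falling in every tick, which is what lets a data pulse advance by one stage per tick.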

  10. 2LAL Cycle of Operation
     (Figure: state of the buffer at tick numbers 0 to 3, showing the in and out signals for a 1 input and for a 0 input.)

  11. Input-Barrier, Clocked-Bias Latching
     (1) Input conditionally lowers barrier (logic w. series/parallel barriers)
     (2) Clock applies bias force; conditional bit flip
     (3) Input removed, raising barrier & locking in state-change
     (4) Clock bias can retract.
     2LAL is an example of this.
     (Figure: potential-well diagrams for states 0, 1 and N, labeled "Input pulse" and "Pulse ends".)

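  12. Shift Register Structure
     • 1-tick delay per logic stage: (figure: chain of buffer stages from in to out, clocked by signals 1, 2, 3, 4, 1, ...)
     • Logic pulse timing & propagation: (figure: pulse waveforms at successive stages over ticks 1, 2, 3, 4, ...)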
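
The one-tick-per-stage propagation can be sketched as follows. This is a simplified abstraction, not the lecture's circuit: it tracks only the tick in which each stage's output pulse begins, and it assumes one new input bit is launched every 4 ticks (one full cycle). The function name is hypothetical.

```python
TICKS_PER_CYCLE = 4  # one new input bit per cycle (assumption of this sketch)


def shift_trace(bits, n_stages):
    """Return one row per tick; row[k] is the bit arriving at stage k, else None."""
    if not bits:
        return []
    n_ticks = TICKS_PER_CYCLE * (len(bits) - 1) + n_stages
    trace = []
    for tick in range(n_ticks):
        row = [None] * n_stages
        for j, bit in enumerate(bits):
            stage = tick - TICKS_PER_CYCLE * j  # bit j advances 1 stage per tick
            if 0 <= stage < n_stages:
                row[stage] = bit
        trace.append(row)
    return trace


if __name__ == "__main__":
    for tick, row in enumerate(shift_trace([1, 0], n_stages=3)):
        print(tick, row)
```

With more than 4 stages, several bits are in flight at once, one per cycle's worth of stages.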

  13. More complex logic functions
     • Non-inverting Boolean functions: (figure: gates combining inputs A and B)
     • For inverting functions, must use quad-rail logic encoding:
       • To invert, just swap the rails!
       • Zero-transistor “inverters.”
     (Figure: quad-rail encoding of A = 0 and A = 1 on rails A0 and A1.)

  14. Hardware Efficiency Issues
     • Hardware efficiency: How many logic operations per unit hardware per unit time?
     • Hardware spacetime complexity: How much hardware for how much time per logic op?
     • We’re interested in minimizing: (# of transistors) × (# of ticks) / (gate cycle)
     • SCRL inverter, w. return path:
       • (8 transistors) × (6 ticks) = 48 transistor-ticks
     • Quad-rail 2LAL buffer stage:
       • (16 transistors) × (4 ticks) = 64 transistor-ticks

  15. More SCRL vs. 2LAL
     • SCRL reversible NAND, w. all inverters:
       • (23 transistors) × (6 ticks) = 138 T-ticks
     • Quad-rail 2LAL AND:
       • (48 transistors) × (4 ticks) = 192 T-ticks
     • Result of comparison: Although 2LAL minimizes # of rails and # of ticks/cycle, it does not minimize overall spacetime complexity.
     • The question of whether 6-tick SCRL really minimizes per-op spacetime complexity among pipelined fully-adiabatic CMOS logics is still open.
       • An opportunity for you to make a contribution!
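
The spacetime comparison on slides 14 and 15 reduces to one product per design. A small Python sketch, using the transistor and tick counts quoted on the slides (the names `DESIGNS` and `transistor_ticks` are illustrative):

```python
# (transistor count, ticks per gate cycle), as quoted on slides 14 and 15.
DESIGNS = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}


def transistor_ticks(transistors, ticks):
    """Spacetime cost of one gate cycle, in transistor-ticks."""
    return transistors * ticks


if __name__ == "__main__":
    # List designs from cheapest to most expensive per gate cycle.
    for name, (n, t) in sorted(DESIGNS.items(), key=lambda kv: transistor_ticks(*kv[1])):
        print(f"{name}: {transistor_ticks(n, t)} transistor-ticks")
```

In both pairings the 2LAL gate has the shorter cycle but the larger product, which is the slide's point: fewer ticks per cycle does not by itself give lower spacetime cost.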

  16. Minimizing Power-Clock Signals
     • How many external clock signals are required?
       • N-level-deep retractile cascade logic:
         • 2N waveforms × 1 phase = 2N signals
       • 6 tick/cycle, 6-phase dynamic SCRL:
         • 6 waveforms × 6 phases = 36 signals
       • 24 tick/cycle, 3-phase static SCRL:
         • 12 waveforms × 3 phases = 36 signals
       • 4 tick/cycle, 2LAL:
         • 1 waveform × 4 phases = 4 signals!
     • It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
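
The signal counts above follow the same pattern: distinct waveforms times phases. The Python sketch below reproduces them and, as an illustration only, finds the retractile depth at which a dedicated external clock set would exceed the 12 signals mentioned on the slide. The helper names are hypothetical.

```python
SHARED_SIGNALS = 12  # the on-chip budget quoted on the slide


def clock_signals(waveforms, phases):
    """External clock signals needed: distinct waveforms times phases."""
    return waveforms * phases


def retractile_signals(n_levels):
    """N-level-deep retractile cascade: 2N waveforms, 1 phase."""
    return clock_signals(2 * n_levels, 1)


def first_depth_exceeding(budget):
    """Smallest retractile depth N whose 2N signals exceed the budget."""
    n = 1
    while retractile_signals(n) <= budget:
        n += 1
    return n


if __name__ == "__main__":
    print("6-phase dynamic SCRL:", clock_signals(6, 6))
    print("3-phase static SCRL:", clock_signals(12, 3))
    print("2LAL:", clock_signals(1, 4))
    print("Retractile depth exceeding 12 signals:", first_depth_exceeding(SHARED_SIGNALS))
```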

  17. How to Do It
     • Circular 2LAL shifter; pulse-gated clocks
     (Figure: circular shifter driven by P0 to P3 with in and out, and timing diagram over ticks 0 to 3.)

  18. 12-rail system: pros & cons
     • Pros:
       • Completely solves adiabatic timing design problem
       • Enables mixtures of retractile, SCRL, and other logic styles on 1 chip
       • Enables simple fully-adiabatic SRAM & DRAM
     • Cons:
       • Timing signals are dynamic
         • Known fully-static alternatives use order N² gates and signals for N-tick-long cycles
         • N can be large in a chip that includes deep retractile networks
       • Energy waste in driving the source/drain junction capacitances of all the T-gates even when timing pulse isn’t present (SOI reduces these parasitics)

  19. Fully-Adiabatic DRAM cell
     • 6T, 6 lines/row, 1 line/column (in/out together)
     • Read cycle:
       • Initially: φ lines neutral, out neutral, R off
       • R for desired row turns on
       • φ for desired row splits, driving out column
       • R turns off, out is read
       • φ merges, out is reset
     • Write cycle:
       • First, do read cycle.
       • in is set to out
       • W turns on
       • in changed to new value...

  20. Fully-Adiabatic SRAM
     • 10-T, 10 lines/row, 1 line/column
     • Operation similar to DRAM, except:
       • Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
       • Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
     (Figure: SRAM cell with nodes N1, N2, M, gates T1, T2, T3, and lines in, out.)
