Emerging Technologies of Computation

Emerging Technologies of Computation. Montek Singh COMP790-084 Oct 27, 2011. Today: Basics of Asynchronous Design. Introduction to Asynchronous Design What is asynchronous design? Why do we want to do it? Data Representation and Communication


Presentation Transcript


  1. Emerging Technologies of Computation Montek Singh COMP790-084 Oct 27, 2011

  2. Today: Basics of Asynchronous Design • Introduction to Asynchronous Design • What is asynchronous design? • Why do we want to do it? • Data Representation and Communication • How is data represented in an asynchronous system? • How is information exchanged?

  3. Introduction: Clocked Digital Design [Figure: clock signal] Most current digital systems are synchronous: • Clock: a global signal that paces the operation of all components Benefit of clocking: enables discrete-time representation • all components operate exactly once per clock tick • component outputs need to be ready by the next clock tick • allows “glitchy” or incorrect outputs between clock ticks

  4. Microelectronics Trends Current and Future Trends: Significant Challenges • Large-Scale “Systems-on-a-Chip” (SoC) • 100 Million ~ 1 Billion transistors/chip • Very High Speeds • multiple GigaHertz clock rates • Explosive Growth in Consumer Electronics • demand for ever-increasing functionality … • … with very low power consumption (limited battery life) • Higher Portability/Modularity/Reusability • “plug ’n play” components, robust interfaces

  5. Challenges to Clocked Design Breakdown of Single-Clock Paradigm: • Chip will be partitioned into multiple timing domains • challenge: gluing together multiple timing domains • glue logic is susceptible to “metastability” (= incorrect values transferred) and latency overheads Increasing Difficulties with Clocked Design: • Clock distribution: requires significant designer effort • Performance bottleneck: a single slow component • Clock burns a large fraction of chip power (~40-70%) • Fixed clock rate: poor match for • designing reusable components • interfacing with mixed-timing environments

  6. What is Asynchronous Design? [Figure: a synchronous system (centralized control, global clock) beside an asynchronous system (distributed control, handshaking interfaces)] • Digital design with no centralized clock • Synchronization using local “handshaking”

  7. Why Asynchronous Design? (1) • Higher Performance • May obtain “average-case” operation (not “worst-case”) • not limited by slowest component • Avoids overheads of multi-GHz clock distribution • Lower Power • No clock power expended • Inactive components consume negligible power • Better Electromagnetic Compatibility • Smooth radiation spectra: no clock spikes • Much less interference with sensitive receivers [e.g., Philips pagers, smartcards] • Greater Flexibility/Modularity • Naturally adapt to variable-speed environments • Supports reusable components

  8. Why Asynchronous Design? (2) • The world is already mostly asynchronous! • Events at the level of (or in between) large-scale systems are asynchronous • several seconds to several milliseconds • e.g., PC-printer communication, keyboard inputs, network comm. • Events at the board level (or between chips) are often asynchronous • milliseconds to 100 nanoseconds • e.g., CPU-memory interface, interface with I/O subsystem (interrupts) • Events within a chip, at the level of functional units (e.g., adders, control logic) are currently mostly synchronous • several nanoseconds to 100 picoseconds • Events at the level of a single logic gate are asynchronous • 10 picoseconds • Events at the quantum level are asynchronous • picoseconds to femtoseconds • So, why bother with clocks at all?! • make everything asynchronous → greater elegance and robustness

  9. Challenges of Asynchronous Design • Hazards: potential “glitches” on a wire • no problem for clocked systems • communication must be hazard-free! • special design challenge = “hazard-free synthesis” [Figure: clean signals vs. hazardous signals relative to the clock tick] Testability Issues: absence of clock means no “single-stepping” Lack of Commercial CAD Tools: chicken-and-egg problem

  10. Asynchronous Design: Past & Present Async Design: In existence for 50 years, but many recent technical advances: • Hazard-Free Circuit Design: • several practical techniques for controllers [Stanford/Columbia] • Design for Testability: • several test solutions, e.g. Philips Research • Maturing Computer-Aided-Design (“CAD”) Tools: • software tools for automated design [Philips, Columbia, Manchester] • recent DARPA program [Boeing, Philips, UNC, Columbia, …] • Successful Fabricated Chips: • embedded processors, high-speed pipelines, consumer electronics…

  11. Recent Commercial Interest (1) Several commercial asynchronous chips: • Philips: asynchronous 80c51 microcontrollers • used in commercial pagers [1998] and smartcards [2001] • Univ. of Manchester: async ARM processor [2000] • Motorola: async divider in PowerPC chip [2000] • HAL: async floating-point divider • in HAL-I and II processors [early 1990’s] Recent experimental chips: • IBM, Sun and Intel: • fast pipelines, arbiters, instruction-length decoder… • IBM/Columbia/UNC: asynchronous digital FIR filter Several recent startups: • Handshake Solutions, Theseus Logic, Codetronix, Fulcrum, Silistix, …

  12. Recent Commercial Interest (2) Major DARPA program: • ~$13M • Goals: • commercial-strength automated CAD tool (=silicon compiler) • direct translation from algorithms to chip layout • capable of producing chips with 50M transistors or more • rich suite of analysis and optimization tools • demonstration chip • Boeing application • show dramatic improvements in: design time, power consumption, noise pollution, speed (?) • Team: • led by Boeing • async startups: Theseus, Handshake Solutions, Codetronix • universities: UNC, Columbia, UW, OrSU

  13. Data Representation and Communication

  14. A 5-minute Homework Problem [Figure: Alice and Bob on opposite banks of a river] Alice and Bob live on opposite sides of a wide river. Alice is supposed to send a message (say, a “Yes”/“No”) across to Bob around midnight. Both have flashlights, but neither owns a watch. What should they do? Suggest several strategies, and discuss pros and cons of each.

  15. Solution 1 [Figure: Alice signals “ready” and “yes/no” to Bob; Bob signals “got it” back] Alice uses 2 lamps: • 1 to indicate that she is ready with the message, and • 1 for the message itself Bob uses 1 lamp: • to indicate that he has received the message

  16. Solution 2 [Figure: Alice signals “yes” or “no” to Bob; Bob signals “got it” back] Alice uses 2 lamps: • Green lamp to indicate “yes” • Red lamp to indicate “no” Bob uses 1 lamp: • to indicate that he has received the message

  17. Solution 3 What if Alice and Bob could keep time? Alice uses 1 lamp for the message: • At 12 midnight: turns on lamp if message = “yes” • At 12:01: turns lamp off Bob needs no lamps! • Takes down the message between 12 and 12:01 Pros: Fewer signals, less processing needed Cons: Alice and Bob must keep their clocks closely synchronized • If Bob’s watch is off by a minute, incorrect communication is possible

  18. Homework! • Think of all scenarios in which Solution #1 can fail • Are any of those scenarios a problem for Solution #2 as well?

  19. Data Representation and Communication How is data represented in an asynchronous system? How is information exchanged?: control signaling (handshake styles)

  20. Data Encoding: “Bundled Data” [Figure: a function block with data inputs bit 1 to bit n and outputs bit 1 to bit m; the request passes through a matched delay to produce “done”, which indicates valid data] Single-rail “Bundled Datapath”: simplest approach • widely used Features: • datapath: 1 wire per bit (e.g. standard sync blocks) • matched delay: produces delayed “done” signal • worst-case delay: longer than slowest path • Practical style: can reuse sync components; small area • Fixed (worst-case) completion time

  21. Bundled Data: Completion Sensing [Figure: the request feeds a bank of delays; a delay selector drives a MUX that produces “done”] Delay Matching: • either single worst-case delay • or, fine-grain delay Speculative completion: • choose delay “on the fly” • start with shortest delay; increase as needed

  22. Data Encoding: Dual-Rail [Figure: a function block with dual-rail inputs bit 1 to bit n and dual-rail outputs bit 1 to bit m] Dual-rail: uses 2 wires per data bit Each Dual-Rail Pair: provides both data value and validity • provides robust data-dependent completion • needs completion detectors

  23. Dual-Rail: Completion Sensing [Figure: the two rails of each bit (bit 0, bit 1, …, bit n) feed an OR gate; the OR outputs feed a C-element that produces “Done”] Dual-Rail Completion Detector: • combines dual-rail signals • indicates when all bits are valid (or reset) • OR together 2 rails per bit • Merge results using a Muller “C-element” C-element: • if all inputs = 1, output → 1 • if all inputs = 0, output → 0 • else, maintain output value
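The completion detector on this slide can be modeled in a few lines of Python. This is a minimal behavioral sketch, not a circuit description: the names `CElement` and `completion_detect` are hypothetical, and the convention that each bit is a `(true_rail, false_rail)` pair with `(0, 0)` meaning "reset" is an assumption, since the slide does not name the rails.

```python
class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # Inputs disagree: hold the previous output value.
        return self.out


def completion_detect(c, pairs):
    """OR the two rails of each bit, then merge the results with a C-element."""
    return c.update([t | f for (t, f) in pairs])


c = CElement()
empty = [(0, 0), (0, 0), (0, 0)]    # every bit reset
partial = [(1, 0), (0, 0), (0, 1)]  # middle bit has not arrived yet
valid = [(1, 0), (0, 1), (0, 1)]    # every bit has exactly one rail high

trace = [completion_detect(c, w) for w in (empty, partial, valid, partial, empty)]
print(trace)  # [0, 0, 1, 1, 0]
```

The trace shows the hysteresis that makes the C-element useful here: "done" rises only once every bit is valid, and falls only once every bit has been reset.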

  24. Handshaking Styles: 4-phase [Figure: timing diagram. Request: “start event”, then “get ready for next event”. Acknowledge: “event done”, then “ready for next event”.] 4-Phase: requires 4 events per handshake • “Level-sensitive”: simpler logic implementation • Overhead of “return-to-zero” (RTZ or resetting) • extra events which do no useful computation
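As an illustration of the four events, the sketch below records the `(req, ack)` wire levels after each step of a 4-phase handshake. The function name `four_phase` is hypothetical, and the model assumes both wires start low and that the sender and receiver strictly alternate.

```python
def four_phase(n_transfers):
    """Return the (req, ack) levels after each event of n_transfers handshakes."""
    req, ack = 0, 0
    trace = []
    for _ in range(n_transfers):
        req = 1                    # 1. sender raises request: start event
        trace.append((req, ack))
        ack = 1                    # 2. receiver raises acknowledge: event done
        trace.append((req, ack))
        req = 0                    # 3. sender lowers request: return-to-zero
        trace.append((req, ack))
        ack = 0                    # 4. receiver lowers acknowledge: ready for next event
        trace.append((req, ack))
    return trace


print(four_phase(1))  # [(1, 0), (1, 1), (0, 1), (0, 0)]
```

Steps 3 and 4 are the return-to-zero overhead: they move no data, but they restore both wires to the level a level-sensitive circuit expects before the next transfer.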

  25. Handshaking Styles: 2-phase [Figure: timing diagram. Request: “start event”, then “start next event”. Acknowledge: “event done”, then “next event done”.] 2-Phase: requires 2 events per handshake • a.k.a. transition signaling • Elegant: no return-to-zero • Slower logic implementation: • logic primitives are inherently level-sensitive, not event-based (at least in CMOS)
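Transition signaling can be sketched the same way: every toggle of a wire, rising or falling, counts as an event. The function name `two_phase` is hypothetical, and the model assumes both wires start low.

```python
def two_phase(n_transfers):
    """Return the (req, ack) levels after each event of n_transfers handshakes."""
    req, ack = 0, 0
    trace = []
    for _ in range(n_transfers):
        req ^= 1                   # sender toggles request: start event
        trace.append((req, ack))
        ack ^= 1                   # receiver toggles acknowledge: event done
        trace.append((req, ack))
    return trace


print(two_phase(2))  # [(1, 0), (1, 1), (0, 1), (0, 0)]
```

In this model a transfer is in flight exactly when `req != ack`. Two complete transfers use four wire events, the same number a 4-phase protocol spends on a single transfer, which is why a receiver must detect transitions rather than levels.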

  26. Handshaking Styles: Pulse Mode Pulse Mode: combines benefits of 2-phase and 4-phase • use pulses to represent events [Figure: timing diagram. Request pulses: “start event”, then “start next event”. Acknowledge pulses: “event done”, then “next event done”.] • No return-to-zero (like 2-phase) • Level-based implementation (like 4-phase) • Need a timing constraint on pulse width

  27. Handshaking Styles: Single-Track [Figure: separate Request (req) and Acknowledge (ack) wires merged into a single “req + ack” wire] Single-Track: combines req and ack onto a single wire! • one wire used for bidirectional communication • sender raises, receiver lowers • Efficient protocol: no return-to-zero, level-based • Need aggressive low-level design techniques • much effort to ensure reliability, satisfy timing constraints

  28. Handshaking + Data Representation [Figure: block A sends dual-rail bits 1 to m to block B; B returns “ack” to A] Several combinations possible: • dual-rail 4-phase, single-rail 4-phase, dual-rail 2-phase, and single-rail 2-phase Example: dual-rail 4-phase • dual-rail data: functions as an implicit “request” • 4-phase cycle: between acknowledge and implicit request

  29. Other Data Representation Styles • Level-Encoded Dual-Rail (LEDR) • 2 wires per bit: “data” and “phase” • exactly one wire per bit changes value • if new value is different, “data” wire changes value • else “phase” wire changes value • M-of-N Codes • N wires used for a data word • M wires (M <= N) change value • Values of N and M: have impact on… • information transmitted, power consumed and logic complexity • Knuth codes, Huffman codes, …
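The LEDR rule on this slide (toggle “data” if the value changes, otherwise toggle “phase”) can be sketched for a single bit as follows. The function names `ledr_send` and `ledr_trace` and the starting state `(0, 0)` are assumptions made for illustration.

```python
def ledr_send(state, new_bit):
    """Apply the LEDR rule for one bit; state is a (data, phase) wire pair."""
    data, phase = state
    if new_bit != data:
        data = new_bit   # value changed: the "data" wire toggles
    else:
        phase ^= 1       # same value again: the "phase" wire toggles
    return (data, phase)


def ledr_trace(bits, start=(0, 0)):
    """Send a sequence of bits and return the wire state after each one."""
    state = start
    states = []
    for b in bits:
        state = ledr_send(state, b)
        states.append(state)
    return states


states = ledr_trace([1, 1, 0, 0, 1])
print(states)  # [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
```

Exactly one wire changes per transmitted bit, so the parity `data ^ phase` flips on every new item. A receiver can use that flip to detect arrival, with no return-to-zero step.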

  30. Which to use? Depends on several performance parameters: • speed • single-rail vs. dual-rail • single-rail may be faster (if designed aggressively) • dual-rail may be faster (if completion times vary widely) • 2-phase vs. 4-phase • 2-phase may be faster (if logic overhead is small) • 4-phase may be faster (if overhead of return-to-zero is small) • power consumption • 2-phase typically has fewer gate transitions (→ lower power) • amount of logic used (#gates/wires/pins → chip area) • single-rail needs fewer gates/wires/pins • design and verification effort • dual-rail, 1-of-N, M-of-N, Knuth codes…: • delay-insensitive: robust in the presence of arbitrary delays • single-rail: requires greater timing verification effort

  31. Homework! • Suppose you are given N wires • Which M-of-N encoding (i.e. what M) encodes most information? • Suppose you have to encode 4-bit values • Which M-of-N encoding yields fewest wires? • Suppose you can switch at most 2 wires • Which M-of-N encoding yields fewest wires for 4-bit values?
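A small helper for exploring these questions, assuming that an M-of-N codeword is any choice of exactly M of the N wires, so the number of codewords is the binomial coefficient C(N, M). The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, log2


def m_of_n_codewords(n, m):
    """Number of distinct codewords when exactly m of n wires are chosen."""
    return comb(n, m)


def m_of_n_bits(n, m):
    """Information carried by one codeword, in bits."""
    return log2(comb(n, m))


print(m_of_n_codewords(4, 1), m_of_n_bits(4, 1))  # 4 2.0
print(m_of_n_codewords(4, 2))                     # 6
```

Looping over `m` for a fixed `n`, or over `n` for a fixed `m`, is enough to check a hand-derived answer to each question.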