
COMP290-084 Clockless Logic and Silicon Compilers Lecture 3

Montek Singh, Tue, Jan 24, 2006. Handshaking Example: Asynchronous Pipelines. Pipelining basics; fine-grain pipelining; example approach: MOUSETRAP pipelines.


Presentation Transcript


  1. COMP290-084: Clockless Logic and Silicon Compilers, Lecture 3. Montek Singh, Tue, Jan 24, 2006

  2. Handshaking Example: Asynchronous Pipelines
     • Pipelining basics
     • Fine-grain pipelining
     • Example Approach: MOUSETRAP pipelines

  3. Background: Pipelining
     What is Pipelining? Breaking up a complex operation on a stream of data into simpler sequential operations
     [Figure: a “coarse-grain” pipeline (e.g. simple processor) with fetch, decode and execute stages; a “fine-grain” pipeline (e.g. pipelined adder); storage elements (latches/registers) between stages]
     Performance Impact:
       + Throughput: significantly increased (# data items processed/second)
       – Latency: somewhat degraded (# seconds from input to output)

  4. Focus of Asynchronous Community
     A Key Focus: Extremely fine-grain pipelines
       • “gate-level” pipelining = use narrowest possible stages
       • each stage consists of only a single level of logic gates
       • some of the fastest existing digital pipelines to date
     Application areas:
       • general-purpose microprocessors
           • instruction pipelines: often 20-40 stages
       • multimedia hardware (graphics accelerators, video DSPs, …)
           • naturally pipelined systems, throughput is critical; input “bursty”
       • optical networking
           • serializing/deserializing FIFOs
       • string matching?
           • KMP-style string matching: variable skip lengths

  5. MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines. Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001

  6. MOUSETRAP Pipelines
     Simple asynchronous implementation style, uses…
       • standard logic implementation: Boolean gates, transparent latches
       • simple control: 1 gate/pipeline stage
     MOUSETRAP uses a “capture protocol”: latches…
       • are normally transparent: before new data arrives
       • become opaque: after data arrives (“capture” data)
     Control Signaling: transition-signaling = 2-phase
       • simple protocol: req/ack = only 2 events per handshake (not 4)
       • no “return-to-zero”
       • each transition (up/down) signals a distinct operation
     Our Goal: very fast cycle time
       • simple inter-stage communication
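
The difference between 2-phase (transition) signaling and 4-phase (return-to-zero) signaling can be sketched by listing the wire events each protocol needs per data item. This is only an illustrative model; the function names and the event labels (`req+`, `ack-`, and so on) are assumptions made for this sketch and do not come from the slides.

```python
def four_phase_events(n_items):
    """Return-to-zero signaling: req up, ack up, req down, ack down per item."""
    events = []
    for _ in range(n_items):
        events += ["req+", "ack+", "req-", "ack-"]
    return events


def two_phase_events(n_items):
    """Transition signaling: one req toggle and one ack toggle per item."""
    events = []
    req = ack = 0
    for _ in range(n_items):
        req ^= 1  # any transition on req (up or down) announces a new item
        events.append("req+" if req else "req-")
        ack ^= 1  # any transition on ack acknowledges it
        events.append("ack+" if ack else "ack-")
    return events


if __name__ == "__main__":
    print(len(four_phase_events(3)), "events for 3 items, 4-phase")
    print(len(two_phase_events(3)), "events for 3 items, 2-phase")
```

In the 2-phase trace the second item is signaled by falling edges, which is what "each transition (up/down) signals a distinct operation" means in this model.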

  7. MOUSETRAP: A Basic FIFO
     Stages communicate using transition-signaling: 1 transition per data item!
     [Figure: Stages N-1, N and N+1, each with a data latch and a latch controller; signals reqN, reqN+1, ackN-1, ackN, doneN and latch enable En; Data in, Data out; animation of the 1st and 2nd data items flowing through the pipeline]

  8. MOUSETRAP: A Basic FIFO (contd.)
     Latch is disabled when current stage is “done”
     Latch is re-enabled when next stage is “done”
     Latch controller (XNOR) acts as “protocol converter”:
       • 2 distinct transitions (up or down) → pulsed latch enable
       • 2 transitions per latch cycle
     [Figure: Stages N-1, N and N+1 with data latches and latch controllers; signals reqN, reqN+1, ackN-1, ackN, doneN, En; Data in, Data out]
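
The XNOR latch controller can be sketched as a small truth-level model. It assumes (this is an assumption of the sketch, not stated on the slide) that all signals start at 0 and that ackN is simply the "done" signal of stage N+1; the function names are hypothetical.

```python
def xnor(a, b):
    """Two-input XNOR on 0/1 values."""
    return 1 if a == b else 0


def latch_enable(done_n, ack_n):
    """Stage N's latch is transparent (1) when done_N equals ack_N."""
    return xnor(done_n, ack_n)


def trace_enable(n_items):
    """Enable values seen as n_items flow through stage N, one event at a time."""
    done_n = ack_n = 0
    trace = [latch_enable(done_n, ack_n)]  # idle: latch transparent
    for _ in range(n_items):
        done_n ^= 1  # stage N is "done": latch is disabled (captures the data)
        trace.append(latch_enable(done_n, ack_n))
        ack_n ^= 1   # stage N+1 is "done": latch is re-enabled
        trace.append(latch_enable(done_n, ack_n))
    return trace


if __name__ == "__main__":
    print(trace_enable(2))
```

Each data item produces one low pulse on the enable, whether the underlying transitions on done and ack were rising or falling.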

  9. MOUSETRAP: FIFO Cycle Time
     Fast self-loop: N disables itself
     Cycle Time = (N computes) + (N+1 computes) + (N re-enabled to compute)
     [Figure: Stages N-1, N and N+1 with data latches and latch controllers; signals reqN, reqN+1, ackN-1, ackN, doneN, En; the three components of the cycle are marked 1, 2 and 3 on the diagram]
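
The three components of the cycle can be written as a rough formula. The symbols are assumptions introduced here, not taken from the slide: $t_{LT}$ is the propagation delay through a transparent latch and $t_{XNOR\uparrow}$ is the delay of the controller's rising (re-enabling) output transition. The sketch also assumes identical stages and a plain FIFO with no logic between latches.

```latex
% Assumed symbols (not from the slide):
%   t_{LT}            latch propagation delay
%   t_{XNOR\uparrow}  controller delay for the re-enabling transition
\[
T_{\text{cycle}} \;\approx\;
  \underbrace{t_{LT}}_{N \text{ computes}}
  + \underbrace{t_{LT}}_{N+1 \text{ computes}}
  + \underbrace{t_{XNOR\uparrow}}_{N \text{ re-enabled}}
  \;=\; 2\,t_{LT} + t_{XNOR\uparrow}
\]
```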

  10. Detailed Controller Operation
     • One pulse per data item flowing through:
         • down transition: caused by “done” of N
         • up transition: caused by “done” of N+1
     [Figure: Stage N’s latch controller, with inputs “done” from N and “ack” from N+1, and output to the latch]

  11. MOUSETRAP: Pipeline With Logic
     Simple Extension to FIFO: insert logic block + matching delay in each stage
     Logic Blocks: can use standard single-rail (non-hazard-free) logic
     “Bundled Data” Requirement:
       • each “req” must arrive after data inputs are valid and stable
     [Figure: Stages N-1, N and N+1, each with a data latch, latch controller, logic block and matching delay element on the req line; signals reqN, reqN+1, ackN-1, ackN, doneN]
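
The bundled-data requirement amounts to a one-line check: the matching delay on the req line must be at least as long as the slowest data path through the logic block. The sketch below assumes delays are given per output bit in picoseconds; the function names, the `margin_ps` parameter and the example numbers are all hypothetical.

```python
def required_matched_delay(bit_delays_ps, margin_ps=0):
    """Smallest req delay that keeps req behind the slowest data bit."""
    return max(bit_delays_ps) + margin_ps


def bundling_ok(bit_delays_ps, matched_delay_ps, margin_ps=0):
    """True if req arrives only after all data bits are valid and stable."""
    return matched_delay_ps >= required_matched_delay(bit_delays_ps, margin_ps)


if __name__ == "__main__":
    delays = [80, 120, 95]  # made-up worst-case delays per data bit, in ps
    print(bundling_ok(delays, matched_delay_ps=130, margin_ps=5))
    print(bundling_ok(delays, matched_delay_ps=110))
```

Because only the worst-case delay matters, the logic block itself does not need to be hazard-free: glitches on the data wires are harmless as long as they settle before req arrives.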

  12. Complex Pipelining: Forks & Joins
     Problems with Linear Pipelining:
       • handles limited applications; real systems are more complex
     Non-Linear Pipelining: has forks/joins
     Contribution: introduce efficient circuit structures
       • Forks: distribute data + control to multiple destinations
       • Joins: merge data + control from multiple sources
       • Enabling technology for building complex async systems
     [Figure: a pipeline with a fork and a join]

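  13. Forks and Joins: Implementation
     Join: merge multiple requests
     Fork: merge multiple acknowledges
     [Figure: two Stage N diagrams, each with an element labeled “C”: in the fork, ack1 and ack2 are combined into ack; in the join, req1 and req2 are combined into req]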
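
The “C” boxes in the figure are presumably Muller C-elements, the usual way to merge handshake signals; that reading is an assumption, since the transcript only preserves the label. A minimal behavioral sketch (class and method names are hypothetical):

```python
class CElement:
    """Muller C-element: output follows the inputs only when they all agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        """Apply new input values (0/1) and return the resulting output."""
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # otherwise the inputs disagree and the previous output is held
        return self.out


if __name__ == "__main__":
    join = CElement()
    print(join.update(1, 0))  # only req1 has toggled: merged req unchanged
    print(join.update(1, 1))  # req2 has toggled too: merged req toggles
```

With transition signaling, this gives the merged req (in a join) or merged ack (in a fork) exactly one transition after every input has made its transition.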

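  14. Performance, Timing and Optimization
     Stage Latency = [formula not preserved in transcript]
     Cycle Time = [formula not preserved in transcript]
     MOUSETRAP with Logic: [formulas not preserved in transcript]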

  15. Timing Analysis
     Main Timing Constraint: avoid “data overrun”
     Data must be safely “captured” by Stage N before new inputs arrive from Stage N-1
       • simple 1-sided timing constraint: fast latch disable
       • Stage N’s “self-loop” faster than entire path through previous stage
     [Figure: Stages N-1 and N, each with a data latch, latch controller, logic block and delay element; signals reqN, reqN+1, ackN-1, ackN, doneN]
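
The data-overrun constraint can be sketched as an inequality between two paths that both start when Stage N signals “done”. All symbols are assumptions introduced for this sketch: $t_{XNOR_N\downarrow}$ is the delay for Stage N's controller to disable its latch, $t_{hold}$ is the latch hold time, $t_{XNOR_{N-1}\uparrow}$ is the delay for Stage N-1's controller to re-enable its latch, and $t_{LT_{N-1}}$ and $t_{logic_{N-1}}$ are the latch and logic delays of Stage N-1.

```latex
% Left side:  Stage N's self-loop (disable own latch, then hold the data).
% Right side: path through the previous stage (re-enable latch N-1,
%             pass new data through its latch and logic).
\[
t_{XNOR_N\downarrow} + t_{hold}
  \;<\;
t_{XNOR_{N-1}\uparrow} + t_{LT_{N-1}} + t_{logic_{N-1}}
\]
```

The constraint is one-sided: making the self-loop faster or the previous stage slower only adds margin.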

  16. Experimental Results
     • Simulations of FIFOs:
         • ~3 GHz (in 0.13 µm IBM process)
     • Recent fabricated chip: GCD
         • ~2 GHz simulated speed
         • chips awaited
