Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]

Tiberiu Chelcea, Steven M. Nowick

Department of Computer Science

Columbia University

{tibi,[email protected]


Introduction

Key Trend in VLSI systems: systems-on-a-chip (SoC)

Two fundamental challenges:

  • mixed-timing domains

  • long interconnect delays

Our Goal: design of efficient interface circuits

Desirable Features:

  • arbitrarily robust

  • low-latency, high-throughput

  • modularity, scalability

Few satisfactory solutions to date…


Timing Issues in SoC Design

[Figure: (a) a single-clock system; (b) a system with mixed-timing domains. In both, Domain #1 and Domain #2 communicate over a long interconnect; in (b) each domain may be sync or async.]


Timing Issues in SoC Design (cont.)

Solution: provide interface circuits

  • Carloni et al.: “relay stations”

  • NEW: “mixed-timing FIFO’s”

  • NEW: “mixed-timing relay stations”

[Figure: (a) single-clock and (b) mixed-timing systems, as before, with interface circuits added between Domain #1 and Domain #2; in (b) each domain may be sync or async.]


Contributions

Complete set of mixed-timing interface circuits:

  • sync-sync, async-sync, sync-async, async-async

Features:

  • Arbitrary Robustness: with respect to synchronization failures

  • High-Throughput:

    • in steady-state operation: no synchronization overhead

  • Low-Latency: “fast restart”

    • in empty FIFO: only synchronization overhead

  • Reusability:

    • each interface partitioned into reusable sub-components

Two Contributions:

  • Mixed-Timing FIFO’s

  • Mixed-Timing Relay Stations


Contribution #1: Mixed-Timing FIFO’s

Addresses issue of interfacing mixed-timing domains

Features: token ring architecture

  • circular array of identical cells

  • shared buses: data + control

  • data: “immobile” once enqueued

  • distributed control: allows concurrent put/get operations

2 circulating tokens: define tail & head of queue

Potential benefits:

  • low latency

  • low power

  • scalability
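The token-ring idea above can be sketched behaviorally. This is a minimal software model, not the authors' circuit: the class name `TokenRingFifo` and its methods are hypothetical, clocks and synchronizers are ignored, and the put and get tokens are modeled as two indices that advance around a fixed array, so a data item never moves once it has been written into a cell.

```python
class TokenRingFifo:
    """Behavioral model of a token-ring FIFO (hypothetical names, no timing)."""

    def __init__(self, n_cells):
        self.cells = [None] * n_cells   # one data register per cell
        self.full = [False] * n_cells   # per-cell status bit
        self.tail = 0                   # index of the cell holding the put token
        self.head = 0                   # index of the cell holding the get token

    def put(self, item):
        """Enqueue into the cell holding the put token; False if that cell is full."""
        if self.full[self.tail]:
            return False
        self.cells[self.tail] = item    # data is written once and stays in place
        self.full[self.tail] = True
        self.tail = (self.tail + 1) % len(self.cells)  # pass the put token
        return True

    def get(self):
        """Dequeue from the cell holding the get token; None if that cell is empty."""
        if not self.full[self.head]:
            return None
        item = self.cells[self.head]
        self.full[self.head] = False
        self.head = (self.head + 1) % len(self.cells)  # pass the get token
        return item
```

Because only the tokens move, a put and a get touch different cells whenever the FIFO is neither full nor empty, which is what lets the two interfaces operate concurrently.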


Contribution #2: Mixed-Timing Relay Stations

Addresses issue of long interconnect delays

“Latency-Insensitive Protocols”: safely tolerate long interconnect delays between systems

Prior Contribution: introduce “relay stations”

  • single-clock domains (Carloni et al., ICCAD-99)

Our Contribution: introduce “mixed-timing relay stations”

  • mixed-clock (sync-sync)

  • async-sync

First proposed solutions to date…


Related Work

Single-Clock Domains: handling clock discrepancies

  • clock skew and jitter (Kol98, Greenstreet95)

  • long interconnect delays (Carloni99)

Mixed-Timing Domains: 3 common approaches

  • Use “Wrapper Logic”:

    • add logic layer to synchronize data/control (Seitz80, Seizovic94)

    • drawback: long latencies in communication

  • Modify Receiver’s Clock:

    • stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)

    • drawback: penalties in restarting clock


Related Work: Closer Approaches

Mixed-Timing Domains (cont.):

  • Interface Circuits: Mixed-Clock FIFO’s (Intel, Jex et al. 1997):

    • drawback: significant area overhead = synchronizer for each cell

Our approach: mixed-clock FIFO’s

  • … only 2 synchronizers for entire FIFO


Outline

  • Mixed-Clock Interfaces

    • FIFO

    • Relay Station

  • Async-Sync Interfaces

    • FIFO

    • Relay Station

  • Results

  • Conclusions


Mixed-Clock FIFO: Block Level

Synchronous put interface:

  • req_put: initiates put operations

  • data_put: bus for data items

  • full: indicates when FIFO full

  • CLK_put: controls put operations

Synchronous get interface:

  • req_get: initiates get operations

  • data_get: bus for data items

  • valid_get: indicates data item validity (always 1 in this design)

  • empty: indicates when FIFO empty

  • CLK_get: controls get operations

[Figure: block diagram of the Mixed-Clock FIFO with the put interface signals on one side and the get interface signals on the other.]


Mixed-Clock FIFO: Steady-State Simulation

Steady state: FIFO neither full nor empty

Put operation:

  • Sender starts a put operation

  • FIFO not full: Put Controller enables a put operation

  • Cell holding the TAIL token enqueues data

  • At the end of the clock cycle, the cell passes the put token

Get operation: performed by the cell holding the HEAD token

Steady-state operation (puts and gets “reasonably spaced”):

  • zero probability of synchronization failure

  • zero synchronization overhead

[Figure: animation over five slides of the FIFO cell ring with the TAIL and HEAD tokens, the Full Detector, Put Controller, Get Controller and Empty Detector. Put side: full, req_put, data_put, CLK_put. Get side: CLK_get, data_get, req_get, valid_get, empty.]


Mixed-Clock FIFO: Full Scenario

  • FIFO FULL: put interface stalled

  • FIFO NOT FULL

[Figure: animation over four slides of the FIFO cell ring with the TAIL and HEAD tokens, the Full Detector, Put Controller, Get Controller and Empty Detector. Put side: full, req_put, data_put, CLK_put. Get side: CLK_get, data_get, req_get, valid_get, empty.]


Mixed-Clock FIFO: Cell Implementation

Synchronous Put Part (reusable), clocked by CLK_put:

  • en_put: enables a put operation

  • data_put: data item in

  • req_put: validity bit in

  • ptok_in, ptok_out: put token

Synchronous Get Part (reusable), clocked by CLK_get:

  • en_get: enables a get operation

  • data_get: data item out

  • valid: validity bit out

  • gtok_in, gtok_out: get token

Data Validity Controller (SR), status bits:

  • f_i: cell FULL

  • e_i: cell EMPTY

[Figure: cell schematic showing the data register (REG), the enable (En) inputs of the put and get parts, and the Data Validity Controller.]


Mixed-Clock FIFO: Architecture

[Figure: ring of FIFO cells with the Full Detector and Put Controller on the put side (full, req_put, data_put, CLK_put; annotation “FIFO not full”) and the Get Controller and Empty Detector on the get side (CLK_get, data_get, req_get, valid_get, empty).]


Synchronization Issues

Challenge: interfaces are highly concurrent

  • Global “FIFO state”: controlled by 2 different clocks

Problem #1: Metastability

  • Each FIFO interface needs clean state signals

Solution: Synchronize “full” & “empty” signals

  • “full” with CLK_put

  • “empty” with CLK_get

Add 2 (or more) synchronizing latches to each signal

Observable “full”/“empty” safely approximate true FIFO state


Synchronization Issues (cont.)

Problem #2: FIFO now may underflow/overflow!

  • synchronizing latches add extra latency

Solution: Modify definitions of “full” and “empty”

  • New FULL: 0 or 1 empty cells left

  • New EMPTY: 0 or 1 full cells left

[Figure: New Full Detector. The cell-empty bits e_0, e_1, e_2, e_3 are compared in adjacent pairs: two consecutive empty cells = FIFO not full; NO two consecutive empty cells = full. The result passes through synchronizing latches clocked by CLK_put to produce “full”.]
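The modified “full” definition can be written as a small predicate over the per-cell empty bits. The sketch below is illustrative only: the function names are hypothetical, the cells are assumed to form a ring (so the last and first cells are adjacent), and the synchronizer is modeled as a plain delay line whose reset value is an assumption.

```python
def new_full(e):
    """New FULL: True when no two consecutive cells in the ring are empty.

    e[i] is True when cell i is empty.
    """
    n = len(e)
    two_consecutive_empty = any(e[i] and e[(i + 1) % n] for i in range(n))
    return not two_consecutive_empty


def synchronize(samples, depth=2, initial=False):
    """Delay a signal by `depth` clock cycles, modeling a chain of latches.

    `initial` is the assumed reset value held in every latch.
    """
    pipeline = [initial] * depth
    observed = []
    for s in samples:
        observed.append(pipeline[-1])      # value visible to the interface
        pipeline = [s] + pipeline[:-1]     # shift the new sample in
    return observed
```

With a depth of 2, a change in the detector output reaches the put interface two cycles later, which is why “full” has to be raised while one empty cell still remains.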


Synchronization Issues (cont.)

Problem #3: Potential for deadlock

Scenario: suppose only 1 data item in quiescent FIFO

  • FIFO still considered “empty” (new definition)

  • Get interface: cannot dequeue data item!

Solution: bi-modal “empty detector”, combines:

  • “New empty” detector (0 or 1 data items)

  • “True empty” detector (0 data items)

Two results folded into single global “empty” signal


Synchronization Issues: Avoiding Deadlock

Bi-modal empty detection: select either ne or oe

  • ne: detects “new empty” (0 or 1 full cells), computed from adjacent pairs of the cell-full bits f_0, f_1, f_2, f_3

  • oe: detects “true empty” (0 full cells), computed from f_0, f_1, f_2, f_3

  • ne and oe are combined into the global “empty” signal

Reconfigure whenever the get interface is active:

  • When reconfigured, use “ne”: FIFO active, avoids underflow

  • When NOT reconfigured, use “oe”: FIFO quiescent, avoids deadlock

[Figure: bi-modal empty detector. The ne and oe detectors each pass through synchronizing latches clocked by CLK_get; en_get and req_get are also shown; the output is “empty”.]
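The selection between the two detectors can be illustrated with a purely combinational sketch. The function names and the `get_active` flag are hypothetical, and the synchronizing latches are left out, so this shows only which definition of “empty” is in force, not the circuit's timing.

```python
def new_empty(f):
    """ne: True when no two consecutive cells in the ring are full (0 or 1 items)."""
    n = len(f)
    return not any(f[i] and f[(i + 1) % n] for i in range(n))


def true_empty(f):
    """oe: True only when no cell is full (0 items)."""
    return not any(f)


def global_empty(f, get_active):
    """Use ne while the get interface is active, oe when it is quiescent."""
    return new_empty(f) if get_active else true_empty(f)
```

For a FIFO holding a single item, `global_empty` reports empty while gets are in progress (the conservative answer that prevents underflow) and not empty once the get interface is quiescent, so the last item can still be dequeued.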



Put/Get Controllers

Put Controller:

  • enables put operation

  • disabled when FIFO full

  • signals: req_put, full, en_put

Get Controller:

  • enables get operation

  • indicates when data valid

  • disabled when FIFO empty

  • signals: req_get, empty, valid, en_get, valid_get
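One plausible reading of the two controllers is as simple gating functions. The exact logic is not given on the slide, so the equations below are an assumption that is merely consistent with the bullet points; the function names are hypothetical.

```python
def put_controller(req_put, full):
    """Return en_put: a put is enabled only when requested and the FIFO is not full."""
    return req_put and not full


def get_controller(req_get, empty, valid):
    """Return (en_get, valid_get).

    Assumed logic: a get is enabled only when requested and the FIFO is not
    empty; the output is flagged valid only when a get is enabled and the
    dequeued cell's validity bit is set.
    """
    en_get = req_get and not empty
    valid_get = en_get and valid
    return en_get, valid_get
```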


Relay Stations: Overview

Proposed by Carloni et al. (ICCAD’99)

  • Delay = 1 cycle: System 1 sends “data items” to System 2

  • Delay > 1 cycle: System 1 now sends “data packets” to System 2, through a chain of relay stations (RS)

Data Packet = data item + validity bit

“stop” control = stopIn + stopOut

  • apply counter-pressure

  • result: stall communication

Steady State: pass data on every cycle (either valid or invalid)

Problem: Works only for single-clock systems!

[Figure: System 1 and System 2 on a common CLK, connected through four relay stations (RS).]


Relay Stations: Implementation

  • In normal operation:

    • packetIn copied to MR and forwarded on packetOut

  • When stopped (stopIn=1):

    • stopOut raised on the next clock edge

    • extra packet copied to AR

[Figure: relay station datapath with registers MR and AR, a switch, a mux and a Control block; ports packetIn, packetOut, stopIn, stopOut.]
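The two operating modes can be traced with a small cycle-level model. This is a sketch under assumptions, not the published circuit: the class and method names are hypothetical, one call to `clock` stands for one clock edge, and the resume behavior (draining AR into MR on the first edge after stopIn falls) is an assumption chosen so that no packet is lost or duplicated.

```python
class RelayStation:
    """Cycle-level model of a single-clock relay station (hypothetical names)."""

    def __init__(self):
        self.mr = None          # packet currently driven on packetOut
        self.ar = None          # extra packet caught while the stop propagates
        self.stop_out = False   # stop signal sent to the upstream neighbor

    def clock(self, packet_in, stop_in):
        """Apply one clock edge; return (packetOut, stopOut) after the edge."""
        if stop_in:
            if not self.stop_out:
                # First stalled edge: MR is held, the packet already in
                # flight is saved in AR, and stopOut is raised.
                self.ar = packet_in
                self.stop_out = True
            # Otherwise both registers are occupied; hold everything.
        elif self.stop_out:
            # Assumed resume step: the saved packet moves from AR to MR.
            self.mr, self.ar = self.ar, None
            self.stop_out = False
        else:
            # Normal operation: packetIn is copied to MR.
            self.mr = packet_in
        return self.mr, self.stop_out
```

The second register is what makes the one-cycle delay of stopOut safe: the upstream station sends one more packet before it sees the stop, and AR is where that packet lands.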


Relay Station vs. Mixed-Clock FIFO

Relay Station:

  • Steady state: always pass data

  • Data items: both valid & invalid

  • Stopping mechanism: stopIn & stopOut

  • Signals: validIn, dataIn, stopOut; validOut, dataOut, stopIn

Mixed-Clock FIFO:

  • Steady state: only pass data when requested

  • Data items: only valid data

  • Stopping mechanism: none (only full/empty)

  • Signals: req_put, dataIn, full; req_get, dataOut, empty


Mixed-Clock Relay Stations (MCRS)

Mixed-Clock Relay Station derived from the Mixed-Clock FIFO

Change ONLY Put and Get Controllers

[Figure: System 1 (CLK1) and System 2 (CLK2) connected through a chain of relay stations (RS) that includes the NEW MCRS.]

[Figure: interface comparison. Mixed-Clock FIFO: req_put, data_put, full, CLK_put on the put side; req_get, data_get, valid_get, empty, CLK_get on the get side. Mixed-Clock Relay Station: valid_put, packetIn, stopOut, CLK1 on the put side; valid_get, packetOut, stopIn, CLK2 on the get side.]


Mixed-Clock Relay Station: Implementation

Mixed-Clock Relay Station vs. Mixed-Clock FIFO

Identical:

  • FIFO cells

  • Full/Empty detectors (…or can simplify)

Only modify: Put & Get Controllers

  • Put Controller: always enqueue data (unless full); signals validIn (to cells), full, en_put

  • Get Controller: signals stopIn, empty, valid, en_get, validOut
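The modified controllers might look like the following. The slide only names the signals, so every equation here is an assumption: in particular, driving stopOut from “full”, gating en_get with both stopIn and “empty”, and deriving validOut from the cell's validity bit are guesses consistent with “always enqueue data (unless full)”. The function names are hypothetical.

```python
def mcrs_put_controller(valid_in, full):
    """Return (en_put, stop_out, validity_to_cells).

    Assumed logic: a packet is enqueued on every cycle unless the FIFO is
    full; the incoming validity bit is stored in the cell alongside the data;
    "full" is reused as the stop signal sent upstream.
    """
    en_put = not full
    stop_out = full
    return en_put, stop_out, valid_in


def mcrs_get_controller(stop_in, empty, valid):
    """Return (en_get, valid_out).

    Assumed logic: a packet is dequeued on every cycle unless the receiver
    has asserted stopIn or the FIFO is empty; otherwise an invalid packet
    is presented.
    """
    en_get = (not stop_in) and (not empty)
    valid_out = en_get and valid
    return en_get, valid_out
```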


Async-Sync FIFO: Block Level

Asynchronous put interface: uses handshaking communication

  • put_req: request operation

  • put_ack: acknowledge completion

  • no “full” signal

Synchronous get interface: no change

[Figure: block diagrams side by side. Mixed-Clock FIFO: req_put, data_put, full, CLK_put on the put side; req_get, data_get, valid_get, empty, CLK_get on the get side. Async-Sync FIFO: put_req, put_ack, put_data on the put side (Async Domain); req_get, data_get, valid_get, empty, CLK_get on the get side (Sync Domain).]


Async-Sync FIFO: Architecture

Asynchronous put interface:

  • No Full Detector or Put Controller

  • When FIFO full, acknowledgement withheld until safe to perform the put operation

Get interface: exactly as in Mixed-Clock FIFO

[Figure: ring of cells with the asynchronous put interface (put_req, put_ack, put_data) on one side and the Get Controller and Empty Detector (CLK_get, data_get, req_get, valid_get, empty) on the other.]
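Withholding the acknowledgement in place of a “full” signal can be modeled in a few lines. This is a behavioral sketch with hypothetical names (`AsyncPutPort`, `put_req`, `get`): it assumes at most one outstanding request, ignores the handshake phases and all timing, and models the release of the acknowledgement as a return flag.

```python
class AsyncPutPort:
    """Behavioral model: put_ack is withheld while the FIFO is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.pending = None   # item whose request is raised but not acknowledged

    def put_req(self, item):
        """Raise a put request; return True if put_ack is issued immediately."""
        assert self.pending is None, "one outstanding handshake at a time"
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        self.pending = item   # FIFO full: acknowledgement is withheld
        return False

    def get(self):
        """Dequeue one item; return (item, ack_released).

        ack_released is True when freeing a cell lets a withheld
        acknowledgement complete. Assumes the FIFO is not empty.
        """
        item = self.items.pop(0)
        ack_released = False
        if self.pending is not None:
            self.items.append(self.pending)
            self.pending = None
            ack_released = True
        return item, ack_released
```

The sender needs no knowledge of the FIFO state: it simply waits for the acknowledgement, which is why the Full Detector and Put Controller can be dropped.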


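Async-Sync FIFO: Cell Implementation

  • Asynchronous Put Part: reusable, from async FIFO (Async00)

  • Data Validity Controller (DV): new

  • Synchronous Get Part: reusable (from mixed-clock FIFO)

[Figure: cell schematic. Asynchronous put part with a C element and an OPT block, signals put_req, put_ack, put_data, we, we1; data register (REG) with enable (En); Data Validity Controller (DV) with status bits e_i and f_i; synchronous get part with gtok_in, gtok_out, CLK_get, en_get, get_data.]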


Async-Sync Relay Stations (ASRS)

[Figure: System 1 (async) connected to System 2 (sync, CLK2). An optional Micropipeline of ARS stages on the asynchronous side feeds the ASRS, followed by a relay station (RS) on the synchronous side.]


Results

Each circuit implemented:

  • using both academic and industry tools

    • MINIMALIST: Burst-Mode controllers [Nowick et al. ’99]

    • PETRIFY: Petri-Net controllers [Cortadella et al. ’97]

Pre-layout simulations: 0.6 µm HP CMOS technology

Experiments:

  • various FIFO capacities (4/8/16 cells)

  • various data widths (8/16 bits)


Results: Latency

Experimental Setup:

  • 8-bit data items

  • various FIFO capacities (4, 8, 16)

Latency = time from enqueuing a data item into an empty FIFO until it is dequeued

For each design, latency is not uniquely defined: Min/Max


Results: Maximum Operating Rate

Units:

  • Synchronous interfaces: MegaHertz

  • Asynchronous interfaces: MegaOps/sec

Put vs. Get rates:

  • sync put faster than sync get

  • async put slower than sync get


Conclusions

Introduced several new low-latency interface circuits

Address 2 major issues in SoC design:

  • Mixed-timing domains

    • mixed-clock FIFO

    • async-sync FIFO

  • Long interconnect delays

    • mixed-clock relay station

    • async-sync relay station

Other designs implemented and simulated:

  • Sync-Async FIFO + Relay Station

  • Async-Async FIFO + Relay Station

Reusable components: mix & match to build circuits

Provide useful set of interface circuits for SoC design

