
ECE 300: Advanced VLSI Design, Fall 2006
Lecture 19: Memories

Yunsi Fei

[Adapted from Jan Rabaey et al’s Digital Integrated Circuits©2002, PSU Irwin & Vijay © 2002, and Princeton Wayne Wolf’s Modern VLSI Design © 2002 ]

Review: Basic Building Blocks
  • Datapath
    • Execution units
      • Adder, multiplier, divider, shifter, etc.
    • Register file and pipeline registers
    • Multiplexers, decoders
  • Control
    • Finite state machines (PLA, ROM, random logic)
  • Interconnect
    • Switches, arbiters, buses
  • Memory
    • Caches (SRAMs), TLBs, DRAMs, buffers
A Typical Memory Hierarchy
  • By taking advantage of the principle of locality:
    • Present the user with as much memory as is available in the cheapest technology.
    • Provide access at the speed offered by the fastest technology.

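[Figure: memory hierarchy. On-chip components: control, datapath, RegFile, ITLB, DTLB, instruction cache, data cache, and eDRAM. Beyond them: second-level cache (SRAM), main memory (DRAM), and secondary memory (disk).]

Moving from the register file out to disk:

  • Speed (ns): 0.1's, 1's, 10's, 100's, 1,000's
  • Size (bytes): 100's, K's, 10K's, M's, T's
  • Cost: highest to lowest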

1D Memory Architecture

[Figure: 1D memory array of n words (Word 0 to Word n-1), each m bits wide and built from storage cells. Left: select signals S0 to Sn-1 each enable one word directly. Right: a decoder driven by address bits A0 to Ak-1 generates the select signals. Input/Output is m bits wide.]

n words → n select signals

Decoder reduces the number of inputs to $k = \log_2 n$
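
As a rough illustration of the decoder's role, the sketch below expands k = log2 n address bits into n one-hot select signals. The function name `decode` and the bit ordering (A0 treated as the least significant bit) are assumptions made for this example; the slides do not specify them.

```python
import math


def decode(address_bits):
    """Expand k address bits (A0 first, assumed LSB) into n = 2**k one-hot selects."""
    k = len(address_bits)
    n = 2 ** k
    # Interpret the address bits as an unsigned integer with A0 as the LSB.
    index = sum(bit << i for i, bit in enumerate(address_bits))
    # Exactly one select line, S[index], is asserted.
    return [1 if i == index else 0 for i in range(n)]


n_words = 8
k = int(math.log2(n_words))   # address bits needed: k = log2(n)
selects = decode([1, 1, 0])   # A0=1, A1=1, A2=0 selects word 3
print(k, selects)
```

With 8 words the memory needs only 3 address pins instead of 8 select pins, which is the saving the slide refers to.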

2D Memory Architecture

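[Figure: 2D memory array of storage (RAM) cells at the intersections of word lines and bit lines.]

  • Row address (Aj to Ak-1) drives the row decoder, which asserts one of $2^{k-j}$ word lines
  • Each row contains $m \cdot 2^j$ cells, one per bit line
  • Column address (A0 to Aj-1) drives the column decoder, which selects the appropriate word from the memory row
  • Sense amplifiers amplify the bit line swing
  • Read/write circuits connect the selected word to Input/Output (m bits)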
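
The address split used by the 2D organization can be sketched as follows. The helper `split_address` and the sizes chosen for k, j, and m are hypothetical; only the split itself (low j bits to the column decoder, remaining k-j bits to the row decoder) comes from the slide.

```python
def split_address(addr, k, j):
    """Split a k-bit address into (row, column) fields.

    The low j bits (A0..Aj-1) form the column address and the
    remaining k-j bits (Aj..Ak-1) form the row address.
    """
    if not 0 <= addr < 2 ** k:
        raise ValueError("address out of range")
    column = addr & ((1 << j) - 1)   # selects one of 2**j words in the row
    row = addr >> j                  # selects one of 2**(k-j) word lines
    return row, column


k, j, m = 10, 3, 8                   # hypothetical: 1024 words of 8 bits
rows = 2 ** (k - j)                  # number of word lines
cells_per_row = m * 2 ** j           # storage cells (bit lines) per row
print(rows, cells_per_row, split_address(0b1010011101, k, j))
```

Choosing j trades array height against width: a larger j gives fewer, longer word lines and shorter bit lines.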

3D Memory Architecture

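[Figure: 3D memory architecture. The array is divided into blocks addressed by a row address, a column address, and a block address. Input/Output (m bits).]

Advantages:

1. Shorter word and/or bit lines

2. Block address activates only one block, saving power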

Precharged MOS NOR ROM

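[Figure: precharged MOS NOR ROM with word lines WL(0) to WL(3), bit lines BL(0) to BL(3), GND rails, and a precharge device connected to Vdd. Signal annotations: 0 → 1 transitions on precharge and on a word line; bit line values 1 1 1 1 and 0 1 1 0.]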
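
A minimal behavioral sketch of the precharge-then-evaluate read follows. The array contents in `TRANSISTORS` are hypothetical, since the schematic is not reproduced here; row 0 was chosen only so that the result matches the 0 1 1 0 pattern annotated on the slide.

```python
# Hypothetical programming of a 4x4 NOR ROM: True means a pull-down
# transistor connects that bit line to GND when the word line is high.
TRANSISTORS = [
    [True, False, False, True],    # WL(0)
    [False, True, True, False],    # WL(1)
    [True, False, True, False],    # WL(2)
    [False, False, False, False],  # WL(3)
]


def read_word(word_line):
    """Precharge all bit lines to 1, then evaluate the selected word line."""
    bit_lines = [1, 1, 1, 1]                  # precharge phase
    for bl, has_transistor in enumerate(TRANSISTORS[word_line]):
        if has_transistor:
            bit_lines[bl] = 0                 # pull-down discharges the bit line
    return bit_lines


print(read_word(0))
```

A bit line stays at its precharged 1 unless a transistor is present on the selected word line, so the presence of a transistor encodes a 0.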

MOS NOR ROM Layout

[Figure: MOS NOR ROM layout. Polysilicon word lines WL(0) to WL(3), GND lines in diffusion, and metal1 bit lines BL(0) to BL(3) running on top of diffusion. Basic cell: 10 x 7.]

Only 1 layer (contact mask) is used to program memory array, so programming of the ROM can be delayed to one of the last process steps.

Transient Model for NOR ROM

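[Figure: transient model. The polysilicon word line WL is a distributed rc-line with per-cell resistance rword and capacitance cword; the metal1 bit line BL is a lumped capacitance Cbit charged by the precharge device.]

Word line parasitics

  • Resistance/cell: 35 Ω
  • Wire capacitance/cell: 0.65 fF
  • Gate capacitance/cell: 5.10 fF

Bit line parasitics

  • Resistance/cell: 0.15 Ω
  • Wire capacitance/cell: 0.83 fF
  • Drain capacitance/cell: 2.60 fF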

Propagation Delay of NOR ROM
  • Word line delay
    • Delay of a distributed rc-line containing M cells

$t_{word} = 0.38\,(r_{word} \times c_{word})\,M^2 = 20$ ns for $M = 512$

  • Bit line delay
    • Assuming min size pull-down and 3*min size pull-up with reduced swing bit lines (5V to 2.5V)

$C_{bit} = 1.7$ pF and $I_{avHL} = 0.36$ mA, so $t_{HL} = t_{LH} = 5.9$ ns
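
The two delay figures can be checked numerically from the per-cell parasitics of the transient model. In the sketch below the word line uses the distributed rc-line expression directly; for the bit line, the voltage step is assumed to be half of the 2.5 V swing (the 50% point), which the slide does not state explicitly but which reproduces the quoted 5.9 ns.

```python
# Per-cell parasitics quoted in the transient model.
R_WORD = 35.0                    # ohms per cell (word line)
C_WORD = (0.65 + 5.10) * 1e-15   # farads per cell: wire + gate capacitance
M = 512                          # cells along the word line

# Distributed rc-line delay: t = 0.38 * r * c * M^2
t_word = 0.38 * R_WORD * C_WORD * M ** 2

# Bit line: t = C * dV / I. dV is ASSUMED to be half of the 2.5 V swing.
C_BIT = 1.7e-12                  # farads
I_AV_HL = 0.36e-3                # amperes
DELTA_V = 2.5 / 2                # volts (assumption)
t_hl = C_BIT * DELTA_V / I_AV_HL

print(f"t_word = {t_word * 1e9:.1f} ns, t_HL = {t_hl * 1e9:.1f} ns")
```

The word line dominates because its delay grows with the square of the number of cells, while the bit line delay grows only linearly with its capacitance.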

Read-Write Memories (RAMs)
  • Static – SRAM
    • data is stored as long as supply is applied
    • large cells (6 fets/cell) – so fewer bits/chip
    • fast – so used where speed is important (e.g., caches)
    • differential outputs (output BL and !BL)
    • use sense amps for performance
    • compatible with CMOS technology
  • Dynamic – DRAM
    • periodic refresh required
    • small cells (1 to 3 fets/cell) – so more bits/chip
    • slower – so used for main memories
    • single ended output (output BL only)
    • need sense amps for correct operation
    • not typically compatible with CMOS technology
Memory Timing Definitions

[Figure: timing diagram of the Read, Write, and Data signals, defining the read cycle time, read access time, write cycle time, write setup time, write hold time, and the data-valid window.]

4x4 SRAM Memory

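[Figure: 4x4 SRAM organized as 2-bit words. Address bits A1 and A2 drive the row decoder, which selects one of WL[0] to WL[3]; A0 drives the column decoder. The bit line pairs (BL and !BL; BL[i], BL[i+1]) have a bit line precharge circuit enabled by read precharge, and connect to the sense amplifiers and write circuitry under clocking and control.]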

2D Memory Configuration

[Figure: 2D memory configuration, showing the row decoder and the sense amps.]

Decreasing Word Line Delay

[Figure: two schemes. (a) A polysilicon word line WL with a driver at each end. (b) A polysilicon word line WL strapped by a metal bypass (metal word line).]

  • Drive the word line from both sides
  • Use a metal bypass
  • Use silicides
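
The benefit of driving the word line from both sides follows from the quadratic dependence of distributed rc-line delay on length. The sketch below reuses the 0.38·r·c·M² model and the per-cell values of the NOR ROM example purely as illustrative numbers; applying them to another array is an assumption.

```python
def word_line_delay(r_cell, c_cell, cells):
    """Distributed rc-line delay, t = 0.38 * r * c * M^2."""
    return 0.38 * r_cell * c_cell * cells ** 2


R_CELL = 35.0        # ohms per cell (illustrative, from the NOR ROM example)
C_CELL = 5.75e-15    # farads per cell (illustrative)
M = 512

single_ended = word_line_delay(R_CELL, C_CELL, M)
# Driven from both sides, the worst-case cell sits M/2 cells from a driver.
double_ended = word_line_delay(R_CELL, C_CELL, M // 2)
speedup = single_ended / double_ended
print(f"{single_ended * 1e9:.1f} ns -> {double_ended * 1e9:.1f} ns ({speedup:.0f}x)")
```

A metal bypass or silicided polysilicon attacks the same product from the other direction, by lowering the resistance per cell rather than the effective length.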
Decreasing Bit Line Delay (and Energy)
  • Reduce the bit line voltage swing
    • need sense amp for each column to sense/restore signal
  • Isolate memory cells from the bit lines after sensing (to prevent the cells from changing the bit line voltage further) - pulsed word line
    • generation of word line pulses very critical
      • too short - sense amp operation may fail
      • too long - power efficiency degraded (because bit line swing size depends on duration of the word line pulse)
    • use feedback signal from bit lines
  • Isolate sense amps from bit lines after sensing (to prevent bit lines from having large voltage swings) - bit line isolation
Pulsed Word Line Feedback Signal

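[Figure: pulsed word line generation. The Read signal starts the word line pulse; a dummy column with its own dummy bit lines, 10% populated, generates the Complete signal that ends it. The regular bit lines run alongside.]

  • Dummy column
    • height set to 10% of a regular column and its cells are tied to a fixed value
    • capacitance is only 10% of a regular column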

Pulsed Word Line Timing
  • Dummy bit lines have reached full swing and trigger pulse shut off when regular bit lines reach 10% swing

[Figure: timing diagram of Read, Complete, Word line, Bit line (swing ΔV = 0.1Vdd), and Dummy bit line (swing ΔV = Vdd).]
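
The 10% figure can be reproduced with a simple constant-current model. The sketch assumes that a dummy cell and a regular cell sink the same discharge current and that the current stays constant over the swing; the capacitance and current values are illustrative numbers borrowed from the NOR ROM example, not data for this circuit.

```python
VDD = 2.5            # volts (value used elsewhere in the lecture)
C_BIT = 1.7e-12      # farads, regular bit line (illustrative)
I_CELL = 0.36e-3     # amperes, cell discharge current (illustrative)

C_DUMMY = 0.10 * C_BIT                 # dummy column has 10% of the capacitance

# Constant-current discharge: dV = I * t / C
t_complete = C_DUMMY * VDD / I_CELL    # time for the dummy bit line to swing by Vdd
regular_swing = I_CELL * t_complete / C_BIT

print(f"pulse width = {t_complete * 1e9:.2f} ns, "
      f"regular swing = {regular_swing / VDD:.0%} of Vdd")
```

Because the pulse width is derived from a column built out of the same cells, it tracks process and temperature variations along with the regular bit lines.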

Bit Line Isolation

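[Figure: bit line isolation. The bit lines (swing ΔV = 0.1Vdd) connect to the sense amplifier through devices controlled by the isolate signal; after Read, the sense signal fires the amplifier and the sense amplifier outputs swing by ΔV = Vdd.]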

6-transistor SRAM Cell

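[Figure: 6-transistor SRAM cell. Cross-coupled inverters M1/M2 and M3/M4 hold !Q and Q; pass transistors M5 and M6, gated by WL, connect !Q to !BL and Q to BL.]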

SRAM Cell Analysis (Read)

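[Figure: read of a cell storing Q=1, !Q=0. WL=1 and both bit lines are precharged (BL=1, !BL=1), each loaded by Cbit. The relevant devices are M1 and M5 on the !Q side and M4 and M6 on the Q side.]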

Read-disturb (read-upset): must carefully limit the allowed voltage rise on !Q to a value that prevents the read-upset condition from occurring while simultaneously maintaining acceptable circuit speed and area constraints

SRAM Cell Analysis (Read)


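Cell Ratio: $CR = (W_{M1}/L_{M1})/(W_{M5}/L_{M5})$

$V_{!Q} = \dfrac{(V_{dd} - V_{Tn})\left(1 + CR \pm \sqrt{CR\,(1 + CR)}\right)}{1 + CR}$ (the smaller root is the physically meaningful one)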

Read Voltages Ratios

[Figure: read voltage versus cell ratio, plotted for the parameters below.]

Vdd = 2.5V

VTn = 0.5V
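
To get a feel for how the cell ratio limits the voltage rise on !Q, the read expression V!Q = (Vdd - VTn)(1 + CR - sqrt(CR(1 + CR)))/(1 + CR) can be tabulated for the parameters above. The sketch takes the smaller root of the expression, which is assumed here to be the physical one; the function name and the list of CR values are arbitrary choices for the example.

```python
import math

VDD = 2.5   # volts
VTN = 0.5   # volts


def read_voltage(cr, vdd=VDD, vtn=VTN):
    """Voltage rise on !Q during a read, for cell ratio CR.

    Uses V = (Vdd - VTn) * (1 + CR - sqrt(CR * (1 + CR))) / (1 + CR),
    i.e. the smaller root, which is ASSUMED to be the physical one.
    """
    return (vdd - vtn) * (1 + cr - math.sqrt(cr * (1 + cr))) / (1 + cr)


for cr in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"CR = {cr:.1f}  V!Q = {read_voltage(cr):.3f} V")
```

The rise shrinks as CR grows, which is why the pull-down is made stronger than the pass transistor.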

SRAM Cell Analysis (Write)

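[Figure: write into a cell storing Q=1, !Q=0. WL=1 with BL=0 and !BL=1. The relevant devices are M4 and M6 on the Q side and M1 and M5 on the !Q side.]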

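Pullup Ratio: $PR = (W_{M4}/L_{M4})/(W_{M6}/L_{M6})$

$V_Q = (V_{dd} - V_{Tn}) \pm \sqrt{(V_{dd} - V_{Tn})^2 - (\mu_p/\mu_n)\,(PR)\,(V_{dd} - V_{Tn} - V_{Tp})^2}$ (the smaller root is the physically meaningful one)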

Write Voltages Ratios

[Figure: write voltage versus pullup ratio, plotted for the parameters below.]

Vdd = 2.5V

|VTp| = 0.5V

μp/μn = 0.5

Cell Sizing
  • Keeping cell size minimized is critical for large caches
  • Minimum sized pull down fets (M1 and M3)
    • Requires minimum width and longer than minimum channel length pass transistors (M5 and M6) to ensure proper CR
    • But sizing of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines both of which can adversely affect the speed of the read cycle
  • Minimum width and length pass transistors
    • Boost the width of the pull downs (M1 and M3)
    • Reduces the loading on the word lines and increases the storage capacitance in the cell – both are good! – but cell size may be slightly larger
6T-SRAM Layout

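[Figure: 6T SRAM cell layout with VDD and GND rails, transistors M1 to M6, the word line WL, storage nodes Q and !Q, and bit lines BL and !BL.]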

Multiple Read/Write Port Cell

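[Figure: two-port SRAM cell. Transistors M1 to M4 store Q and !Q; the access transistors M5 to M8 provide two ports, one gated by WL1 onto BL1/!BL1 and one gated by WL2 onto BL2/!BL2.]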

4x4 DRAM Memory

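[Figure: 4x4 DRAM organized as 2-bit words. Address bits A1 and A2 drive the row decoder, which selects one of WL[0] to WL[3]; A0 drives the column decoder. The single-ended bit lines BL[0] to BL[3] have a bit line precharge circuit enabled by read precharge, and connect to the sense amplifiers and write circuitry under clocking, control, and refresh.]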

3-Transistor DRAM Cell

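[Figure: 3T DRAM cell with transistors M1, M2, and M3, storage node X with capacitance Cs, write word line WWL, read word line RWL, and bit lines BL1 and BL2. Waveforms for a write followed by a read: WWL, BL1 (Vdd), X (Vdd-Vt), RWL, and BL2 (Vdd-Vt, changing by ΔV).]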

  • No constraints on device sizes (ratioless)
  • Reads are non-destructive
  • Value stored at node X when writing a “1” is $V_{WWL} - V_{Tn}$

3T-DRAM Layout

[Figure: 3T DRAM cell layout showing BL1, BL2, GND, RWL, WWL, and transistors M1, M2, and M3.]

1-Transistor DRAM Cell

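[Figure: 1T DRAM cell. Access transistor M1, gated by WL, connects storage node X (capacitor Cs) to bit line BL (capacitance CBL). Waveforms for writing a “1” and reading a “1”: WL, X (Vdd-Vt), and BL (Vdd during the write, Vdd/2 before the read, followed by sensing).]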

Write: Cs is charged (or discharged) by asserting WL and BL

Read: Charge redistribution occurs between CBL and Cs

Read is destructive, so must refresh after read
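
The size of the read signal follows from charge conservation between the two capacitors: the bit line moves by (V_X - V_pre)·Cs/(Cs + CBL). The sketch below evaluates this for a stored “1” and a stored “0”; the capacitance values are hypothetical, and the supply and threshold values are only illustrative.

```python
def bit_line_swing(v_cell, v_precharge, c_s, c_bl):
    """Bit line voltage change after charge sharing between Cs and CBL."""
    return (v_cell - v_precharge) * c_s / (c_s + c_bl)


VDD, VT = 2.5, 0.5            # volts (illustrative)
C_S, C_BL = 30e-15, 300e-15   # farads (hypothetical cell and bit line values)
V_PRE = VDD / 2               # bit line precharged to Vdd/2

dv_one = bit_line_swing(VDD - VT, V_PRE, C_S, C_BL)   # stored "1" is Vdd - Vt
dv_zero = bit_line_swing(0.0, V_PRE, C_S, C_BL)       # stored "0" is 0 V
print(f"read 1: {dv_one * 1e3:+.0f} mV, read 0: {dv_zero * 1e3:+.0f} mV")
```

Because CBL is much larger than Cs, the swing is only tens of millivolts, and the cell voltage ends up near the precharge level, which is why the read is destructive and a sense amplifier is needed on every bit line.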

DRAM Cell Observations
  • DRAM memory cells are single ended (complicates the design of the sense amp)
  • 1T cell requires a sense amp for each bit line due to charge redistribution read
  • 1T cell read is destructive; refresh must follow to restore data
  • 1T cell requires an extra capacitor that must be explicitly included in the design
  • A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)
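
The last observation can be illustrated with a first-order model of an nMOS pass transistor: the storage node charges until the device cuts off at one threshold below the word line voltage, or until it reaches the bit line voltage. Body effect is ignored, and the function name and voltage values are assumptions for the example.

```python
def stored_high_level(v_bit_line, v_word_line, v_t):
    """Voltage an nMOS pass transistor can write onto the storage node.

    The node charges until the transistor cuts off at V_WL - Vt, or until it
    reaches the bit line voltage, whichever is lower (body effect ignored).
    """
    return min(v_bit_line, v_word_line - v_t)


VDD, VT = 2.5, 0.5   # volts (illustrative)
print(stored_high_level(VDD, VDD, VT))        # word line at Vdd
print(stored_high_level(VDD, VDD + VT, VT))   # word line bootstrapped to Vdd + Vt
```

Bootstrapping the word line by at least one threshold restores the full Vdd level on the storage node, which in turn increases the read signal.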