Sundar Iyer


# Sundar Iyer - PowerPoint PPT Presentation

Winter 2012 Lecture 7 Packet Buffers. EE384 Packet Switch Architectures. Sundar Iyer. The Problem. All packet switches (e.g. Internet routers, Ethernet switch) require packet buffers for periods of congestion.


Winter 2012

Lecture 7

Packet Buffers

EE384

Packet Switch Architectures

Sundar Iyer

The Problem
• All packet switches (e.g. Internet routers, Ethernet switches) require packet buffers for periods of congestion.
• Size: A commonly used “rule of thumb” says that buffers need to hold one RTT (about 0.25s) of data. Even if this could be reduced to 10ms, a 4x10Gb/s linecard would require 400Mbits of buffering.
• Speed: Clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart). At 4x10Gb/s, minimum sized packets must arrive and depart every 8ns.
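The size and speed figures above follow from simple arithmetic. A minimal sketch that reproduces them; the function names are illustrative and not from the lecture:

```python
def buffer_bits(line_rate_bps: float, rtt_s: float) -> float:
    """Buffer size in bits under the rule of thumb: line rate x RTT."""
    return line_rate_bps * rtt_s


def packet_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Time one packet occupies the line, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9


LINE_RATE = 4 * 10e9  # 4x10Gb/s linecard, in bits per second

print(buffer_bits(LINE_RATE, 0.25) / 1e9)   # Gbits needed for a 0.25s RTT
print(buffer_bits(LINE_RATE, 0.010) / 1e6)  # Mbits needed for a 10ms RTT
print(packet_time_ns(40, LINE_RATE))        # ns between minimum-sized (40B) packets
```

Because a write and a read can each occur once per packet time, the memory sees twice as many accesses as packets.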

An Example: Packet buffers for a 40Gb/s linecard

[Figure: a buffer manager in front of the buffer memory. One 40B packet is written every 8ns at write rate R, and one 40B packet is read every 8ns in response to unpredictable scheduler requests.]

Memory needs to be accessed for a write or a read every 4ns.

Memory Operations Per Second (MOPS)

What is MOPS?

• Num. Unique Memory Operations Per Second
• Refers to the speed of the address (not data) bus
• Inverse of Random Access Time

Examples

• SRAM with 4ns access time = 250M MOPS
• DRAM with 50 ns access time = 20M MOPS
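Since MOPS is the inverse of the random access time, the two examples can be checked directly. A small sketch; `mops` is an illustrative helper name:

```python
def mops(access_time_ns: float) -> float:
    """Memory operations per second: the inverse of the random access time."""
    return 1e9 / access_time_ns


for name, t_ns in [("SRAM", 4), ("DRAM", 50)]:
    print(f"{name}: {mops(t_ns) / 1e6:.0f}M operations per second")

# A 40Gb/s buffer needs one write or one read every 4ns.
required = mops(4)
print(required / mops(50))  # how many times faster than the 50ns DRAM
```

The last line shows the size of the gap that the rest of the lecture tries to close.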
Memory Technology

Use SRAM?

+ Fast enough random access time, but

• Low density, high cost, high power.

Use DRAM?

+ High density means we can store data, but

• Can’t meet random access time.

• SRAM (S): 800M MOPS, \$1 per Mb, 800 Mb/s per pin
• FCRAM/RLDRAM (F): 50M MOPS, 4c per Mb, 1000 Mb/s per pin
• XDRAM (X): 25M MOPS, 2c per Mb, 3200 Mb/s per pin
• DDR3 (D): 25M MOPS, 1c per Mb, 1600 Mb/s per pin

The Problem: No single memory technology is a good match

Ideally, we would have the access rate of SRAM with the cost and density of DRAM.

Sol 1: Can’t we just use lots of DRAMs as separate memories in parallel?

Read or write 40B every 4ns, each time from a different memory with a 32ns access time.

[Figure: eight buffer memories in parallel, each storing 40B packets.]

Solution

• Write 40B packets to available banks
• Read 40B packets from specified banks

Problem

• What if back to back reads occur from a small number of banks?
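The problem can be seen in a toy model. This is a minimal sketch, assuming requests are served strictly in order and a bank stays busy for the full 32ns access time; `finish_time_ns` and the two request patterns are illustrative, not from the lecture:

```python
def finish_time_ns(bank_sequence, access_ns=32, slot_ns=4):
    """Return the time at which the last request completes.

    One request can be issued per slot, but a request to a bank that is
    still busy waits, and every later request waits behind it.
    """
    bank_free_at = {}  # bank id -> time at which the bank is idle again
    now = 0
    for bank in bank_sequence:
        start = max(now, bank_free_at.get(bank, 0))
        bank_free_at[bank] = start + access_ns
        now = start + slot_ns  # the next request issues one slot later
    return max(bank_free_at.values())


NUM_BANKS = 8
spread = [i % NUM_BANKS for i in range(64)]  # reads rotate over all eight banks
hot = [i % 2 for i in range(64)]             # reads keep hitting two banks

print(finish_time_ns(spread))  # no request ever waits for a busy bank
print(finish_time_ns(hot))     # most requests wait for a busy bank
```

When the reads rotate over all eight banks, each bank is idle again exactly when its next request arrives. When they concentrate on two banks, throughput falls to what those two banks can deliver.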


Sol 2: Can’t we just use lots of DRAMs as one monolithic memory in parallel?

[Figure: eight buffer memories accessed together as one 320B-wide memory (bytes 0-39, 40-79, ..., 280-319). The buffer manager receives one 40B packet every 8ns at write rate R and moves data to and from memory in 320B blocks.]

Sol 2: Works fine if there is only one FIFO

[Figure: a single FIFO in slow buffer memory, striped across bytes 0-39, 40-79, ..., 280-319. The buffer manager receives and sends one 40B packet every 8ns and exchanges 320B blocks with the memory at write rate R.]

Sol 2: Works fine if there is only one FIFO, and supports variable-length packets

[Figure: the same single-FIFO buffer with packets of unknown length (?B). The buffer manager still exchanges 320B blocks with the buffer memory.]

Sol 2: In practice, buffer holds many FIFOs

[Figure: buffer memory holding queues 1, 2, ..., Q, each made of 320B blocks striped across bytes 0-39, 40-79, ..., 280-319.]

How can we write multiple variable-length packets into different queues? Q might be 1k – 64k.

Problem

A block contains packets for different queues, which must be written to, or read from different memory locations.

Sol 3: Hybrid Memory Hierarchy

[Figure: arriving packets (rate R) enter a packet processor with a small fast SRAM cache, backed by a big slow DRAM, and leave as departing packets (rate R). The cache has a small probability of a miss.]

A CPU cache is probabilistic

Q: Why is randomness a problem in this context?

Sol 4: Hybrid Memory Hierarchy with 100% Cache Hit Rate

[Figure: a large DRAM memory holds the body of each of the Q FIFOs and is accessed b bytes at a time. A small SRAM on the arrival side (rate R) holds the FIFO tails. A second small SRAM on the departure side (rate R) serves the unpredictable scheduler requests.]

Design questions
• What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested?
• What algorithm minimizes the SRAM size?

An Example: Q = 5, w = 9+, b = 6

[Figure: bytes held in SRAM for each queue at timeslots t = 0 through t = 23, with queues replenished from DRAM at intervals.]

The size of the SRAM cache

Necessity

• How large does the SRAM cache need to be under any management algorithm?
• Claim: wQ > Q(b - 1)(2 + lnQ)

Sufficiency

• For any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested?
• For one particular algorithm: wQ = Qb(2 + lnQ)


Definitions

Occupancy: X(q,t)

The number of bytes in FIFO q (in SRAM) at time t.

Deficit: D(q,t) = w - X(q,t)

[Figure: Q SRAM queues of depth w, with the occupancy and the deficit of a queue marked.]

Smallest SRAM cache

In addition, each queue needs to hold (b – 1) bytes in case it is replenished with b bytes when only 1 byte has been removed.

Therefore, SRAM size must be at least: Qw > Q(b – 1)(2 + lnQ).

Examples:

• 40Gb/s linecard, b=640, Q=128: SRAM = 560kBytes
• 160Gb/s linecard, b=2560, Q=512: SRAM = 10MBytes
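The example sizes can be checked by evaluating the two bounds stated earlier: the necessity bound Q(b - 1)(2 + lnQ) and the sufficiency expression Qb(2 + lnQ). A short sketch with illustrative function names:

```python
import math


def sram_necessary_bytes(q: int, b: int) -> float:
    """Necessity bound: the SRAM must exceed Q(b - 1)(2 + ln Q) bytes."""
    return q * (b - 1) * (2 + math.log(q))


def sram_sufficient_bytes(q: int, b: int) -> float:
    """Sufficiency expression for one particular algorithm: Qb(2 + ln Q) bytes."""
    return q * b * (2 + math.log(q))


for rate, b, q in [("40Gb/s", 640, 128), ("160Gb/s", 2560, 512)]:
    lower = sram_necessary_bytes(q, b)
    upper = sram_sufficient_bytes(q, b)
    print(f"{rate}: more than {lower / 1e3:.0f} kB needed, {upper / 1e3:.0f} kB enough")
```

For these values of b the two expressions differ by well under one percent, so the bound is close to tight.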
Most Deficit Queue First
• Algorithm: Every b timeslots, replenish the queue with the largest deficit.
• Claim: An SRAM cache of size Qw > Qb(2 + lnQ) is sufficient.

Examples:

• 40Gb/s line card, b=640, Q=128: SRAM = 560kBytes
• 160Gb/s line card, b=2560, Q=512: SRAM = 10MBytes
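The algorithm can be tried out on a small case. This is a simplified sketch, not the lecture's exact model: it assumes the scheduler requests one byte per timeslot, that a replenishment of b bytes completes instantly every b timeslots, and that a deficit may go negative when a queue is refilled before b bytes have been removed. The name `mdqf_max_deficit` is illustrative.

```python
import math


def mdqf_max_deficit(requests, num_queues, b):
    """Run Most Deficit Queue First and return the largest deficit observed.

    requests[i] is the queue from which the scheduler reads one byte in
    timeslot i + 1. Every b timeslots the queue with the largest deficit
    receives b bytes.
    """
    deficit = [0] * num_queues
    worst = 0
    for t, q in enumerate(requests, start=1):
        deficit[q] += 1  # one byte leaves queue q in SRAM
        worst = max(worst, deficit[q])
        if t % b == 0:
            # Replenish the queue with the largest deficit (lowest index on ties).
            neediest = max(range(num_queues), key=deficit.__getitem__)
            deficit[neediest] -= b
    return worst


Q, B = 4, 4
round_robin = [t % Q for t in range(64)]
print(mdqf_max_deficit(round_robin, Q, B), B * (2 + math.log(Q)))
```

The first printed number is the worst deficit seen for this one request pattern, and the second is the per-queue depth b(2 + lnQ) from the claim. A single pattern only illustrates the claim and does not verify it.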
Intuition for Theorem
• The maximum number of un-replenished requests for any i queues wi, is the solution of the difference equation -
• with boundary conditions