Congestion Management for Data Centers: IEEE 802.1 Ethernet Standard

Balaji Prabhakar
Departments of EE and CS
Stanford University

Background

Data centers see the true convergence of L3 and L2 transport.
L2 Transport: IEEE 802.1

Pause absorption buffers
Congestion spreading

[Figure: network with congested links marked X and several TCP sources; link-level PAUSE spreads congestion upstream to flows that do not cross the hotspot.]
[Figure: RED drop profile. The drop probability p rises from 0 once the average queue length qavg exceeds minth, reaching its maximum at maxth.]

RED: the drop probability p increases as the congestion level goes up.

TCP: slow start + congestion avoidance.
Congestion avoidance is AIMD:
  No loss: increase the window by 1 (per RTT);
  Packet loss: cut the window by half.
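The AIMD and RED rules above can be sketched in a few lines of Python; the thresholds and maximum drop probability below (min_th = 20, max_th = 80, p_max = 0.1) are illustrative values, not ones taken from the talk:

```python
def red_drop_prob(q_avg, min_th=20, max_th=80, p_max=0.1):
    """RED: drop probability rises linearly from 0 at min_th to p_max at
    max_th; packets are force-dropped once the average queue passes max_th."""
    if q_avg < min_th:
        return 0.0
    if q_avg >= max_th:
        return 1.0
    return p_max * (q_avg - min_th) / (max_th - min_th)


def aimd_step(cwnd, packet_lost):
    """TCP congestion avoidance (AIMD): grow the window by 1 each RTT,
    halve it on a loss (never below 1)."""
    return max(1.0, cwnd / 2.0) if packet_lost else cwnd + 1.0
```

The linear ramp between min_th and max_th is what makes RED's feedback gentle compared with tail drop, and the halving on loss is the multiplicative decrease that produces TCP's sawtooth.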
[Figure: ns2 simulation. Three panels plot the number of TCP flows (peaking near 1200, 1200 and 600) against time (0 to 200) for groups grp1, grp2 and grp3; the topology uses 100 Mbps links with RED.]
[Figure: TCP/RED fluid-model block diagram. TCP Control (window growth at rate 1/R, link capacity C) and RED Control (drop probability p) form a feedback loop coupled through the queue q and the round-trip delay.]

Users:   W = window size; RTT = round-trip time; C = link capacity.
Network: q = queue length; qa = average queue length; p = drop probability.

*By V. Misra, W.-B. Gong and D. Towsley at SIGCOMM 2000
*Fluid-model concept originated by F. Kelly, A. Maulloo and D. Tan, Journal of the Operational Research Society, 1998
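Written out, the fluid model referenced above couples the window and queue dynamics through delayed feedback. A commonly quoted simplified form of the Misra/Gong/Towsley equations, in the variables defined above (with N flows and propagation delay T_p), is:

```latex
\begin{aligned}
\frac{dW}{dt} &= \frac{1}{R(t)} \;-\; \frac{W(t)\,W(t-R)}{2\,R(t-R)}\,p(t-R), \\
\frac{dq}{dt} &= \frac{N\,W(t)}{R(t)} \;-\; C, \qquad R(t) = \frac{q(t)}{C} + T_p .
\end{aligned}
```

This is a sketch of the model's core; the SIGCOMM 2000 paper additionally tracks the averaged queue qa and RED's drop profile p(qa).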
Recall the ns2 simulation from earlier: Delay at Link 1
Quantized Congestion Notification (QCN):
Congestion control for Ethernet
Joint work with:
Mohammad Alizadeh, Berk Atikoglu and Abdul Kabbani, Stanford University
Ashvin Lakshmikantha, Broadcom
Rong Pan, Cisco Systems
Mick Seaman, Chair, Security Group; Ex-Chair, Interworking Group, IEEE 802.1
[Figure: Congestion Point. The switch samples arriving packets and reflects a congestion message back to the source with a probability that rises from Pmin to Pmax as the congestion measure grows, relative to the equilibrium queue Qeq.]

Fb = -((Q - Qeq) + w · dQ/dt)
   = -(queue offset + w · rate offset)
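A minimal sketch of the congestion-point computation, assuming the sign convention Fb ≤ 0 under congestion. The 10% maximum probability and the 6-bit quantization (maximum 63) appear later in the talk; the equilibrium queue q_eq = 26, the weight w = 2.0, and the 1% floor are illustrative stand-ins:

```python
def compute_fb(q, q_old, q_eq=26, w=2.0):
    """Congestion-point feedback: Fb = -((Q - Qeq) + w * (Q - Qold)).
    (Q - Qold) is a discrete stand-in for the rate-offset term w * dQ/dt."""
    return -((q - q_eq) + w * (q - q_old))


def sample_prob(fb, p_min=0.01, p_max=0.10, fb_max=63):
    """Reflection probability grows from p_min toward p_max with |Fb|;
    Fb is quantized to 6 bits, so its magnitude saturates at 63."""
    if fb >= 0:
        return 0.0  # no congestion, no message reflected
    mag = min(-fb, fb_max)
    return p_min + (p_max - p_min) * mag / fb_max
```

A queue sitting at Qeq and not moving yields Fb = 0 and no feedback; a queue above Qeq and still growing yields a strongly negative Fb and a higher chance of a congestion message.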
TR = Target Rate; CR = Current Rate.

[Figure: reaction-point rate evolution. When a congestion message is received, TR is set to CR and CR is cut; the levels Rd, Rd/2, Rd/4, Rd/8 mark successive decreases. Fast Recovery then raises CR back toward TR, followed by Active Probing; cycles are clocked by a byte counter (ByteCtr), the rate limiter (RL), and a timer.]
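The Fast Recovery / Active Probing behaviour can be sketched as a small state machine. This is a simplified sketch, not the 802.1Qau pseudo-code: the decrease gain gd, the five-cycle fast-recovery count, and the active-increase step r_ai are illustrative parameters:

```python
class QcnRateLimiter:
    """Simplified sketch of a QCN reaction point (rate limiter)."""

    FAST_RECOVERY_CYCLES = 5  # illustrative cycle count

    def __init__(self, rate, gd=1.0 / 128, r_ai=5e6):
        self.cr = rate      # current rate (bps)
        self.tr = rate      # target rate (bps)
        self.gd = gd        # multiplicative-decrease gain
        self.r_ai = r_ai    # active-increase step (bps)
        self.cycles = 0     # recovery cycles completed since last cut

    def on_congestion_message(self, fb):
        """Remember the pre-cut rate in TR, then cut CR multiplicatively
        in proportion to |Fb| (by at most half)."""
        self.tr = self.cr
        self.cr *= max(0.5, 1.0 - self.gd * abs(fb))
        self.cycles = 0

    def on_cycle(self):
        """One byte-counter/timer cycle. Fast Recovery performs a binary
        search of CR toward TR; afterwards Active Probing raises TR."""
        if self.cycles < self.FAST_RECOVERY_CYCLES:
            self.cycles += 1
        else:
            self.tr += self.r_ai
        self.cr = (self.cr + self.tr) / 2.0
```

Because TR holds the rate that prevailed just before the cut, the binary search recovers most of the lost rate quickly, while Active Probing lets the source discover newly available bandwidth without waiting for another congestion message.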
[Figure: topology with Sources 1 through 10 on 10 G links feeding a 0.5G bottleneck; recovery time = 80 msec.]

[Figure: sampling function P = Φ(Fb). The reflection probability grows with |Fb|, saturating at 10% when the 6-bit quantized Fb reaches 63. Traces shown for N = 10, RTT = 100 us; N = 100, RTT = 500 us; N = 10, RTT = 1 ms; and N = 10, RTT = 2 ms.]
TR = Target Rate; CR = Current Rate.

[Figure: reaction-point rate evolution after a congestion message is received (decrease levels Rd, Rd/2, Rd/4, Rd/8), followed by Active Probing; traces for RTT = 60, 65, 120, 130, 230 and 240 msec.]
[Figure: a source doing Active Probing (AP) alongside a regular source, both driven by the feedback Fb; traces for RTT = 120 msec and RTT = 130 msec.]

Two-step AP averages the feedback: 0.5 Fb + 0.25 T dFb/dt.

P(s) = (s + 1) / (s^3 + 1.6 s^2 + 0.8 s + 0.6)

Two-step AP is even more stable than basic AP.
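The stability of P(s) can be checked on its denominator with the Routh-Hurwitz criterion, which for a monic cubic reduces to one inequality; a small sketch:

```python
def cubic_is_stable(a2, a1, a0):
    """Routh-Hurwitz test for s^3 + a2*s^2 + a1*s + a0: all roots lie in
    the open left half-plane iff every coefficient is positive and
    a2 * a1 > a0."""
    return a2 > 0 and a1 > 0 and a0 > 0 and a2 * a1 > a0


# Denominator of P(s) = (s + 1) / (s^3 + 1.6 s^2 + 0.8 s + 0.6):
stable = cubic_is_stable(1.6, 0.8, 0.6)  # 1.6 * 0.8 = 1.28 > 0.6, so stable
```

Since 1.6 · 0.8 = 1.28 > 0.6 and all coefficients are positive, every pole of P(s) has negative real part, consistent with the stability claim above.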
Background: TCP Buffer Sizing
Example: Simulation Setup
[Figure: simulation setup with sources attached to a switch.]
TCP vs QCN: throughput and standard deviation of the sending rate

  N     RTT       TCP                           QCN
  1     120 μs    99.5% (std 265.4 Mbps)        99.5% (std 13.8 Mbps)
  1     250 μs    95.5% (std 782.7 Mbps)        99.5% (std 33.3 Mbps)
  1     500 μs    88%   (std 1249.7 Mbps)       99.5% (std 95.4 Mbps)
  10    120 μs    99.5% (std 625.1 Mbps)        99.5% (std 25.1 Mbps)
  10    250 μs    95.5% (std 981 Mbps)          99.5% (std 27.2 Mbps)
  10    500 μs    89%   (std 1311.4 Mbps)       99.5% (std 170.5 Mbps)

As the RTT grows, TCP's throughput falls and its rate variance climbs sharply, while QCN holds 99.5% throughput with far smaller variance.
QCN and shallow buffers
Buffer size = C × Var(R1) × (Bandwidth × Delay) / √N
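As a numerical sketch of the rule above, treating the leading constant C and Var(R1) as 1 purely for illustration: a 10 Gb/s link with 120 μs of delay shared by N = 10 flows needs roughly 47 KB of buffering.

```python
import math

def buffer_bytes(bandwidth_bps, delay_s, n_flows, c=1.0, var_r1=1.0):
    """Shallow-buffer sizing sketch:
    B = C * Var(R1) * (Bandwidth * Delay) / sqrt(N).
    c and var_r1 default to 1.0 here only for illustration."""
    bits = c * var_r1 * bandwidth_bps * delay_s / math.sqrt(n_flows)
    return bits / 8.0  # convert bits to bytes

# 10 Gb/s link, 120 us delay, 10 flows: about 47 KB.
size = buffer_bytes(10e9, 120e-6, 10)
```

The 1/√N factor is what makes shallow buffers viable: the more desynchronized flows share the link, the less buffer each bandwidth-delay product requires.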