Florin Dinu T. S. Eugene Ng Rice University

Inferring a Network Congestion Map with Zero Traffic Overhead
Florin Dinu, T. S. Eugene Ng, Rice University

Effects of Congestion
Need to identify, quantify, and localize congestion

The Vision: Passively Inferred Congestion Map
[Figure: example topology spanning AS1 and AS2, with nodes R0 to R6, X7, and X8]
• Without any dedicated measurement (probing) traffic
• At fine time granularities (seconds)
• With good accuracy
How does it work? Why does it work? Where is it applicable?

Benefits of Passive Inference
Passive inference is complementary to active reporting

Overview - Passively Inferring Congestion Maps
[Figure: example topology spanning AS1 and AS2, with nodes R0 to R6, X7, and X8; R0 and R1 highlighted]
• Step 1:
  • Use congestion markings from existing traffic
  • Get path-level congestion information
Routers are AQM/ECN capable and can mark existing traffic

Overview - Passively Inferring Congestion Maps
[Figure: routers R0 to R6 with measured paths P06 and P04 and the unknown segment P46]
$P_{46} = \mathrm{func}(P_{06}, P_{04}) = \frac{P_{06} - P_{04}}{1 - P_{04}}$
• Step 2:
  • Use topological information to complete the congestion map
Expand on Step 1: path-level congestion from AQM/ECN markings
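As a rough illustration of Step 2, the sketch below recovers the marking probability of the unknown segment from two measured paths. It assumes that marking on the two segments is independent, so a packet crosses the whole path unmarked only if both segments leave it unmarked. The function name `downstream_mp` and the clamping to [0, 1] are illustrative choices, not something stated in the talk.

```python
def downstream_mp(p_whole: float, p_upstream: float) -> float:
    """Infer the MP of the downstream segment (e.g. P46) from the MP of the
    whole path (e.g. P06) and of its upstream part (e.g. P04).

    Assumes independent marking per segment:
        1 - p_whole = (1 - p_upstream) * (1 - p_downstream)
    """
    if not 0.0 <= p_upstream < 1.0:
        # At p_upstream == 1 every packet is already marked, so the
        # downstream segment cannot be observed at all.
        raise ValueError("p_upstream must be in [0, 1)")
    p = (p_whole - p_upstream) / (1.0 - p_upstream)
    # Measurement noise can push the raw value slightly outside [0, 1].
    return min(1.0, max(0.0, p))


# Example with made-up values: P06 = 0.28 and P04 = 0.10 imply P46 of about 0.20.
p46 = downstream_mp(0.28, 0.10)
```

Clamping hides small negative values that arise when the two path estimates come from different samples; a real monitor might prefer to report them as low-confidence instead.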

AQM Background
• AQM = Active Queue Management
• Router marks/drops packets probabilistically as a function of congestion severity
• Many different definitions of congestion severity
[Figure: marking probability (MP) versus congestion severity for REM and for RED, PI]
We use marking probability (MP) as the congestion measure

ECN Background - Marking Data Packets
ECN = Explicit Congestion Notification
[Figure: sender S and destination D connected through an AQM/ECN router]
Data packets are marked probabilistically

Use of the Data Markings
[Figure: routers R0 to R6 with ingress paths P40, P30, and P60]
• Data markings describe congestion on routers' ingress paths
• Data packet marking is probabilistic, so the ratio of marked data packets gives the MP on the ingress path

ECN Background - Echoing
Echoing the markings from data packets to ACKs:
[Figure: DATA packets from S to D, ACKs from D to S]
The ACK markings are an altered version of the data packet markings

ECN Background - Responding to Markings
Responding to marked ACKs:
[Figure: S sends a DATA packet with CWR set after receiving marked ACKs from D]
Stopping the echoing after receiving a CWR packet:
[Figure: D stops marking ACKs once the CWR DATA packet arrives]
The ACK markings are an altered version of the data packet markings

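Groups - Effect of ECN Echoing
Groups of marked and unmarked ACKs:
[Figure: CWR, DATA, and ACK exchange with D showing alternating groups of marked and unmarked ACKs]
Groups of unmarked ACKs of "size zero":
[Figure: CWR, DATA, and ACK exchange showing a group of size zero]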
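To make the notion of groups concrete, here is a minimal sketch that splits one flow's observed events into groups of unmarked ACKs. It is a deliberately simplified model, not the authors' implementation: the event names (`"cwr"`, `"marked"`, `"unmarked"`) are hypothetical, events are assumed to arrive in order, and ACKs still in flight when the CWR packet passes are ignored.

```python
def unmarked_group_sizes(events):
    """Return the sizes of the groups of unmarked ACKs for one flow.

    events: ordered strings, each one of
        "cwr"      - a data packet carrying CWR was seen (echoing stops)
        "unmarked" - an ACK without the congestion echo was seen
        "marked"   - an ACK carrying the congestion echo was seen
    """
    sizes = []
    current = None  # None means we are not inside a group of unmarked ACKs
    for event in events:
        if event == "cwr":
            current = 0  # a new group may start; it can stay at size zero
        elif event == "unmarked" and current is not None:
            current += 1
        elif event == "marked" and current is not None:
            sizes.append(current)  # a marked ACK closes the group
            current = None
        # ACKs seen before the first CWR are ignored in this sketch
    return sizes


trace = ["marked", "marked", "cwr", "unmarked", "unmarked", "marked",
         "marked", "cwr", "marked", "cwr", "unmarked", "marked"]
groups = unmarked_group_sizes(trace)  # the second group has size zero
```

The size-zero group in the trace is the case where a marked ACK immediately follows a CWR, so no unmarked ACK separates two runs of marked ACKs.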

Use of the ACK Markings
[Figure: routers R0 to R6 with egress paths P03, P04, and P05]
• ACK markings describe congestion on forward paths of the flows
• ACK markings describe congestion on routers' egress paths
• Ratio of marked ACKs is an inaccurate measure
ACK markings are very important and more challenging to use

Obtaining MP from ACK Markings
p = MP on the forward path
[Figure: CWR, DATA, and ACK exchange with D]
$\mathrm{AVG\_SZ\_UNMARKED} = \mathrm{func}(p) = \sum_{n=0}^{\infty} n \cdot (1-p)^n \cdot p = \frac{1-p}{p}$
To get the MP, we need to compute the average size of groups of unmarked ACKs
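Inverting the relation AVG = (1-p)/p gives p = 1/(1 + AVG). The sketch below checks this numerically by drawing group sizes from the geometric distribution used in the sum. The names `mp_from_avg_group_size` and `sample_group_size`, the seed, and the value 0.2 are illustrative assumptions.

```python
import random


def mp_from_avg_group_size(avg_size: float) -> float:
    """Invert AVG = (1 - p) / p, which gives p = 1 / (1 + AVG)."""
    if avg_size < 0:
        raise ValueError("average group size cannot be negative")
    return 1.0 / (1.0 + avg_size)


def sample_group_size(p: float, rng: random.Random) -> int:
    """Count unmarked packets before the next marked one.

    Each packet is marked independently with probability p, so the
    result n occurs with probability (1 - p)**n * p.
    """
    n = 0
    while rng.random() >= p:
        n += 1
    return n


rng = random.Random(1)   # fixed seed so the run is repeatable
true_p = 0.2             # hypothetical MP on the forward path
sizes = [sample_group_size(true_p, rng) for _ in range(20000)]
estimate = mp_from_avg_group_size(sum(sizes) / len(sizes))
```

With many groups the estimate should land close to `true_p`; with few groups the average is noisy, which is one reason to combine several flows per path.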

Average Size of Groups of Unmarked ACKs
[Figure: timeline of one Estimation Interval (EI) divided into Sampling Intervals (SI), starting with a training period; Flow1 to Flow5 shown, some not selected]
• Select flows until a limit is reached
• During the training period only select flows, do not compute samples
• For each following SI:
  • Sample = average size of the groups of unmarked ACKs that finish in that SI
  • Discard groups that start or end in a different EI
• At the end of the EI use AVG(SAMPLES) = (1-p)/p to obtain p
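The bullets above can be sketched as a small aggregation routine. This is an assumption-laden illustration: groups are given as hypothetical `(start_time, end_time, size)` tuples for one path, flow selection is left out, and groups that end during the training period are simply skipped, which the slide does not spell out.

```python
from collections import defaultdict


def estimate_mp_for_ei(groups, ei_start, ei_len, si_len, training_len):
    """Infer the MP for one Estimation Interval (EI).

    groups: iterable of (start_time, end_time, size) for groups of
            unmarked ACKs on one path.
    Returns the inferred MP, or None if the EI produced no samples.
    """
    ei_end = ei_start + ei_len
    first_si = ei_start + training_len
    per_si = defaultdict(list)
    for start, end, size in groups:
        if start < ei_start or end >= ei_end:
            continue  # group starts or ends in a different EI
        if end < first_si:
            continue  # assumption: nothing is sampled during training
        si_index = int((end - first_si) // si_len)
        per_si[si_index].append(size)
    # One sample per SI: the average size of the groups finishing in it.
    samples = [sum(s) / len(s) for s in per_si.values()]
    if not samples:
        return None
    avg = sum(samples) / len(samples)
    return 1.0 / (1.0 + avg)  # from AVG(SAMPLES) = (1 - p) / p


example_groups = [(0.6, 0.9, 4), (1.1, 1.4, 2), (1.2, 1.45, 6), (2.9, 3.1, 9)]
p_hat = estimate_mp_for_ei(example_groups, ei_start=0.0, ei_len=3.0,
                           si_len=0.5, training_len=0.5)
```

In the example the last group ends after the EI and is discarded; the two remaining SIs each yield a sample of 4, so the inferred MP is about 0.2.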

Optimization - the Use of Groups of Size Zero
[Figure: CWR, DATA, and ACK exchange with D showing a group of size zero]
• The probability that a group has size zero is $(1-p)^0 \cdot p = p$
• If p is high, most groups will be of size zero
• Better statistical significance when groups of size zero are used
• Routers need to be on both the data and ACK path of a flow
Use of groups of size zero increases accuracy

Evaluation - Parameter Settings
• ns-2 simulations, 500 s simulation time
• AQM algorithms (RED, PI, REM), RED by default
• SI = 0.5 s (congestion sample computed every 0.5 s)
• Monitor at most 1000 flows per EI/path
• Groups of size zero used in all experiments

Evaluation - Traffic & Topology
• 5 ms link delay, 500 Mbps link bandwidth
[Figure: chain topology R0, R1, R2, ..., R8, R9, R10 (hop 10) with four UDP sources; R0 to Ri: 250*i2 TCP flows; Ri to Ri+2: 100 TCP flows]
• Metric: 50th and 90th percentile of |inferred MP - real MP| for each link
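The metric can be computed as follows. This is only a sketch: the talk does not say which percentile definition is used, so the nearest-rank method here is an assumption, and the function names and sample values are made up.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (pct in 1..100) of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_percentiles(inferred, real):
    """50th and 90th percentile of |inferred MP - real MP| for one link."""
    errors = [abs(i - r) for i, r in zip(inferred, real)]
    return (percentile_nearest_rank(errors, 50),
            percentile_nearest_rank(errors, 90))


# Made-up per-EI values for a single link.
inferred_mp = [0.10, 0.22, 0.35, 0.41, 0.50]
real_mp = [0.10, 0.20, 0.30, 0.40, 0.60]
p50, p90 = error_percentiles(inferred_mp, real_mp)
```

For these values the absolute errors are roughly 0, 0.02, 0.05, 0.01, and 0.10, so the median error is about 0.02 and the 90th percentile about 0.10.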

Evaluation - vs Baseline Solution
Our group-based solution (GROUP):
[Figure: CWR, DATA, and ACK exchange with D, ACKs organized into groups]
Baseline solution, no alteration (REFERENCE):
[Figure: CWR, DATA, and ACK exchange]
GROUP vs REFERENCE

Sensitivity to the Length of the EI
[Figure: x-axis: value of EI (s), log scale]
Accuracy decreases with hop count but is within 0.1 for most cases

Sensitivity to Drastic Changes
• UDP sources vary their sending rate by 50 Mbps between 250 Mbps and 750 Mbps
• Every 10 s we start 3000 TCP flows between random nodes, for a random time (0-10 s)
How well does our solution track these sudden and large variations?

Sensitivity to Drastic Changes
[Figure: 50th and 90th percentile error for EI = 3 s and EI = 10 s]
Accuracy decreases with hop count but is within 0.1 to 0.15 for most cases

Sensitivity to AQM Marking Function
[Figure: marking/drop probability versus congestion severity for REM and for RED, PI]
• Why does REM perform much worse?
  • Abrupt variations in marking probability
  • Limited visibility
A linear marking function allows better inference for our solution

Limited Visibility
[Figure: routers R0, R1, R2 with paths P10 and P20 and the unknown P12 = ??; R1 marks 100% of packets, R2 marks 30% of packets]
• If P20 = P10 = 100%, P12 is unknown (any value is possible)
• At high MP (below 100%) the problem still exists because very few packets are left unmarked
Limited visibility appears at high MP and is more probable for REM.
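A small numeric sketch shows why few unmarked packets hurt. It assumes P12 is recovered from P20 and P10 with the same composition rule as in Step 2 (independent marking per segment); the function name and the numbers are illustrative only.

```python
def inferred_segment_mp(p_whole, p_near):
    """Infer the MP of the remaining segment (e.g. P12) from the whole
    path (e.g. P20) and the known part (e.g. P10).

    Returns None when the known part already marks every packet, because
    then any value of the remaining segment explains the observations.
    """
    if p_near >= 1.0:
        return None
    return (p_whole - p_near) / (1.0 - p_near)


# Hypothetical case: P10 = 0.98 and a true P12 of 0.30 give
# P20 = 1 - (1 - 0.98) * (1 - 0.30) = 0.986.
exact = inferred_segment_mp(0.986, 0.98)
# An error of only 0.004 in the measured P20 moves the result a lot.
noisy = inferred_segment_mp(0.990, 0.98)
# With P10 = 100% nothing can be inferred.
blind = inferred_segment_mp(1.0, 1.0)
```

Dividing by 1 - P10 amplifies any measurement error by a factor of 50 in this example, so the inferred value jumps from about 0.30 to about 0.50.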

Sensitivity to Dropped ACKs - Numerical
• ACKs can be dropped by non-AQM/ECN routers
• Pure ACKs can be dropped even by AQM/ECN routers
Example: group sizes 4, 5, 1, 5 give an average size of 3.75; group sizes 8, 1, 4 give an average size of 4.33
Dropped ACKs can modify the average size of groups of unmarked ACKs

Sensitivity to Dropped ACKs - Numerical
At reasonable drop probabilities the additional error is low

Other Advantages of Our Solution
• Incremental deployment
  • On specific paths
  • Around non-AQM/ECN routers
• Useful in heterogeneous environments
  • Different AQM types

Related Work
• Re-ECN [SIGCOMM 2005], ConEx IETF WG
  • Extends ECN with one step
  • Sources re-echo congestion information from ACK markings
  • A router on the forward path has upstream, downstream, and whole-path congestion
  • Useful for traffic policing or traffic management
  • Lower precision, limited by header space bits
  • Needs modifications to ECN and headers
  • Does not address the challenge posed by ACK markings
  • Does not go beyond path-level congestion inference

Conclusion
• Novel method for inferring congestion with zero network overhead
• Does not require changes to hosts, headers, or protocols
• Incrementally deployable and useful in heterogeneous environments
• Good accuracy even in very congested environments

Thank you
Credits for the pictures:
• http://networkequipment.net/wp-content/uploads/2011/02/voip-telephone.jpg
• http://www.freefoto.com/images/04/28/04_28_50---US-Dollar-Bills_web.jpg
• http://www.ciscorouting.com/routing_engine.jpg
• http://www.rvoice.co.uk/uploads/Image/Green%20Tick.jpg

Why not Use Ratio for ACK Markings?
The ratio of marked ACKs is very inaccurate. Need a better solution.

Effects of Using Delayed ACK - Numerical
Additional error introduced by the use of delayed ACK

Sensitivity to Bandwidth (EI = 3 s)
Accuracy increases with bandwidth

Sensitivity to Flow Size (EI = 3 s)
Good accuracy even with many small flows

Severity of False Positives (EI = 3 s)
Small false positives are inherent in a probabilistic approach

Granularity of Inference
[Figure: routers R0 to R6 with paths P40 and P06; timeline of Sampling Intervals (SI) within an Estimation Interval (EI)]
estimate(P06) = AVG({samples(P06)})

Implementation
• Counters per path
  • Length and number of all groups of unmarked ACKs
• Counters per flow
  • Current group of unmarked ACKs
• Prefix matching for source and destination
• Transport protocol header matching for flow identification
• Sequence numbers for CWR
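One way to picture the counters listed above is the following sketch. All class, method, and key names are hypothetical, prefix matching and sequence-number handling are left out, and the event model (a CWR opens a group, a marked ACK closes it) is a simplification rather than the authors' design.

```python
from dataclasses import dataclass, field


@dataclass
class PathCounters:
    total_length: int = 0  # summed sizes of all finished groups
    num_groups: int = 0    # number of finished groups

    def average(self):
        """Average group size, or None before any group has finished."""
        if self.num_groups == 0:
            return None
        return self.total_length / self.num_groups


@dataclass
class Monitor:
    paths: dict = field(default_factory=dict)    # path key -> PathCounters
    current: dict = field(default_factory=dict)  # flow key -> open group size

    def on_cwr(self, flow):
        """A CWR data packet opens a new group of unmarked ACKs."""
        self.current[flow] = 0

    def on_ack(self, path, flow, marked):
        if flow not in self.current:
            return  # no open group for this flow
        if not marked:
            self.current[flow] += 1
            return
        # A marked ACK closes the group and updates the per-path counters.
        counters = self.paths.setdefault(path, PathCounters())
        counters.total_length += self.current.pop(flow)
        counters.num_groups += 1


monitor = Monitor()
path = ("10.0.0.0/8", "10.1.0.0/16")           # (source prefix, destination prefix)
flow = ("10.0.0.1", 5000, "10.1.0.1", 80)      # made-up flow identifier
monitor.on_cwr(flow)
monitor.on_ack(path, flow, marked=False)
monitor.on_ack(path, flow, marked=False)
monitor.on_ack(path, flow, marked=True)        # closes a group of size 2
monitor.on_cwr(flow)
monitor.on_ack(path, flow, marked=True)        # closes a group of size 0
avg_size = monitor.paths[path].average()
```

Keeping only a running sum and a count per path means per-path state stays constant in size, while per-flow state is a single integer for the open group.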

Coverage of Congestion Maps
• Six real network topologies (Internet2, TEIN2, iLight, GEANT, SUNET, NLR)
• Assume all-to-all traffic pattern
• Average congestion map coverage:
  • NLR, Internet2, GEANT: ~60%
  • TEIN2: ~91%
  • iLight: ~94%
  • SUNET: ~95%
