
Florin Dinu T. S. Eugene Ng Rice University

Inferring a Network Congestion Map with Zero Traffic Overhead



Presentation Transcript

  1. Inferring a Network Congestion Map with Zero Traffic Overhead. Florin Dinu, T. S. Eugene Ng, Rice University

  2. Effects of Congestion Need to identify, quantify and localize congestion

  3. The Vision: Passively Inferred Congestion Map [Figure: network map spanning AS1 and AS2 with routers R0-R6 and elements labeled X7 and X8] • Without any dedicated measurement (probing) traffic • At fine time granularities (seconds) • Good accuracy How does it work? Why does it work? Where is it applicable?

  4. Benefits of Passive Inference Passive inference is complementary to active reporting

  5. Overview – Passively Inferring Congestion Maps [Figure: network map spanning AS1 and AS2 with routers R0-R6] • Step 1: • Use congestion markings from existing traffic • Get path-level congestion information • Routers are AQM/ECN capable and can mark existing traffic

  6. Overview – Passively Inferring Congestion Maps [Figure: routers R0-R6 with known paths P06 and P04 and unknown P46] $P_{46} = \mathrm{func}(P_{06}, P_{04}) = \frac{P_{06} - P_{04}}{1 - P_{04}}$ • Step 2: • Use topological information to complete the congestion map Expand on Step 1: path-level congestion from AQM/ECN markings
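A minimal sketch of Step 2, assuming that marking on consecutive path segments is independent, so a packet crosses the whole path unmarked only if every segment leaves it unmarked: 1 - P06 = (1 - P04)(1 - P46). The function name and the clamping of noisy inputs are illustrative choices, not taken from the slides.

```python
def remaining_segment_mp(p_whole: float, p_prefix: float) -> float:
    """MP of the unmeasured segment, given the MP of the whole path and of its measured prefix."""
    if not 0.0 <= p_prefix < 1.0:
        raise ValueError("prefix MP must be in [0, 1); at 1.0 the segment is unobservable")
    if not 0.0 <= p_whole <= 1.0:
        raise ValueError("whole-path MP must be in [0, 1]")
    # A packet stays unmarked only if both segments leave it unmarked:
    #   1 - p_whole = (1 - p_prefix) * (1 - p_segment)
    p_segment = (p_whole - p_prefix) / (1.0 - p_prefix)
    # Noisy estimates can make p_whole < p_prefix; clamp (an assumption of this sketch)
    # instead of returning a negative probability.
    return min(1.0, max(0.0, p_segment))


# Example: P04 = 0.2 and P06 = 0.6 give P46 of about 0.5.
print(remaining_segment_mp(0.6, 0.2))
```

With P04 = 0.2 and P46 = 0.5 the whole path has P06 = 1 - 0.8 * 0.5 = 0.6, so the call above recovers the segment value up to floating-point rounding.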

  7. AQM Background • AQM = Active Queue Management • Router marks/drops packets probabilistically as a function of congestion severity • Many different definitions of congestion severity [Figure: marking probability (MP) versus congestion severity, with curves for REM and for RED, PI] We use marking probability (MP) as the congestion measure

  8. ECN Background – Marking Data Packets ECN = Explicit Congestion Notification [Figure: source S, destination D and an AQM/ECN router] Data packets are marked probabilistically

  9. Use of the Data Markings [Figure: routers R0-R6 with paths P30, P40 and P60] • Data markings describe congestion on routers' ingress paths • Data packet marking is probabilistic, so the ratio of marked data packets gives the MP on the ingress path
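The ratio estimate for data markings can be sketched as follows. The simulation is only an illustration under the assumption that each router on the ingress path marks packets independently with a fixed probability; `simulate_ingress_path` and its parameters are hypothetical helpers, not part of the presented system.

```python
import random


def mp_from_data_markings(marked_packets: int, total_packets: int) -> float:
    """Estimate the ingress-path MP as the fraction of data packets that arrive marked."""
    if total_packets <= 0:
        raise ValueError("need at least one observed packet")
    return marked_packets / total_packets


def simulate_ingress_path(router_mps, n_packets: int, seed: int = 0) -> int:
    """Count packets that arrive marked after crossing routers with the given MPs."""
    rng = random.Random(seed)
    marked = 0
    for _ in range(n_packets):
        # A packet arrives marked if at least one router on the path marked it.
        if any(rng.random() < p for p in router_mps):
            marked += 1
    return marked


n = 100_000
marked = simulate_ingress_path([0.1, 0.2], n)
# The path MP under the independence assumption is 1 - 0.9 * 0.8 = 0.28.
print(mp_from_data_markings(marked, n))
```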

  10. ECN Background - Echoing Echoing the markings from data packets to ACKs [Figure: DATA and ACK exchange between S and D] The ACK markings are an altered version of the data packet markings

  11. ECN Background – Responding to Markings Responding to marked ACKs [Figure: CWR, DATA and ACK exchange between S and D] Stopping the echoing after receiving a CWR packet [Figure: CWR, DATA and ACK exchange] The ACK markings are an altered version of the data packet markings

  12. Groups - Effect of ECN Echoing Groups of marked and unmarked ACKs [Figure: CWR, DATA and ACK sequence] Groups of unmarked ACKs of “size zero” [Figure: CWR, DATA and ACK sequence containing a group of size zero]
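To make the notion of groups concrete, here is a simplified sketch that extracts the sizes of groups of unmarked ACKs from one flow's event stream. The event encoding is an assumption made for illustration: "U" is an unmarked ACK, "M" is a marked ACK, and "C" is the point where the receiver stops echoing because a CWR packet arrived. A group opens at the start of the stream and after every "C", and closes at the next "M"; a "C" followed directly by an "M" yields a group of size zero.

```python
from typing import List


def unmarked_group_sizes(events: str) -> List[int]:
    """Sizes of completed groups of unmarked ACKs in a 'U'/'M'/'C' event string."""
    sizes: List[int] = []
    in_group = True   # the start of the stream opens the first group
    count = 0
    for ev in events:
        if ev == "C":
            # Echoing stops: a new group of unmarked ACKs begins here.
            in_group, count = True, 0
        elif ev == "U" and in_group:
            count += 1
        elif ev == "M" and in_group:
            # First marked ACK after the group: the group is complete.
            sizes.append(count)
            in_group = False
        # Marked ACKs seen while echoing continues (in_group is False) are ignored.
    return sizes  # a trailing, unfinished group is discarded


print(unmarked_group_sizes("UUUMMMCUUMMCMMCUM"))  # [3, 2, 0, 1]
```

The third group has size zero because the marking resumes immediately after the "C" event, with no unmarked ACK in between.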

  13. Use of the ACK Markings [Figure: routers R0-R6 with paths P03, P04 and P05] • ACK markings describe congestion on forward paths of the flows • ACK markings describe congestion on routers’ egress paths • Ratio of marked ACKs is an inaccurate measure ACK markings are very important and more challenging to use

  14. Obtaining MP from ACK Markings p = MP on the forward path [Figure: CWR, DATA and ACK sequence] $\mathrm{AVG\_SZ\_UNMARKED} = \mathrm{func}(p) = \sum_{n=0}^{\infty} n \cdot (1-p)^n \cdot p = \frac{1-p}{p}$ To get MP, we need to compute the average size of groups of unmarked ACKs
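The closed form on this slide follows from the standard power-series identity for the sum of n q^n, with q = 1 - p. The worked derivation below, including the inversion that recovers p from a measured average, is a sketch that assumes each data packet is marked independently with probability p (0 < p <= 1), so that group sizes are geometrically distributed.

```latex
\begin{align*}
\mathrm{AVG\_SZ\_UNMARKED}
  &= \sum_{n=0}^{\infty} n\,(1-p)^n\,p
   = p \sum_{n=0}^{\infty} n\,q^n, \qquad q = 1-p \\
  &= p \cdot \frac{q}{(1-q)^2}
   = \frac{p\,(1-p)}{p^2}
   = \frac{1-p}{p} \\
\Longrightarrow\quad
p &= \frac{1}{1 + \mathrm{AVG\_SZ\_UNMARKED}}
\end{align*}
```

For example, a measured average group size of 3 corresponds to p = 1/(1+3) = 0.25.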

  15. Average Size of Groups of Unmarked ACKs [Figure: timeline of Flow1-Flow5 showing the start and end of an Estimation Interval (EI), the Sampling Intervals (SI), the training period and flows that are not selected] • Select flows until a limit is reached • During the training period only select flows, do not compute samples • For each following SI • Sample = avg size of groups of unmarked ACKs that finish in that SI • Discard groups that start or end in a different EI • At the end of the EI use AVG(SAMPLES)=(1-p)/p to obtain p
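The per-interval procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: groups are given as hypothetical `(start_time, end_time, size)` tuples, flow selection and the training period are not modeled, and skipping SIs that contain no finished group is an assumption of the sketch.

```python
from typing import Iterable, List, Optional, Tuple


def estimate_mp(groups: Iterable[Tuple[float, float, int]],
                ei_start: float, ei_len: float, si_len: float) -> Optional[float]:
    """Estimate p for one Estimation Interval from groups of unmarked ACKs."""
    n_si = round(ei_len / si_len)
    ei_end = ei_start + ei_len
    buckets: List[List[int]] = [[] for _ in range(n_si)]
    for start, end, size in groups:
        # Discard groups that start or end in a different EI.
        if start < ei_start or end >= ei_end:
            continue
        # Assign the group to the SI in which it finishes.
        idx = min(int((end - ei_start) / si_len), n_si - 1)
        buckets[idx].append(size)
    # One sample per SI: the average size of the groups that finished in it.
    samples = [sum(b) / len(b) for b in buckets if b]
    if not samples:
        return None
    avg = sum(samples) / len(samples)
    # Invert AVG(SAMPLES) = (1 - p) / p.
    return 1.0 / (1.0 + avg)


groups = [(0.1, 0.4, 3), (0.2, 0.45, 1), (0.6, 0.9, 2), (-0.2, 0.3, 9), (2.8, 3.1, 7)]
print(estimate_mp(groups, ei_start=0.0, ei_len=3.0, si_len=0.5))
```

In this example the last two groups straddle an EI boundary and are discarded; the two remaining samples are both 2.0, which gives p = 1/3.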

  16. Optimization – the Use of Groups of Size Zero [Figure: CWR, DATA and ACK sequence containing a group of size zero] • Probability of a group being of size zero is $(1-p)^0 \cdot p = p$ • If p is high, most groups will be of size zero • Better statistical significance if groups of size zero are used • Routers need to be on both the data and ACK path of a flow Use of groups of size zero increases accuracy
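A small simulation can illustrate why groups of size zero matter. It assumes the geometric model from the formula above (each packet marked independently with probability p); under that model, leaving out the zero-size groups inflates the average group size to 1/p and pulls the estimate toward p/(1+p). The helper names are hypothetical.

```python
import random
from typing import List


def sample_group_sizes(p: float, n_groups: int, seed: int = 1) -> List[int]:
    """Draw group sizes: the number of unmarked packets before the next marked one."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_groups):
        size = 0
        while rng.random() >= p:  # packet left unmarked with probability 1 - p
            size += 1
        sizes.append(size)
    return sizes


def mp_from_groups(sizes: List[int]) -> float:
    """Invert AVG = (1 - p) / p."""
    return 1.0 / (1.0 + sum(sizes) / len(sizes))


p = 0.8
sizes = sample_group_sizes(p, 50_000)
nonzero = [s for s in sizes if s > 0]
print("fraction of size-zero groups:", sizes.count(0) / len(sizes))  # close to p
print("estimate with size-zero groups:", mp_from_groups(sizes))      # close to p
print("estimate without them:", mp_from_groups(nonzero))             # close to p / (1 + p)
```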

  17. Evaluation – Parameter Settings • ns-2 simulations, 500s simulation time • AQM algorithms (RED, PI, REM) – RED by default • SI = 0.5s (congestion sample computed every 0.5s) • Monitor at most 1000 flows per EI/path • Groups of size zero used in all experiments

  18. Evaluation – Traffic & Topology • 5ms link delay, 500Mbps link bandwidth [Figure: topology with routers R0-R10 (hop 10) and UDP sources; R0 to Ri: 250*i2 TCP flows; Ri to Ri+2: 100 TCP flows] • Metric: 50th, 90th percentile of |inferred MP – real MP| for each link
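The accuracy metric can be sketched as below. The slides do not say which percentile definition is used, so the nearest-rank definition here is an assumption, and the sample values are made up purely to exercise the code.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (an assumed definition) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def error_percentiles(inferred: List[float], real: List[float]) -> Tuple[float, float]:
    """50th and 90th percentile of |inferred MP - real MP| over a link's estimates."""
    errors = [abs(i - r) for i, r in zip(inferred, real)]
    return percentile(errors, 50), percentile(errors, 90)


# Illustrative (not measured) values for one link.
inferred = [0.10, 0.22, 0.35, 0.41, 0.50]
real = [0.12, 0.20, 0.30, 0.45, 0.50]
print(error_percentiles(inferred, real))
```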

  19. Evaluation – vs Baseline Solution Our group-based solution (GROUP) [Figure: CWR, DATA and ACK sequence] Baseline solution, no alteration (REFERENCE) [Figure: CWR, DATA and ACK sequence] GROUP vs REFERENCE

  20. Sensitivity to the Length of the EI [Plot: x-axis is the value of EI (s), log scale] Accuracy decreases with hop count but is within 0.1 for most cases

  21. Sensitivity to Drastic Changes • UDP sources vary their sending rate by 50Mbps between 250Mbps and 750Mbps • Every 10s we start 3000 TCP flows between random nodes, for a random time (0-10s) How well does our solution track these sudden and large variations?

  22. Sensitivity to Drastic Changes [Plots labeled: 90th perc., EI = 3s, 50th perc., EI = 10s] Accuracy decreases with hop count but is within 0.1-0.15 for most cases

  23. Sensitivity to AQM Marking Function [Figure: marking/drop probability versus congestion severity, with curves for REM and for RED, PI] • Why does REM perform much worse? • Abrupt variations in marking probability • Limited visibility A linear marking function allows better inference for our solution

  24. Limited Visibility [Figure: routers R0, R1, R2 with paths P10, P20 and unknown P12; R1 marks 100% of packets, R2 marks 30% of packets] • If P20=P10=100%, P12 is unknown (any value possible) • At high MP (less than 100%) the problem still exists because very few packets are left unmarked Limited visibility appears at high MP. More probable for REM.
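One way to see the problem numerically, assuming independent marking per segment so that 1 - P20 = (1 - P10)(1 - P12): solving for P12 divides by (1 - P10), so any error in the measured path MPs is scaled by 1/(1 - P10). The sketch below uses hypothetical function names and made-up numbers.

```python
def remaining_segment_mp(p_whole: float, p_known: float) -> float:
    """Solve 1 - p_whole = (1 - p_known) * (1 - p_segment) for p_segment."""
    if p_known >= 1.0:
        # Every packet is already marked: the other segment cannot be observed.
        raise ValueError("segment MP is undefined when the known MP is 100%")
    return (p_whole - p_known) / (1.0 - p_known)


def error_amplification(p_known: float) -> float:
    """Factor by which an error in p_whole is scaled in the inferred segment MP."""
    return 1.0 / (1.0 - p_known)


# True values: P10 = 0.95 and P12 = 0.30, so P20 = 1 - 0.05 * 0.70 = 0.965.
print(remaining_segment_mp(0.965, 0.95))   # about 0.30
# A measurement error of only 0.01 in P20 shifts the inferred P12 to about 0.50.
print(remaining_segment_mp(0.975, 0.95))
print(error_amplification(0.95))           # about 20
```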

  25. Sensitivity to Dropped ACKs - Numerical • ACKs can be dropped by non-AQM/ECN routers • Pure ACKs can be dropped even by AQM/ECN routers [Figure: group sizes 4, 5, 1, 5 (average size 3.75) and group sizes 8, 1, 4 (average size 4.33)] Dropped ACKs can modify the average size of groups of unmarked ACKs
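The arithmetic of this example can be checked with a few lines. The sketch assumes that the two rows of sizes on the slide show the same ACK stream without and with drops, and it applies the inversion p = 1/(1 + average size) that follows from AVG = (1-p)/p; the variable names are illustrative.

```python
from typing import List


def average_group_size(sizes: List[int]) -> float:
    return sum(sizes) / len(sizes)


def mp_from_average(avg: float) -> float:
    """Invert AVG = (1 - p) / p."""
    return 1.0 / (1.0 + avg)


without_drops = [4, 5, 1, 5]   # average 3.75
with_drops = [8, 1, 4]         # average 13/3, about 4.33

p_without = mp_from_average(average_group_size(without_drops))
p_with = mp_from_average(average_group_size(with_drops))
print(round(p_without, 4), round(p_with, 4), round(abs(p_without - p_with), 4))
```

Under these assumptions the inferred MP moves from about 0.21 to about 0.19, an additional error of roughly 0.02.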

  26. Sensitivity to Dropped ACKs - Numerical At reasonable drop probabilities the additional error is low

  27. Other Advantages of Our Solution • Incremental deployment • On specific paths • Around non-AQM/ECN routers • Useful in heterogeneous environments • Different AQM types

  28. Related Work • Re-ECN [SIGCOMM 2005], ConEx IETF WG • Extends ECN with one step • Sources re-echo congestion information from ACK markings • A router on the forward path has upstream, downstream and whole-path congestion • Useful for traffic policing or traffic management • Lower precision. Limited by header space bits. • Needs modifications to ECN and headers • Does not address the challenge posed by ACK markings • Does not go beyond path-level congestion inference

  29. Conclusion • Novel method for inferring congestion with zero network overhead • Does not require changes to hosts, headers or protocols • Incrementally deployable and useful in heterogeneous environments • Good accuracy even in very congested environments

  30. Thank you Credits for the pictures • http://networkequipment.net/wp-content/uploads/2011/02/voip-telephone.jpg • http://www.freefoto.com/images/04/28/04_28_50---US-Dollar-Bills_web.jpg • http://www.ciscorouting.com/routing_engine.jpg • http://www.rvoice.co.uk/uploads/Image/Green%20Tick.jpg

  31. Why not Use Ratio for ACK Markings? The ratio of marked ACKs is very inaccurate. Need a better solution.

  32. Effects of Using Delayed ACK - Numerical Additional error introduced by the use of delayed ACK

  33. Sensitivity to Bandwidth (EI = 3s) Accuracy increases with bandwidth

  34. Sensitivity to Flow Size (EI = 3s) Good accuracy even with many small flows

  35. Severity of False Positives (EI = 3s) Small false positives inherent in probabilistic approach

  36. Granularity of Inference [Figure: routers R0-R6 with paths P40 and P06; Sampling Intervals (SI) within an Estimation Interval (EI)] estimate(P06) = AVG( {samples(P06)} )

  37. Implementation • Counters per-path • Length & number of all groups of unmarked ACKs • Counters per-flow • Current group of unmarked ACKs • Prefix matching for source and destination • Transport protocol header matching for flow identification • Sequence numbers for CWR
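The counters listed above could be organized roughly as in the following sketch. It is an illustration under simplifying assumptions, not the authors' implementation: flows are identified by an arbitrary hashable key instead of prefix and header matching, the CWR event is delivered as an explicit call instead of being derived from sequence numbers, and one average is taken over all groups rather than per sampling interval.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class FlowState:
    current_group: int = 0   # size of the current group of unmarked ACKs
    in_group: bool = True    # False while the receiver is still echoing a marking


@dataclass
class PathCounters:
    total_length: int = 0    # summed length of all completed groups
    num_groups: int = 0      # number of completed groups
    flows: Dict[Hashable, FlowState] = field(default_factory=dict)

    def on_ack(self, flow_id: Hashable, marked: bool) -> None:
        st = self.flows.setdefault(flow_id, FlowState())
        if not st.in_group:
            return                       # echoed marking, carries no new information
        if marked:
            self.total_length += st.current_group
            self.num_groups += 1
            st.in_group = False          # wait for CWR before opening a new group
        else:
            st.current_group += 1

    def on_cwr(self, flow_id: Hashable) -> None:
        st = self.flows.setdefault(flow_id, FlowState())
        st.current_group, st.in_group = 0, True

    def estimate_mp(self) -> Optional[float]:
        if self.num_groups == 0:
            return None
        return 1.0 / (1.0 + self.total_length / self.num_groups)


path = PathCounters()
for marked in (False, False, True, True):   # flow "a": group of size 2, then echo
    path.on_ack("a", marked)
path.on_cwr("a")
path.on_ack("a", True)                      # group of size zero
for marked in (False, False, False, True):  # flow "b": group of size 3
    path.on_ack("b", marked)
print(path.num_groups, path.total_length, path.estimate_mp())
```

Keeping only a running total and a count per path means the per-flow state is a single small counter, which matches the slide's split between per-path and per-flow counters.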

  38. Coverage of Congestion Maps • Six real network topologies (Internet2, TEIN2, iLight, GEANT, SUNET, NLR) • Assume all-to-all traffic pattern • Average congestion map coverage: NLR, Internet2, GEANT ~60%; TEIN2 ~91%; iLight ~94%; SUNET ~95%
