
Congestion Mitigation

Trying to maximise performance between MFN’s network and a peer’s network over some busy PNIs (private network interconnects)


Hello, Good Evening

Joe Abley

jabley@mfnx.net

Toolmaker, Token Canadian

Metromedia Fiber Network


Background

  • There are frequent periods of congestion between our network and a peer’s network in Europe

  • The peer is a major operator in the region, and evil forces are preventing us from simply lighting up new fibre

  • We need to work with what we have in place already


Characteristics of Problem

  • The peer is home to a lot of broadband subscribers

  • Transient hot-spots in content hosted within MFN’s network cause localised congestion in some PNIs, with other PNIs showing headroom

  • We get paid for shifting packets (if we don’t carry the packets, we don’t get paid)



Goals

  • Identify the important consumers of traffic in and beyond the peer’s network

  • Once we can characterise the major traffic sinks, we can try to balance them out across our various PNIs

  • Hopefully this will make the PNIs less sensitive to bursty traffic

  • We expect to have to keep measuring and rebalancing


Tools

  • Ixia IxTraffic

    • Gumption

      • Unix-based BGP speaker that participates in the IBGP mesh

      • Gives us route history

    • SeeFlow

      • Smart NetFlow collector which talks to Gumption

  • Awk

    • We always have awk


Infrastructure

  • NetFlow traffic to be collected in-band from GSR 12012s

  • Single IxTraffic box:

    • FreeBSD 4.5, i386, dual 700MHz P3, 2GB RAM

    • Overkill

    • Load average occasionally peaks above 0.07

    • 10GB filesystem for storing routing and flow data in

    • Located in Virginia

  • MRTG-like thing (duck), which also lives in VA on a different box, gives us nice visibility of congestion trends


Exporting Flow Data

  • Flow switching needs to be turned on in a maint window, because it makes the routers belch impolitely

    • All interfaces that can contribute towards traffic sent towards the peer get “ip route-cache flow sampled”

    • See kiddie script! See router die!

  • Export config is trivial:

    ip flow-export source Loopback0
    ip flow-export version 5 peer-as
    ip flow-export destination 192.168.38.9 3871
    ip flow-sampling-mode packet-interval 1000

    • Note low sampling rate of 1:1000.
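With 1:1000 sampling, the exported counters understate real traffic by the sampling factor, so any reporting has to scale them back up. A minimal awk sketch of that scaling — the "prefix packets bytes" input format here is a made-up illustration, not SeeFlow's actual record layout:

```shell
# Scale sampled counts back up by the 1:1000 sampling interval.
# Input format "prefix packets bytes" is an assumption for illustration.
printf 'A.A.0.0/12 13001 5601247\n' |
  awk '{ printf "%s %.0f %.0f\n", $1, $2 * 1000, $3 * 1000 }'
# prints: A.A.0.0/12 13001000 5601247000
```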


Collecting Flow Data

  • SeeFlow is configured to populate net2net and aspath matrices (“buckets”)

    • We suspect that a lot of data is getting sunk within the peer network, hence net2net

    • We could be wrong, and aspath matrices are cool, so we collect those too

  • Buckets chew up about 50MB of disk per day (all routers)
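A back-of-the-envelope check that the collector won't fill up: at roughly 50MB of buckets per day, the 10GB filesystem holds several months of data.

```shell
# Days of bucket data a 10GB filesystem holds at ~50MB/day (integer estimate).
echo $(( 10 * 1024 / 50 ))
# prints: 204
```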


Initial Discoveries

  • All the traffic is being sunk within the peer network, and not in a downstream network

  • Damn

  • All the traffic is being sunk into a single /12 advertisement

  • Damn

  • We need better granularity if we are going to be able to spread the demand across our PNIs


ASPATH Matrices

[jabley@nautilus]$ seeasp -s 3320 ./dtag.agg | more

Facets:
  TimeInterval : 05/09/2002 15:00:03.663492 - 06/05/2002 04:57:58.747866 PDT
SuperDataFacet :
  Facets:
    RouterIpv4Addr : 209.249.254.142
    RouterName : pr1.fra1.de.mfnx.net
  Facets:
    RouterIpv4Addr : 209.249.254.195
    RouterName : mpr2.vie3.at.mfnx.net
  Facets:
    RouterIpv4Addr : 216.200.254.246
    RouterName : mpr1.ams1.nl.mfnx.net

AS     P  PktsThru    BytesThru     PktsTo      BytesTo       PktsTotal   BytesTotal
-----  -  ----------  ------------  ----------  ------------  ----------  ------------
0      -  0           0             371.203M    143.286G      371.203M    143.286G
AAAA   P  1.816M      333.567M      108.112M    36.705G       109.928M    37.038G
BBBB   -  0           0             1.516M      191.657M      1.516M      191.657M
CCCC   -  0           0             33.173K     23.932M       33.173K     23.932M
DDDD   -  9           5.118K        35.555K     23.064M       35.564K     23.069M
EEEE   -  12.567K     7.998M        3.663K      2.413M        16.230K     10.411M
FFFF   -  917         704.260K      16.872K     9.642M        17.789K     10.346M
GGGG   -  2.187K      1.406M        22.323K     8.250M        24.510K     9.656M
HHHH   -  0           0             30.447K     8.587M        30.447K     8.587M
IIII   -  0           0             10.658K     7.427M        10.658K     7.427M
JJJJ   -  0           0             27.932K     7.029M        27.932K     7.029M


Net2Net Matrices

172.184.0.0/13 -> A.A.0.0/12  13001989  5601247858
172.184.0.0/13 -> A.A.0.0/12  12983375  5592070913
62.4.67.0/24   -> B.B.0.0/11   9459634  1687041555
62.4.67.0/24   -> B.B.0.0/11   9443861  1677536483
172.176.0.0/14 -> A.A.0.0/12   7113026  2985029679
172.176.0.0/14 -> A.A.0.0/12   7099648  2977787074
62.80.115.0/24 -> C.C.0.0/11   6873518  1236318991
62.4.67.0/24   -> A.A.0.0/12   6689319  1180741686
62.4.82.0/24   -> A.A.0.0/12   6611879  1171430532
62.4.67.0/24   -> C.C.0.0/11   3469776   629221553
62.4.82.0/24   -> C.C.0.0/11   3433970   625422145
62.4.67.0/24   -> D.0.0.0/13   2422913   442942807
62.4.67.0/24   -> D.0.0.0/13   2407651   470778890
62.4.65.96/27  -> A.A.0.0/12   1981446   287218317
62.80.116.0/24 -> E.E.0.0/15   1802114   378062358
62.4.67.0/24   -> F.F.0.0/14   1510412   315282857
62.4.67.0/24   -> F.F.0.0/14   1421497   277014582
62.4.65.96/27  -> B.B.0.0/11   1341063   378931389
62.4.81.128/27 -> B.B.0.0/11   1330058   378268227
172.185.0.0/16 -> A.A.0.0/12   1077841   446966211
172.185.0.0/16 -> A.A.0.0/12   1073445   443555367
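A per-destination histogram can be derived from records like these with awk (which we always have). A sketch, assuming the two numeric columns are packets and bytes — an inference from the data, not a documented SeeFlow format; the prefixes below are dummy input:

```shell
# Sum bytes per destination prefix and report megabytes, largest first.
# Column meanings ("src -> dst packets bytes") are assumed for illustration.
printf '%s\n' \
  'a.0.0.0/8 -> X.X.0.0/12 10 2097152' \
  'b.0.0.0/8 -> X.X.0.0/12 10 1048576' \
  'a.0.0.0/8 -> Y.Y.0.0/11 10 1048576' |
  awk '{ mb[$3] += $5 / (1024 * 1024) }
       END { for (d in mb) printf "%s %.0f\n", d, mb[d] }' |
  sort -k2 -rn
# prints:
# X.X.0.0/12 3
# Y.Y.0.0/11 1
```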


Destination Prefix Histogram

destination net  megabytes  proportion
A.A.0.0/12           21478         59%
B.B.0.0/11            6494         18%
C.C.0.0/11            4388         12%
D.0.0.0/13            1365          3%
F.F.0.0/14            1033          2%
G.G.0.0/15             416          1%
H.H.0.0/14             311          0%
I.I.I.0/21             160          0%
J.J.0.0/15             117          0%
K.K.0.0/19              89          0%


Drilling down into A.A.0.0/12

  • Ask peer to advertise longer prefixes within the /12, so we can measure the traffic per prefix

  • Wait for response

  • GOTO 10

  • Maybe we can fix this ourselves


/home/dlr/bin/bgpd

  • We injected 15 covered /16 prefixes into IBGP, with a NEXT_HOP that lay within the remaining /16

  • All tagged no-export, to avoid messing with the peer’s public route policy

  • Strictly local-use within AS6461
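The injection above might look something like the following in a Zebra-style bgpd configuration (the slide's /home/dlr/bin/bgpd suggests something in that family). This is a sketch, not the actual setup: the prefixes, next-hop, and route-map name are illustrative.

```
! Hypothetical Zebra-style bgpd fragment (names and addresses illustrative).
! Originate covered /16s into IBGP, tagged no-export so they stay inside
! AS6461, with a NEXT_HOP inside the remaining /16 so they resolve.
router bgp 6461
 network A.64.0.0/16 route-map covered-sixteens
 network A.65.0.0/16 route-map covered-sixteens
 ! ...one network statement per injected /16...
!
route-map covered-sixteens permit 10
 set community no-export
 set ip next-hop <address within the remaining /16>
```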


More Collection

  • The increased granularity gives us better visibility into the traffic sinks within the peer network

  • We will try to spread the traffic over the available PNIs so we can weather bursts of demand more effectively

  • We will also continue to let the peer know what we are doing

    • You never know, they may be listening


New Dest Prefix Histogram

destination net  megabytes  proportion
B.B.0.0/11            1912         14%
A.76.0.0/16           1530         11%
C.C.0.0/11            1516         11%
A.73.0.0/16           1120          8%
A.72.0.0/16           1024          7%
A.64.0.0/12            874          6%
A.74.0.0/16            683          5%
A.70.0.0/16            696          5%
A.68.0.0/16            601          4%
A.66.0.0/16            437          3%



Conclusions

  • “Light more fibre” is not always a realistic strategy

  • You are not always your peer’s number one priority, so it’s nice to be able to take matters into your own hands

  • Distributing the heavy traffic sinks across different PNIs makes bursty demand less unpleasant

  • Routing plus flow data = Power. Or something.