
Scaling beyond 10G: When what you have is never enough…

Scaling beyond 10G: When what you have is never enough…. Mike Hughes <mike@linx.net> CTO London Internet Exchange. Brief History of LINX. Founded in 1994 by 5 ISPs Pipex (the original “Pipex”, now MCI/Uunet) Demon Internet BTnet UKERNA EUnet GB (later PSInet, now Telstra UK)


Presentation Transcript


  1. Scaling beyond 10G: When what you have is never enough… Mike Hughes <mike@linx.net>, CTO, London Internet Exchange

  2. Brief History of LINX • Founded in 1994 by 5 ISPs • Pipex (the original “Pipex”, now MCI/Uunet) • Demon Internet • BTnet • UKERNA • EUnet GB (later PSInet, now Telstra UK) • A switch (well, a 10Mb hub!) in Telehouse • Volunteer staff

  3. Architecture Development - 1996 • A FDDI ring based architecture • Cisco and Plaintree switches • FDDI, 100Mb TX and 10Mb connections • Full time staff

  4. Architecture Development - 1998 • Gigabit Ethernet switches • First Metro GigE deployment in EU • Multiple site IX • Multiple vendor • Packet Engines • Extreme • Broke the 1G mark in Nov 1999

  5. Cathartic Events in 2000 • There was an attempt to take LINX commercial in the wake of the boom • Orchestrated by a number of LINX directors with external backing/funding • Member reaction – “LINX is not for sale!” • Concerns about LINX becoming open to capture • Reaffirmed the mutual, not-for-profit model

  6. LINX Today • 211 members from around 30 different countries • Still strong UK contingent (about 50%) • Most continents represented • 7 co-locations in London Docklands • Dual LAN, Dual Vendor nx10G network • Foundry and Extreme platforms • Not interconnected • Both platforms/networks in each location

  7. Meeting the 10G Challenge • LINX was a very early adopter of 10G • Foundry network first in late 2001 • It just worked! • Removed the need to buy WDM equipment • Costly at the time • That’s been upgraded to nx10G in the backbone as traffic has grown • But networks are now attaching to LINX at 10G • Presenting challenges for the backbone

  8. 10G Switches

  9. Upgrade Process • We started upgrading our Foundry platform in 2004 • BigIron MG8 switches • Not a trouble free experience • Now have 13 members connected via 10GE • Now upgrading the Extreme platform to an equivalent spec • And then upgrade Foundry again!

  10. We love pain! • Two networks give us lots of extra redundancy and flexibility • Does mean we get to do things twice, though! • This year, LINX will upgrade the Extreme platform to be of an equivalent spec • Both networks need to be roughly equal • Test as much as possible, then test it again! • Can you be too thorough? • Agreed acceptance criteria with vendor • Especially for the first system

  11. Interesting packet size datapoint

  12. Vendor Selection: What Matters? • 10G port density • 1G port density • Uniform, predictable packet performance • Especially at smaller frame sizes! • Important features • Particularly trunking/LACP • High Availability • Hitless failover/upgrade, redundancy model
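
A rough illustration of why slide 12 singles out smaller frame sizes: for a fixed link speed, the packet rate a switch has to sustain rises sharply as frames shrink. The sketch below computes theoretical line-rate packets per second on 10G Ethernet, using the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap). The function name and the choice of frame sizes are illustrative and do not come from the talk.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second for a frame size that includes the FCS."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # Compare minimum-size, mid-size and maximum-size (untagged) frames at 10G.
    for size in (64, 128, 512, 1518):
        mpps = line_rate_pps(10e9, size) / 1e6
        print(f"{size:>5} B frames: {mpps:6.2f} Mpps")
```

At 64-byte frames this works out to roughly 14.88 million packets per second per 10G port, against about 0.81 million at 1518 bytes, which is why a platform can look fine in a large-frame throughput test and still fall short on small packets.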

  13. Challenges to come • Scaling the network for multiple 10G connections from members • Little sign of active development in 40G/100G arena • Meaning nx10G is best we can expect for now • Being able to provide uniform service in multiple locations • Potential for massive traffic growth…

  14. Scary Doom Curve

  15. Scarier – 3 months later

  16. Where’s it all coming from? • Increased access speeds • ADSL2, WiMAX, FTTx, buzzword, buzzword… • More applications • VoIP is a traffic red herring – just watch the pps though! • Industry consolidation • Fewer people needing more & faster pipes

  17. Technologies • The sky isn’t the ethernet limit • nx10G seems to be for the time being • 40G or 100G are some way off (3 years) • According to most vendors • CWDM prices are falling • Dark fibre is still relatively cheap • May also be new technologies or ideas on the horizon

  18. What we can do today • Foundry Network Today: 20G across all nodes • [Diagram; X = blocked link]

  19. …and tomorrow • Foundry Network Evolution 1: 2x20G rings • [Diagram; X = blocked links]

  20. …and next week • Foundry Network Evolution 2: 1x40G, 1x20G • [Diagram; X = blocked links]

  21. …and next month • Foundry Network Evolution 2: 1x40G, 1x20G • Install Bigger Switches! • [Diagram; X = blocked links]

  22. Bigger Box: Foundry RX16 • Double the density of the MG8 • Up to 64 line-rate 10G ports per chassis • Biggest on the market today • Keeps traffic inside a single large box • We’ve just finished lab testing

  23. Shorter Term • Bigger switches and fatter Interswitch trunks can meet most needs • 10G connections have to be “concentrated” • But about 50% of a switch could easily be consumed by backbone connectivity • With a consequent push to hierarchical model? • Need some protocol enhancements from vendors • e.g. EAPSv2 and MRP phase 2 • add multiple ring support
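
The "about 50% of a switch" figure on slide 23 can be sanity-checked with a small port-budget calculation. This is a sketch under stated assumptions: each ring a node sits on needs one trunk to each neighbour, trunks are built from 10G links, and the 32-port chassis size is inferred from slide 22 ("double the density of the MG8", 64 ports on the RX16) rather than stated in the talk.

```python
def backbone_port_fraction(chassis_ports: int, ring_trunk_widths: list[int]) -> float:
    """Fraction of a chassis's 10G ports consumed by inter-switch trunks.

    ring_trunk_widths holds, for each ring the node belongs to, the number
    of 10G links per trunk. Every ring needs two trunks (east and west).
    """
    backbone_ports = sum(2 * width for width in ring_trunk_widths)
    if backbone_ports > chassis_ports:
        raise ValueError("trunks need more ports than the chassis has")
    return backbone_ports / chassis_ports


if __name__ == "__main__":
    # Assumed 32-port chassis; ring sizes taken from slides 18 to 21.
    print(backbone_port_fraction(32, [2]))     # single 20G ring
    print(backbone_port_fraction(32, [4, 2]))  # 40G ring plus 20G ring
    print(backbone_port_fraction(32, [4, 4]))  # two 40G rings
    print(backbone_port_fraction(64, [4, 4]))  # same rings, 64-port chassis
```

Under these assumptions, two 40G rings take 16 of 32 ports, which matches the 50% on the slide, and moving to a 64-port chassis halves that share.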

  24. Key Features at LINX • Moving to a “dual ring” topology • MRP Phase 2 on Foundry • EAPSv2 on Extreme • Allows different ring sizing • 40G ring on larger sites • Increases effective ISL bandwidth • fewer “transit” flows • Low(ish) dark fibre cost – no WDM here

  25. Foundry Network Plans

  26. Extreme Network Plans

  27. Fibre Network Expansion (1)

  28. Fibre Network Expansion (2)

  29. So, what’s next? • At the last Seattle NANOG, a Force10 person came and asked: • “What do you want, 40G or 100G?” • The answer seemed to be 100G • We can do 40G now: • Expensively @ OC768 • Cheaply @ 4x10GE • Therefore 40GE is a chocolate kettle • It’s a waste of devel time (and cash) • Who’s watching the core?

  30. Hey, but can’t we just… • Build fat 8x10G link-agg? • Rate limit/transfer cap users? • Implement QoS? • Throttle p2p apps? • …well, yes, you could. • But it either doesn’t scale, isn’t an option, or is costly and complex.

  31. It’s easier to overprovide… “For a number of years, we seriously explored various “quality of service” schemes, including having our engineers convene a Quality of Service Working Group. Our research came to the conclusion that it was far more cost effective to simply provide more bandwidth. With enough bandwidth in the network, there is no congestion and video bits do not need preferential treatment.” - Gary Bachula, VP Internet2

  32. …with the right technology • We already need something faster than 10GE (and 40GE?). • Some networks already building 8x10GE link agg bundles on a single span! • Common engineering sense says that your backbone has to be some multiple larger than your largest customer connection. • A LINX member asked about ordering a 2x10G port last week!
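
One reason fat link-aggregation bundles are not a substitute for a faster link: traffic is spread across member links per flow, not per packet, so a single flow never gets more than one member's capacity. The sketch below models that with a hash over the MAC address pair. CRC32 is only a stand-in for whatever vendor-specific hash a real switch uses, and the MAC addresses and flow rates are made up for illustration.

```python
import zlib
from collections import defaultdict


def member_for_flow(src_mac: str, dst_mac: str, members: int) -> int:
    """Pick a bundle member from a hash of the MAC pair (CRC32 as a stand-in)."""
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    return zlib.crc32(key) % members


def load_per_member(flows, members: int) -> list[float]:
    """Offered load in Gbit/s per member link; flows are (src, dst, gbps) tuples."""
    load = defaultdict(float)
    for src, dst, gbps in flows:
        load[member_for_flow(src, dst, members)] += gbps
    return [load[i] for i in range(members)]


if __name__ == "__main__":
    # Hypothetical traffic: a few large peers exchanging most of the bits.
    flows = [
        ("00:00:5e:00:53:01", "00:00:5e:00:53:02", 9.0),
        ("00:00:5e:00:53:01", "00:00:5e:00:53:03", 7.5),
        ("00:00:5e:00:53:04", "00:00:5e:00:53:02", 6.0),
        ("00:00:5e:00:53:05", "00:00:5e:00:53:06", 0.5),
    ]
    for link, gbps in enumerate(load_per_member(flows, members=8)):
        print(f"member {link}: {gbps:4.1f} Gbit/s")
```

Whatever the hash, each MAC pair always lands on one member. If the switch hashes only on MAC addresses, all traffic between two exchange members is a single pair, so an 8x10G bundle offers those two peers 10G, not 80G.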

  33. Looking Forward • Ethernet rings can have some problems • All nodes have to be (roughly) equal • Multiple rings solves most of this • Still constrained by max link speed/trunk size • Is the Swedish model - unconnected switches – a better way? • Backplane bandwidth is unrestricted/cheap • Some redundancy/resiliency challenges

  34. How the Swedes do it • Enabled by the fibre situation in Stockholm • City run fibre utility/monopoly • Therefore fibre is readily available • Two disconnected switches in different locations • You get two pairs of fibre when you connect • One to each switch, in secure underground “cave” • Everything contained in the backplane

  35. Traffic Management • MPLS • The DIX-IE (Tokyo) is involved in a trial of an MPLS interconnect – using conventional routing (ISIS) to route the network and LDP to discover endpoints – “mplsASSOCIO” • Downside is potentially complex config • TRILL (nothing to do with Star Trek) • IETF working group to support “L2 routing” • “rbridge”: ISIS for Layer 2, using MAC addresses • Would solve “wasted” redundant bandwidth
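
To make the "wasted redundant bandwidth" point concrete: ring protection schemes of the EAPS/MRP kind keep one link blocked, so some traffic goes the long way round, whereas a shortest-path scheme of the kind TRILL proposes could use every link. The sketch below compares the mean hop count between all node pairs in both cases. It is a simplification: the ring size is arbitrary, all links are treated as equal, and failures are ignored.

```python
from itertools import combinations


def mean_hops(nodes: int, blocked: bool) -> float:
    """Mean hop count over all node pairs on a ring of `nodes` switches.

    With blocked=True the link between the last node and node 0 carries no
    traffic, so the ring behaves like a line. With blocked=False each pair
    uses the shorter of the two directions around the ring.
    """
    pairs = list(combinations(range(nodes), 2))
    total = 0
    for a, b in pairs:
        direct = b - a  # hops without crossing the (last, 0) link
        total += direct if blocked else min(direct, nodes - direct)
    return total / len(pairs)


if __name__ == "__main__":
    n = 6  # hypothetical ring size
    print(f"one link blocked : {mean_hops(n, blocked=True):.2f} hops")
    print(f"shortest path    : {mean_hops(n, blocked=False):.2f} hops")
```

Every extra hop is inter-switch trunk capacity spent on "transit" traffic, so the gap between the two figures is a rough proxy for the bandwidth a shortest-path scheme would give back.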

  36. What’s going where? • The challenge with a flat L2 network • Just big broadcast domain(s) • Is it easier to take bulk flows and give a dedicated channel? • How to identify these flows? • ISP can do it (Netflow) • The IXP/MAN can do it (Sflow)
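
A minimal sketch of how sampled data can point at the bulk flows: with 1-in-N packet sampling, each sampled frame stands for roughly N frames, so multiplying sampled bytes by N gives a byte estimate per peer pair. The tuple layout `(src, dst, frame_bytes)` is a hypothetical, already-decoded form, not the sFlow datagram format, and the sampling rate is a parameter, not a value taken from this slide.

```python
from collections import Counter


def top_flows(samples, sampling_rate: int, n: int = 3):
    """Estimate bytes per (src, dst) pair from sampled frames and rank them.

    samples: iterable of (src, dst, frame_bytes) for each sampled frame.
    sampling_rate: N, where one in every N packets is sampled.
    """
    estimated_bytes = Counter()
    for src, dst, frame_bytes in samples:
        # Each sample represents about `sampling_rate` frames of similar size.
        estimated_bytes[(src, dst)] += frame_bytes * sampling_rate
    return estimated_bytes.most_common(n)


if __name__ == "__main__":
    # Hypothetical decoded samples between exchange members.
    samples = [
        ("member-a", "member-b", 1500),
        ("member-a", "member-b", 1500),
        ("member-c", "member-d", 64),
    ]
    for pair, est in top_flows(samples, sampling_rate=2048):
        print(pair, est)
```

The estimates are statistical, so they are more trustworthy for the large flows that matter here than for small ones that only show up in a handful of samples.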

  37. Sflow @ 10G • It’s sampled but still a hell of a lot of data • Sample rate @ 1 in 2048 packets • Gives about 60GB per day • Need 850G disk to deal with 2 weeks data • If traffic doubles in the year, need 1.7TB • Actually become constrained by disk I/O • But we’re still deploying it anyway…
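
The disk figures on slide 37 follow from straightforward arithmetic, sketched below. The only assumption added here is that sample volume grows in proportion to traffic when the sampling rate is left unchanged; the function and parameter names are illustrative.

```python
def storage_needed_gb(gb_per_day: float, retention_days: int,
                      traffic_growth: float = 1.0) -> float:
    """Disk needed to retain sampled flow data.

    traffic_growth is a multiplier on today's traffic (2.0 = doubled),
    assuming sample volume scales linearly with traffic at a fixed
    sampling rate.
    """
    return gb_per_day * traffic_growth * retention_days


if __name__ == "__main__":
    today = storage_needed_gb(60, 14)
    doubled = storage_needed_gb(60, 14, traffic_growth=2.0)
    print(f"two weeks today          : {today:.0f} GB")
    print(f"two weeks, traffic x2    : {doubled / 1000:.2f} TB")
```

Sixty GB a day for 14 days is 840 GB, in line with the roughly 850G on the slide, and doubling traffic gives 1.68 TB, the slide's 1.7TB.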

  38. Other Scalers • Passive Private Interconnect • Fibre cross-connects to shed the largest flows • Cheap (for the IX), easy to implement • Can run whatever protocol the peers choose • More exchanges • Could LINX run a third platform? • More smaller exchanges? What about critical mass? • “Transmission Only” • e.g. WDM platforms, stub-sites (no switch)

  39. Move to “Stub” Nodes • Reduce core nodes down to small number of switches • Minimise interswitch connectivity • Stub nodes: • Cheap switch for 100M aggregation • CWDM terminal for GigE/10G transport • All traffic then hauled to centre • Pseudo-Swedish with “edges”

  40. “Stub” overview • [Diagram: two DWDM terminals; AGG switch with 100M conns; GigE conns]

  41. Pros/Cons of Stubs • Pros • Easy to set up • Low commitment required • Relatively cheap per stub • May help break into new and “remote” locations • Cons • Less redundancy/resiliency • Finite (size of mux/aggr switch) • Hauls all traffic to core (even local 1G traffic) • Doesn’t fit ring topology of many fibre builds

  42. Hierarchical Model • Core, Aggregation, Edge layers? • An expansion of “stubs”, really • More interswitch connectivity needed • Due to meshed topology • Simple ring topology no longer possible • May work for “core”, with edge “mesh” • Probably more expensive • More devices, increased management

  43. Wrapping Up • Some vendors are saying that the next Ethernet standard is 5 years out. Too late! • While edge speed has increased, the core has stood still • Don’t edge and core vendors talk to each other? • Massive parallel links and “carving off” traffic is a tool in dealing with this • But adds complexity • Seems that keeping things simple remains key

  44. Where are we now?

  45. Where are we now?

  46. Where are we now?

  47. Where are we now?

  48. Questions?
