Ethernet Data Center Routing Challenges


Ethernet Data Center Routing Challenges and 802.1aq/SPB new work

PETER ASHWOOD-SMITH

[email protected]



[Figure callouts: A) "Tweak Bridge Priorities Here"; B) S1 … S16]

802.1aq’s 16 ECT algorithms can give a perfect spread across 16 uplinks over 2 hops. However:

A) Need to tweak 2nd layer switch priorities to guarantee all 16 are used.

B) Need at least 16 subnets (C/S-VLANs) to assign one per 802.1aq B-VID.



Can we eliminate ‘tweaking*’

  • David Allan et al. have a presentation on this so I won’t spend much time on it.

  • In general, a network with N equal-cost paths from ‘some source’ to ‘some destination’ requires #ECT about 25-40% greater than N (to statistically capture them all).

  • Therefore, when #ECT == N, some ‘tweaking’ is usually required (for a DC it is trivial to do, however).

  • Dave et al. suggest non-independence between ECT algorithms as a way to address this (maximize diversity) …

*Tweaking = adjusting Bridge Priorities up/down from defaults.
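As a rough illustration of why tweaking matters, the sketch below models ECT tie-breaking in a 2-hop fabric. It assumes each ECT algorithm XORs every byte of the candidate Bridge IDs with a fixed mask and picks the lowest result, and that only the upper-layer switch differs between candidate paths. The mask values are quoted from memory of the published ECT-ALGORITHM definitions and should be checked against the standard. The MAC values and the helper names (`bridge_id`, `winner`, `distinct_uplinks`) are hypothetical.

```python
# ASSUMPTION: 16 per-algorithm XOR masks, recalled from the ECT-ALGORITHM definitions.
ECT_MASKS = [0x00, 0xFF, 0x88, 0x77, 0x44, 0x33, 0xCC, 0xBB,
             0x22, 0x11, 0x66, 0x55, 0xAA, 0x99, 0xDD, 0xEE]


def bridge_id(priority: int, mac: int) -> bytes:
    """8-byte Bridge ID: 2-byte priority followed by a 6-byte MAC (hypothetical values)."""
    return priority.to_bytes(2, "big") + mac.to_bytes(6, "big")


def winner(bridge_ids, mask):
    """Index of the upper-layer switch chosen by the algorithm using this mask."""
    masked = [bytes(b ^ mask for b in bid) for bid in bridge_ids]
    return masked.index(min(masked))  # lowest masked Bridge ID wins the tie-break


def distinct_uplinks(bridge_ids):
    """How many different upper-layer switches the 16 algorithms use in total."""
    return len({winner(bridge_ids, m) for m in ECT_MASKS})


# Default priorities: all 0x8000, so only the MACs differ.
default_ids = [bridge_id(0x8000, i) for i in range(1, 17)]
# Tweaked priorities: a distinct top nibble (0x0 .. 0xF) per upper-layer switch.
tweaked_ids = [bridge_id(i << 12, i + 1) for i in range(16)]

print("default priorities:", distinct_uplinks(default_ids), "uplinks used")
print("tweaked priorities:", distinct_uplinks(tweaked_ids), "uplinks used")
```

Under these assumptions the tweaked case works because the top nibbles of the 16 masks are all different, so each algorithm zeroes out a different priority nibble and selects a different switch.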



“Example” 802.1aq switching cluster – assume 100GE NNI links/groups

[Figure: upper layer A1 … A16 (32 x 100GE each), lower layer B1 … B32 (16 x 100GE up and 160 x 10GE down each), server ports S1,1 … S32,160 (5120 x 10GE in total). Annotations: good numbers “16” & “2” levels; 16 x 32 x 100GE = 51.2T using 48 x 2T switches.]

  • 48 switch non blocking 2 layer L2 fabric

  • 16 at “upper” layer A1..A16

  • 32 at “lower” layer B1.. B32

  • 16 uplinks per Bn, & 160 UNI links per Bn

  • 32 downlinks per An

  • (16 x 100GE per Bn)x32 = 512x100GE = 51.2T

  • 160 x 10GE server links (UNI) per Bn

  • (32 x 160)/2 = 2560 servers @ 2 x 10GE per server

  • uFIB = 16 x 48 B-MAC = 768 entries

  • mFIB = 16 subnets x 48 sources = 768 entries

  • Total: 768 + 768 = 1536 FIB entries per node
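The sizing figures on this slide can be re-derived with a few lines of arithmetic. The sketch below only restates the slide's numbers; the variable names are illustrative.

```python
UPPER = 16           # A1 .. A16
LOWER = 32           # B1 .. B32
UPLINKS_PER_B = 16   # 100GE NNI uplinks per lower-layer switch
UNI_PER_B = 160      # 10GE server-facing links per lower-layer switch
NNI_GBPS = 100
UNI_GBPS = 10
ECT_COUNT = 16       # one B-VID (and one subnet) per ECT algorithm

switches = UPPER + LOWER                        # 48 switches in the fabric
nni_links = UPLINKS_PER_B * LOWER               # 512 x 100GE
fabric_tbps = nni_links * NNI_GBPS / 1000       # 51.2T
uni_links = UNI_PER_B * LOWER                   # 5120 x 10GE
servers = uni_links // 2                        # dual-homed at 2 x 10GE each

# Non-blocking check: server-facing capacity per Bn equals its uplink capacity.
non_blocking = UNI_PER_B * UNI_GBPS == UPLINKS_PER_B * NNI_GBPS

ufib = ECT_COUNT * switches                     # unicast: 16 B-VIDs x 48 B-MACs
mfib = ECT_COUNT * switches                     # multicast: 16 subnets x 48 sources
fib_per_node = ufib + mfib

print(switches, nni_links, fabric_tbps, uni_links, servers, non_blocking, fib_per_node)
```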



[Figure labels: ECT-ALG# 1, 2; Source Node (1); S1 … S16]

For a given ECT-ALGk, Aj is a member of every SPF-TREE(B*,ECT-ALGk)

When properly tuned, no two ECT-ALGorithms will use the same Aj as a fork point.



Subnet Ni maps to I-SIDj and then to a unique A(j mod 16)

[Figure: two-layer fabric, A1 … A16 over B1 … B32, with I-SIDi and I-SIDj marked]

So load spreading allows each Ai to transit a complete subnet.

Problem #1 - Unable to spread further such that Ai and Aj (i != j) each handle a subset of the flows in I-SIDj.
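A minimal sketch of the mapping described on this slide, assuming 0-based indices for the A nodes and hypothetical I-SID values. It shows both the even spread of whole subnets and Problem #1: every flow inside one I-SID lands on the same A node.

```python
from collections import Counter

NUM_A = 16  # upper-layer switches; indexed 0 .. 15 here (an assumption, the slide uses A1 .. A16)


def transit_index(isid: int) -> int:
    """Upper-layer switch that carries the whole subnet mapped to this I-SID."""
    return isid % NUM_A


# 32 hypothetical I-SIDs, one per subnet.
isids = range(100, 132)
subnets_per_a = Counter(transit_index(j) for j in isids)

# Problem #1: the choice depends only on the I-SID, never on the individual flow,
# so 1000 different flows in I-SID 100 all transit the same A node.
nodes_used_by_one_isid = {transit_index(100) for _flow in range(1000)}

print(dict(subnets_per_a))
print(nodes_used_by_one_isid)
```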



This is an issue under failure of Aj

[Figure: two-layer fabric, A1 … A16 over B1 … B32, with I-SIDi and I-SIDj marked]

Recovery will move the entire subnet's traffic to another Ai node.

A preferable solution is to spread the affected load over the remaining A*.
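To put rough numbers on the difference, the sketch below compares the two recovery behaviours. It assumes, purely for illustration, that each of the 16 A nodes carries one unit of load before the failure; `loads_after_failure` is a hypothetical helper.

```python
def loads_after_failure(num_a: int, failed: int, spread: bool) -> list:
    """Per-node load after one A node fails, starting from 1.0 unit per node."""
    loads = [1.0] * num_a
    orphaned = loads[failed]
    loads[failed] = 0.0
    survivors = [i for i in range(num_a) if i != failed]
    if spread:
        # Preferred behaviour: share the orphaned load across all survivors.
        for i in survivors:
            loads[i] += orphaned / len(survivors)
    else:
        # Behaviour described above: the whole subnet moves to one other node.
        loads[survivors[0]] += orphaned
    return loads


moved = loads_after_failure(16, failed=3, spread=False)
shared = loads_after_failure(16, failed=3, spread=True)
print("worst node, whole subnet moved:", max(moved))
print("worst node, load spread:       ", round(max(shared), 3))
```

In this toy model the worst-hit node doubles its load when the subnet moves as a unit, but gains only 1/15 of a unit when the load is spread.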



Possible solution – head end hashing (unicast only)

[Figure: two-layer fabric, A1 … A16 over B1 … B32, with I-SIDi and I-SIDj marked]

Allow unicast I-SIDi and I-SIDj traffic to be hashed, based on smaller flows, to different B-VIDs (ECT-ALGorithms).

This breaks the symmetry and congruence rules but allows edge balancing at smaller granularity. No changes to multicast. Requires learning <C-DA, B-DA>, independent of B-VID.

[Figure legend: Unicast, Mcast]
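A minimal sketch of head-end hashing, under stated assumptions: the hash is SHA-256 truncated to 32 bits (any reasonably uniform hash would do), the B-VID numbers and MAC addresses are made up, and `pick_with_failure` is a hypothetical helper that re-hashes only the flows whose B-VID has failed. None of this is taken from the standard.

```python
import hashlib
from collections import Counter

B_VIDS = list(range(4001, 4017))  # 16 hypothetical B-VIDs, one per ECT algorithm


def flow_hash(seed, c_sa, c_da, c_vid) -> int:
    """32-bit hash over the customer header fields of a flow."""
    key = f"{seed}|{c_sa}|{c_da}|{c_vid}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def pick_b_vid(seed, c_sa, c_da, c_vid, b_vids=B_VIDS):
    """Head-end choice: map one flow onto one of the available B-VIDs."""
    return b_vids[flow_hash(seed, c_sa, c_da, c_vid) % len(b_vids)]


def pick_with_failure(seed, flow, failed):
    """Keep unaffected flows where they are; re-hash the rest over the survivors."""
    choice = pick_b_vid(seed, *flow)
    if choice != failed:
        return choice
    survivors = [v for v in B_VIDS if v != failed]
    return pick_b_vid(seed + 1, *flow, b_vids=survivors)


# 1024 hypothetical flows that all belong to the same I-SID.
flows = [(f"02:00:00:00:00:{s:02x}", f"02:00:00:00:01:{d:02x}", 10)
         for s in range(32) for d in range(32)]

spread = Counter(pick_b_vid(7, *f) for f in flows)
print("B-VIDs in use:", len(spread))
```

The granularity is now the individual flow rather than the whole I-SID, so the flows displaced by a failure are shared among the surviving B-VIDs instead of moving as a block.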



Interconnection of fabrics creates more than 16 paths (exponential)

[Figure: two A1 … A16 / B1 … B32 fabrics interconnected through C1 and C2; path counts annotated as O(16), O(16x2) and O(16x2x16) at successive levels]

The number of paths can grow exponentially with increasing levels.

A constant number of ECT paths is always << the number of available paths in many networks.

Growing 802.1aq ECT to, say, 32 or even 100 ECMP paths causes larger unicast FIBs.
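The annotations O(16), O(16x2) and O(16x2x16) are running products of the per-level fan-out. A short sketch of that arithmetic follows; the fan-out list is read off the slide and `path_counts` is an illustrative helper.

```python
def path_counts(fanouts):
    """Cumulative number of equal-cost paths after each level of fan-out."""
    counts, total = [], 1
    for fanout in fanouts:
        total *= fanout
        counts.append(total)
    return counts


ECT_ALGORITHMS = 16  # paths that 802.1aq can pin down with its 16 ECT algorithms

counts = path_counts([16, 2, 16])   # B -> A, A -> C, C -> A in the far fabric
usable_fraction = ECT_ALGORITHMS / counts[-1]
print(counts, usable_fraction)
```

With these numbers the 16 ECT algorithms can select only 1 in 32 of the available paths across the interconnected fabrics.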



Horizontal Growth – not too bad but need more ECT-ALGORITHMS.

[Figure: two-layer fabric grown horizontally to A1 … A17 over B1 … B34]

Horizontal growth by 1 just increases the number of ECT needed by 1.

Not too big a problem, but we would need to define new ECT algorithms (via Opaque).



General Issue

[Figure: source S to destination D across a network with O(degree) fan-out per node and O(diameter) hops; the head end chooses a path from N x B-VID]

#paths ~= O(degree^diameter)

So head-end ECT in the worst case requires an exponential number of B-VIDs.



A feasible solution …

[Figure: single B-VID between S and D; each hop chooses a path from N x next hop]

Re-assign traffic to path at each hop

Tandem “ECMP” just like IP.

Need to keep O(degree) number of next hops

Only need one B-VID .. removes O(diameter) from state cost

Flip side is you have no control – just hope for fine scale statistical distribution
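A minimal sketch of tandem (hop-by-hop) selection, assuming a small made-up topology and using the node name as the per-node hash seed. Each node keeps only its own list of equal-cost next hops and re-hashes the flow locally; nothing here reflects a standardized hash.

```python
import hashlib

# Hypothetical topology: equal-cost next hops toward D, as seen from each node.
NEXT_HOPS = {
    "S": ["A1", "A2", "A3", "A4"],
    "A1": ["C1", "C2"], "A2": ["C1", "C2"],
    "A3": ["C1", "C2"], "A4": ["C1", "C2"],
    "C1": ["D"], "C2": ["D"],
}


def choose(node: str, flow: str) -> str:
    """Local decision: hash the flow with a per-node seed and pick one next hop."""
    hops = NEXT_HOPS[node]
    digest = hashlib.sha256(f"{node}|{flow}".encode()).digest()
    return hops[int.from_bytes(digest[:4], "big") % len(hops)]


def route(flow: str, src: str = "S", dst: str = "D") -> list:
    """Follow the per-hop choices from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(choose(path[-1], flow))
    return path


paths = {tuple(route(f"flow{i}")) for i in range(1000)}
state_per_node = max(len(hops) for hops in NEXT_HOPS.values())
print(len(paths), "distinct paths used;", state_per_node, "next hops stored at most")
```

The state per node is bounded by its degree, while the set of end-to-end paths that actually get used is whatever the hashes happen to produce, which is the "no control" flip side noted above.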



What about loops in this mode?

802.1aq Ingress Check is very strong in the case of a single next hop and hence a single possible ingress for an SA.

802.1aq Ingress Check is weakened in the case of multiple next hops and hence multiple possible ingresses for an SA.

However 802.1aq Agreement Protocol functions correctly in the context of multiple possible Next Hops for the same B-VID (refer to Mick’s proof).

But …



Agreement Protocol Concerns

Is it too complex? It is clearly non-trivial; we need implementation/emulation experience.

Is it overly Draconian? For example, the bounds on movement are what is required for a mathematical proof by induction. However, there are probably many cases where further movement would not loop. What is the degree of ‘overkill’?

Is it marketable? – this is unfortunately a legitimate concern!!!

802.1aq can be deployed without AP until we introduce hash-based forwarding, at which point we require either a symmetric AP and/or an on-data-path loop detection/drop mechanism.

Believe that an on-data-path loop detection mechanism is required for hash-based ECMP until we have more experience with AP.

Recommend we standardize a TTL TAG either stand-alone or as a new form of I-TAG.
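To illustrate what an on-data-path TTL buys, here is a minimal sketch. It assumes the TTL is decremented on every forwarding hop and the frame is dropped once the TTL reaches zero; the forwarding tables and the `forward` helper are hypothetical, and no tag format is implied.

```python
def forward(ttl: int, next_hop_of: dict, start: str, dst: str):
    """Walk a frame hop by hop; return (outcome, hops taken)."""
    node, hops = start, 0
    while node != dst:
        if ttl == 0:
            return ("dropped", hops)   # loop (or over-long path) contained here
        ttl -= 1                       # one decrement per forwarding hop
        node = next_hop_of[node]
        hops += 1
    return ("delivered", hops)


# Healthy forwarding state: S -> A -> D.
good = {"S": "A", "A": "D"}
# Transient loop, for example during reconvergence: X -> Y -> Z -> X.
loop = {"X": "Y", "Y": "Z", "Z": "X"}

print(forward(8, good, "S", "D"))
print(forward(8, loop, "X", "D"))
```

The looping frame is discarded after a bounded number of hops regardless of how the control plane got into that state, which is the property wanted while experience with AP is still limited.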



View of New Work Requirements

R1) New ECT-ALGorithms with improved spreading properties.

R2) Allow optional head-end hash assignment of 802.1aq SPBM UNI known unicast traffic to one of multiple next hop interfaces/B-VIDs. Very similar to Link Ag.
Minimally HASH (seed, C.SA, C.DA, C-VID, [ IP.SA, IP.DA, IP.PROTO ])

R3) Allow optional tandem hash assignment of 802.1aq SPBM B-VID NNI unicast traffic to one of multiple next hop interfaces. Essentially a new SPBM ECT-ALG with its own B-VID (i.e. new ECT-ALGorithms, all usable at the same time).
Minimally HASH (seed, B-VID, C.SA, C.DA, C-VID, [ IP.SA, IP.DA, IP.PROTO ])

R4) Minor OA&M changes in support of R2 and R3, because symmetry/congruence is broken.

R5) More experience with AP, emulations, simulations etc. + addition of TTL to a new I-TAG or a TTL-TAG.
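The bracketed IP fields in R2 and R3 matter when many flows share one pair of customer MAC addresses, for example traffic between two routers. The sketch below assumes a SHA-256 based hash and invented addresses; `ecmp_hash` is a hypothetical helper whose field order follows the HASH(...) lists above, with B-VID included only in the R3 form.

```python
import hashlib


def ecmp_hash(seed, c_sa, c_da, c_vid, b_vid=None, ip=None) -> int:
    """HASH(seed, [B-VID], C.SA, C.DA, C-VID, [IP.SA, IP.DA, IP.PROTO])."""
    fields = [seed, c_sa, c_da, c_vid]
    if b_vid is not None:
        fields.insert(1, b_vid)        # R3 form: B-VID follows the seed
    if ip is not None:
        fields.extend(ip)              # optional (IP.SA, IP.DA, IP.PROTO)
    key = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


# 256 hypothetical IP flows between the same two customer MAC addresses.
SA, DA, VID = "02:00:00:00:00:01", "02:00:00:00:00:02", 10
ip_flows = [(f"10.0.0.{n}", "10.0.1.1", 6) for n in range(256)]

mac_only = {ecmp_hash(1, SA, DA, VID) % 16 for _ in ip_flows}
with_ip = {ecmp_hash(1, SA, DA, VID, ip=flow) % 16 for flow in ip_flows}
print(len(mac_only), "bucket without IP fields;", len(with_ip), "buckets with IP fields")
```

Without the IP fields every one of these flows falls into a single bucket, so the optional fields are what allow the spreading to go below MAC-pair granularity.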

