
Deployment and Operation of BGP


TECRST-2310. Deployment and Operation of BGP. Agenda. Introduction to BGP BGP General Operation BGP Attributes and Policy Control BGP Path Selection Algorithm Applying Policy with BGP Multi-Protocol BGP BGP Load Balancing Full Mesh IBGP BGP Route-Reflectors Scaling BGP Updates



Deployment and Operation of BGP


Introduction to BGP

BGP General Operation

BGP Attributes and Policy Control

BGP Path Selection Algorithm

Applying Policy with BGP

Multi-Protocol BGP

BGP Load Balancing

Full Mesh IBGP

BGP Route-Reflectors

Scaling BGP Updates

BGP Fast Convergence

A Little BGP “Show and Tell”

Autonomous System

A network sharing the same routing policy

Possibly multiple IGPs

Usually under single administrative control

Contiguous internal connectivity

Numbering range from 1 to 65,535, globally unique “AS Number”

Private range: 64512–65534

Reserved: 0 and 65535

Border Gateway Protocol - BGP

BGP is classified as a path vector routing protocol (see RFC 1322)

A path vector protocol defines a route as a pairing between a destination and the attributes of the path to that destination.

BGP used internally (iBGP) and externally (eBGP)

iBGP used to carry

Some/all Internet prefixes across ISP backbone

ISP’s customer prefixes

eBGP used to

Exchange prefixes with other Autonomous Systems (ASes)

Implement routing policy

BGP Basics





AS 101

AS 100




BGP speakers are called peers or neighbors


AS 102

External BGP - eBGP

Between BGP speakers in different AS

Usually directly connected

Usually sets next-hop to self

Router A

router bgp 1

neighbor remote-as 2

Router B

router bgp 2

neighbor remote-as 1

AS 2



  • neighbor route-map X {in|out}

  • .

  • .

  • route-map X permit 10

  • {set | match} <attribute>



AS 1

Internal BGP - iBGP

Neighbor in same AS

Next-hop unchanged…usually

May be several hops away

Don’t forward iBGP learned routes to other iBGP peers

n*(n-1)/2 peering mesh – scaling problem!

Route-Reflectors relax this constraint



Router B:

router bgp 1

neighbor remote-as 1

Router A:

router bgp 1

neighbor remote-as 1

iBGP and Loopback Interfaces

RtrA

interface loopback0

ip address

!

router bgp 100

neighbor remote-as 100

neighbor update-source loopback0

RtrB

interface loopback0

ip address

!

router bgp 100

neighbor remote-as 100

neighbor update-source loopback0

AS 100



Why not peer to the address assigned to a physical interface?

Reasons for Using BGP

You need to scale your IGP

You’re a multihomed ISP customer and need to implement routing policy

You’re an MPLS/VPN subscriber to an SP service and want to run dynamic routing between CE and PE routers

Using BGP to Scale Your IGP

Scaling a large network—“Divide and Conquer”


Periodic IGPs/flooding

Isolate network instability

Complex policies

Control reachability to prefixes

Merge separate organizations

Connect multiple IGPs

Best Path Selection for Cisco Routers: Which Route Is Best?

First, always take the next-hop advertising the longest prefix (most specific route to destination)

Choose next-hop advertising over the next-hop advertising

If two next-hop routers advertise the exact same route, use the default administrative distances as an index of believability

See table on the right

Lower is more believable

Defaults can be modified if necessary (with caution)

BGP General Operation

Learns multiple paths via internal and external BGP speakers

Picks the best path and installs in the forwarding table

Policies applied by influencing the best path selection

Summary of Operation

TCP connection established (port 179)

Both peers attempt to connect—There is an algorithm to resolve “connection collisions”

Exchange messages to open and confirm the connection parameters

Initial exchange of entire table

Incremental updates after initial exchange

Keepalive messages exchanged when there are no updates

What Are Incremental Updates?

IGPs typically rebroadcast routes

BGP runs over TCP => reliable data delivery

Once BGP sends a route to a peer, it assumes the peer will keep it unless:

A replacement route is sent—Implicit withdraw of old route

The route is withdrawn—Explicit withdraw

The BGP session goes down (keepalive failure)
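The three cases above can be sketched as a tiny table of received routes. This is only an illustrative model, not how any router implements it; the class name `AdjRibIn`, the peer label, and the documentation prefixes are assumptions made up for the example.

```python
class AdjRibIn:
    """Routes learned from peers, keyed by (peer, prefix). Hypothetical model."""

    def __init__(self):
        self.routes = {}

    def update(self, peer, prefix, attrs):
        # A new announcement for the same prefix from the same peer
        # replaces the old one: an implicit withdraw.
        self.routes[(peer, prefix)] = attrs

    def withdraw(self, peer, prefix):
        # Explicit withdraw: the peer says the route is gone.
        self.routes.pop((peer, prefix), None)

    def session_down(self, peer):
        # Losing the session invalidates everything learned from that peer.
        self.routes = {k: v for k, v in self.routes.items() if k[0] != peer}


rib = AdjRibIn()
rib.update("peerA", "203.0.113.0/24", {"as_path": [65001]})
rib.update("peerA", "203.0.113.0/24", {"as_path": [65002, 65001]})  # implicit withdraw
rib.update("peerA", "198.51.100.0/24", {"as_path": [65001]})
rib.withdraw("peerA", "198.51.100.0/24")                            # explicit withdraw
print(rib.routes)
```

Nothing is re-sent between these events, which is the point of incremental updates: the receiver keeps state until one of the three triggers fires.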

Inserting Prefixes into BGP

Two ways to insert/originate prefixes into BGP

Redistribute (static or dynamic)

Network command

Always necessary for default route

Default rules for re-advertising BGP learned prefixes to other BGP neighbors

eBGP learned routes are sent to all eBGP and iBGP peers

ee, ei

iBGP learned routes are sent to all eBGP but NO iBGP peers


Exception: iBGP Route-Reflectors

Inserting Prefixes into BGP - Redistribute

Configuration Examples:

router bgp 109

redistribute static

ip route serial0

router bgp 109

redistribute eigrp 100

Inserting Prefixes into BGP - Network

Used to tell BGP which networks to advertise to neighbors; unlike IGPs, the network command is not used to determine which interfaces will be active for the protocol; networks must be in the IP routing table in order for them to be advertised


router bgp 100

neighbor x.x.x.x remote-as Y

network If auto-summary is on then a specific route from must be in the routing table; if auto-summary is off then the prefix must be in the IP routing table

network mask Must be an exact match in the IP routing table

Inserting Prefixes into BGP – Network Command

Configuration Example

router bgp 109

network mask


A matching route must exist in the routing table before the network is announced

Exact prefix length

“show ip route x.x.x.x” must return exact route before BGP will advertise

Static route can be real next hop or null0 interface

ip route

ip route null0

ip route null0 250
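The exact-match rule for the network command can be illustrated with a short check. This is a sketch under assumptions: the function name `can_advertise` and the documentation prefixes are invented for the example, and a real router consults its own routing table rather than a Python list.

```python
import ipaddress


def can_advertise(network_stmt, routing_table):
    """True only if the exact prefix (same address and length) is in the table."""
    wanted = ipaddress.ip_network(network_stmt)
    return any(ipaddress.ip_network(route) == wanted for route in routing_table)


# Hypothetical routing table: one exact /24 and one more-specific /25.
routing_table = ["192.0.2.0/25", "198.51.100.0/24"]
print(can_advertise("198.51.100.0/24", routing_table))  # exact route present
print(can_advertise("192.0.2.0/24", routing_table))     # only a /25 exists
```

The second case is why a static route to null0 is commonly added: it puts the exact prefix into the routing table so the network statement has something to match.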

Route Metrics

OSPF has a dimensionless metric based on interface speed

EIGRP has a 5-tuple

[(K1 * BW + K2 * BW/(256 – Load) + K3 * Delay) * K5/(K4 + Reliability)] * 256

RIP has a hop count

BGP has …

BGP Attributes(More Than Just Route Cost…)

AS path

Next hop


Local preference

Multi-Exit Discriminator (MED)




Originator ID

Cluster list

What Is an Attribute?

Properties associated with a prefix/route

Used to determine the best path to a destination when multiple paths exist

Attribute Categories

Well-known, mandatory

Well-known, discretionary

Optional, transitive

Optional, non-transitive

Next Hop

AS Path





AS Path

Sequence of ASes a route has traversed

Loop detection

Apply policy

Well-known, Mandatory, Code = 2

AS 200

AS 100 300 200 100 300 200

AS 300

AS 400 300 200 100 300 200 300 400

AS 500

Next Hop

Next hop to reach a network

Usually a local network is the next hop in eBGP session

Well-known, Mandatory, Code = 3

AS 200

AS 300



AS 100

Next Hop

Next hop not changed



AS 200




AS 300

AS 100

Local Preference

Well-known, Code = 5

AS 100

AS 200

AS 300







AS 400 500

> 800


Local Preference

Local to an AS

Local preference set to 100 when heard from neighbouring AS

Used to influence BGP path selection

Determines best path for outbound traffic

Path with highest local preference wins

Local Preference

Configuration of Router B:

router bgp 400

neighbor remote-as 300

neighbor route-map local-pref in


route-map local-pref permit 10

match ip address prefix-list MATCH

set local-preference 800


ip prefix-list MATCH permit

ip prefix-list MATCH deny le 32

MULTI_EXIT_DISC (MED or Metric)

4 octets

Used by a BGP speaker’s Decision Process to discriminate among multiple entry points into a neighboring autonomous system.

If MED is missing, it is assumed MED=0

If bgp bestpath med missing-as-worst is configured, a missing MED is treated as the maximum value

Optional, Non-transitive, Code = 4

MULTI_EXIT_DISC (MED or Metric)

MED = 10

Route with lowest MED wins!!

MED 20

How to Scale Routing Policy


NOT in decision algorithm

BGP route can be a member of many communities

Really just a number for grouping prefixes.

Typical communities:

Destinations learned from customers

Destinations learned from ISPs or peers

Destinations in VPN—BGP community is fundamental to the operation of BGP VPNs

BGP Attributes: COMMUNITY

Activated per neighbor/peer-group:

neighbor {peer-address | peer-group-name} send-community

Carried across AS boundaries

BGP community values are configured as a 32-bit number (old format) or as a 2x2 byte number (new format).

Common convention is a string of four bytes: <AS>:[0-65535]

IP BGP-Community New-Format

Specifies that communities be displayed in a 4-byte AA:NN format

AA identifies the autonomous system

NN is a number that identifies the community within the autonomous system.

r2#show ip bgp

BGP routing table entry for 65001:100:, version 9


Community: 6553700

r2 (config)#ip bgp-community new-format

r2#show ip bgp

BGP routing table entry for 65001:100:, version 9


Community: 100:100
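The two displays above are the same 32-bit value shown two ways: the upper 16 bits are AA and the lower 16 bits are NN. A minimal sketch of the conversion, with function names chosen only for this example:

```python
def to_new_format(value):
    """Split a 32-bit community value into its AA:NN halves."""
    return f"{value >> 16}:{value & 0xFFFF}"


def to_old_format(text):
    """Combine an AA:NN string back into the single 32-bit number."""
    aa, nn = (int(part) for part in text.split(":"))
    return (aa << 16) | nn


print(to_new_format(6553700))    # 100:100
print(to_old_format("100:100"))  # 6553700
```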

BGP Attributes: COMMUNITY (Cont.)

Each destination can be a member of multiple communities

Using a route-map: set community

<1-4294967295> community number

aa:nn community number in aa:nn format

additive Add to the existing community

none No community attribute

local-AS Do not send to EBGP peers (well-known community)

no-advertise Do not advertise to any peer (well-known community)

no-export Do not export outside AS/confed (well-known community)

BGP Path Selection Algorithm

Do not consider path if no route to next hop

Example: Router learns a route from an eBGP peer and then advertises to an iBGP peer. If the iBGP peer does not know how to reach the next hop the route is rejected. iBGP usually does not change the next hop.

Do not consider iBGP path if not synchronized


A BGP Router Will Not Accept a Route from an iBGP Neighbor Unless the Route Is Already in the IP Routing Table

Rtr B

Rtr A

Rtr C




  • Rtr B does not know about; therefore, Rtr C should not advertise to Rtr D

  • Redistribute into IGP, use a full iBGP mesh or disable synchronization if iBGP path = physical path.

Rtr D

BGP Path Selection Algorithm

Highest weight (local to router)

Highest local preference (global within AS)

Prefer locally originated route (aggregate address)

Shortest AS path

BGP Path Selection Algorithm (Cont.)

Lowest origin code

IGP < EGP < incomplete

IGP – network command

EGP – learned via the legacy EGP protocol

Incomplete - redistribution

Lowest Multi-Exit Discriminator (MED)

If bgp deterministic-med, order the paths before comparing

(not the default but recommend using it)

If bgp always-compare-med, then compare for all paths

otherwise MED only considered if paths are from the same AS (default)

BGP Path Selection Algorithm (Cont.)

Prefer eBGP path over iBGP path

Path with lowest IGP metric to next-hop

For eBGP paths

If multipath enabled, install N parallel paths in routing table

If router-ID is the same, go to next step

If router-ID not the same, select “oldest”

BGP Path Selection Algorithm (Cont.)

Lowest router-id (originator-id for reflected routes)

Shortest Cluster-List

Client must be aware of Route Reflector attributes!

Lowest neighbor IP address
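The tie-breakers listed on these slides can be read as an ordered comparison between two candidate paths. The sketch below is a simplification under stated assumptions: the `Path` fields and the `better` function are invented for illustration, and the next-hop reachability check, synchronization, multipath, and the "oldest eBGP path" rule are deliberately left out.

```python
from dataclasses import dataclass, field

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    # Hypothetical record of the attributes the comparison needs.
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: list = field(default_factory=list)
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0
    router_id: int = 0
    cluster_list_len: int = 0
    neighbor_addr: int = 0


def better(a, b, always_compare_med=False):
    """Return the preferred of two paths, walking the tie-breakers in order."""
    # Each step is (value for a, value for b, True if the higher value wins).
    steps = [
        (a.weight, b.weight, True),
        (a.local_pref, b.local_pref, True),
        (a.locally_originated, b.locally_originated, True),
        (len(a.as_path), len(b.as_path), False),
        (ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin], False),
    ]
    # By default MED is only compared between paths from the same neighbor AS.
    if always_compare_med or a.as_path[:1] == b.as_path[:1]:
        steps.append((a.med, b.med, False))
    steps += [
        (a.ebgp, b.ebgp, True),
        (a.igp_metric, b.igp_metric, False),
        (a.router_id, b.router_id, False),
        (a.cluster_list_len, b.cluster_list_len, False),
        (a.neighbor_addr, b.neighbor_addr, False),
    ]
    for va, vb, higher_wins in steps:
        if va != vb:
            return a if (va > vb) == higher_wins else b
    return a  # complete tie: keep the first path


short = Path(as_path=[65001])
preferred = Path(local_pref=200, as_path=[65002, 65003, 65004])
print(better(short, preferred) is preferred)  # local preference beats AS path length
```

The comparison is pairwise on purpose: because MED is only compared between some pairs of paths, the result can depend on the order in which paths are compared, which is what bgp deterministic-med addresses.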

Constructing the Forwarding Table

Input policies


BGP in











best paths

BGP out



output policies

Applying Policy with BGP

Policy based on various attributes:

AS path


Destination prefix

Many, many others…

Reject/accept selected routes

Set attributes to influence path selection

Tools (IOS):

Distribute-list or prefix-list

Filter-list (as-path access-list)


Route-maps (the Swiss army knife)

Policy Control - Prefix List

Per-peer prefix filter, inbound or outbound

Allows coverage for ranges of prefix lengths (ge, le)

Based upon network numbers in NLRI (using familiar IPv4 address/mask format)

Example configuration:

router bgp 200

neighbor remote-as 210

neighbor prefix-list PEER-IN in

neighbor prefix-list PEER-OUT out


ip prefix-list PEER-IN deny

ip prefix-list PEER-IN permit le 32

ip prefix-list PEER-OUT permit

ip prefix-list PEER-OUT deny le 32

Policy Control - Prefix List

a.b.c.d/x [ge | eq | le] y

  • care vs. don’t care bits

  • base prefix length to match

  • operator

  • operand

ip prefix-list PEER-IN permit le 32

  • le 32 = all 10.x.x.x subnets, regardless of mask length

  • (e.g.,,

Policy Control - Prefix List

More Examples:

eq 32 = all /32 prefixes

eq 24 = only /24 prefixes

ge 28 = all subnets that have a mask length of /28 or greater
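The ge/le matching described on these slides can be sketched with Python's ipaddress module. This is an illustrative model, not router code: `prefix_list_match` and the 10.x example prefixes are assumptions, and the eq operator is not modeled.

```python
import ipaddress


def prefix_list_match(candidate, base, ge=None, le=None):
    """Check one prefix-list entry: base prefix plus optional ge/le length bounds."""
    cand = ipaddress.ip_network(candidate)
    net = ipaddress.ip_network(base)
    # The candidate must fall inside the base prefix ("care" bits must agree).
    if cand.version != net.version or not cand.subnet_of(net):
        return False
    if ge is None and le is None:
        low = high = net.prefixlen          # no operator: exact length only
    else:
        low = ge if ge is not None else net.prefixlen
        high = le if le is not None else cand.max_prefixlen
    return low <= cand.prefixlen <= high


print(prefix_list_match("10.1.2.0/24", "10.0.0.0/8", le=32))   # any 10.x.x.x subnet
print(prefix_list_match("10.1.0.0/16", "10.0.0.0/8"))          # exact /8 required
print(prefix_list_match("10.1.2.16/28", "10.0.0.0/8", ge=28))  # /28 or longer
```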

Policy Control - Filter List

Filter routes based on AS path

Inbound or Outbound

Example Configuration:

router bgp 100

neighbor filter-list 5 out

neighbor filter-list 6 in


ip as-path access-list 5 permit ^200$

ip as-path access-list 6 permit ^150$

Policy Control - Regular Expressions

Simple Examples

.* Match anything

.+ Match at least one character (cannot be empty)

^$ Match routes local to this AS (as-path is empty)

_1800$ Originated by 1800 (as-path ends with 1800)

^1800_ Received from 1800 (as-path starts with 1800)

_1800_ Via 1800 (1800 is somewhere in the middle of the as-path)

_790_1800_ Passing through 1800 then 790
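These patterns can be tried out offline by emulating the underscore in Python. This is a rough approximation, not the router's regex engine: here `_` is assumed to match the start or end of the string, a space, a comma, a brace, or a parenthesis, and the function name `as_path_matches` is made up for the example.

```python
import re

# Assumed meaning of "_" in an as-path regular expression.
DELIM = r"(?:^|$|[ ,{}()])"


def as_path_matches(pattern, as_path):
    """Test an as-path pattern against an AS path written as a space-separated string."""
    return re.search(pattern.replace("_", DELIM), as_path) is not None


print(as_path_matches("^$", ""))                        # locally originated
print(as_path_matches("_1800$", "790 1800"))            # originated by 1800
print(as_path_matches("^1800_", "1800 790"))            # received from 1800
print(as_path_matches("_790_1800_", "100 790 1800 200"))
print(as_path_matches("_1800_", "21800 5"))             # 21800 is not 1800
```

The last call shows why the underscore matters: without it, 1800 would also match inside longer AS numbers.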

For more information on regular expressions:



Policy Control – Setting Communities

Example Configuration

router bgp 100


neighbor remote-as 200

neighbor send-community

neighbor route-map set-community out


ip bgp-community new-format


route-map set-community permit 10

match ip address prefix-list NO-ANNOUNCE

set community no-export


route-map set-community permit 20

match ip address prefix-list EVERYTHING


ip prefix-list NO-ANNOUNCE permit ge 17

ip prefix-list EVERYTHING permit le 32

Policy Control – Matching Communities

Example Configuration

router bgp 100

neighbor remote-as 200

neighbor route-map filter-on-community in


route-map filter-on-community permit 10

match community 1

set local-preference 50


route-map filter-on-community permit 20

match community 2 exact-match

set local-preference 200


ip community-list 1 permit 150:3 200:5

ip community-list 2 permit 88:6

MP-BGP (RFC4760)

Extension to the BGP protocol

Carry routing information about other protocols:

IPv4 and IPv6 Unicast

IPv4/IPv6 + Label (RFC 3107, 6PE)

IPv4 and IPv6 Multicast

Multi-Protocol Label Switching (MPLS) VPN (IPv4 and IPv6)

Layer 2 VPN

…many others proposed

Multi-Protocol Capabilities must be negotiated at session setup time (important!)

MP-BGP Attributes

New non-transitive and optional Border Gateway Protocol (BGP) attributes


MP_REACH_NLRI: “Carry the set of reachable destinations together with the next-hop information to be used for forwarding to these destinations” (RFC4760)


MP_UNREACH_NLRI: carry the set of unreachable destinations

Note: NEXT_HOP has different format for different AFI/SAFI

MP-BGP Attributes (Cont.)

Attribute contains one or more triples:

Address Family Identifier (AFI) with Subsequent AFI (SAFI)

Identifies type of protocol information carried in the Network Layer Reachability Info (NLRI) field

Next-hop information

Reachability/non-reachability information

MP-BGP Capabilities Negotiation (Cont.)

BGP router sends an OPEN message with a CAPABILITIES parameter listing its capabilities:

MP-BGP Session Establishment

BGP: 3FFE:B00:C18:2:1::1 sending OPEN, version 4, my as: 100

BGP: 3FFE:B00:C18:2:1::1 rcv OPEN, version 4

BGP: 3FFE:B00:C18:2:1::1 rcv OPEN w/ OPTION parameter len: 16

BGP: 3FFE:B00:C18:2:1::1 rcvd OPEN w/ optional parameter type 2 (Capability) len 6

BGP: 3FFE:B00:C18:2:1::1 OPEN has CAPABILITY code: 1, length 4

BGP: 3FFE:B00:C18:2:1::1 OPEN has MP_EXT CAP for afi/safi: 2/1

BGP: 3FFE:B00:C18:2:1::1 rcvd OPEN w/ optional parameter type 2 (Capability) len 2

BGP: 3FFE:B00:C18:2:1::1 went from OpenSent to OpenConfirm

BGP: 3FFE:B00:C18:2:1::1 went from OpenConfirm to Established

%BGP-5-ADJCHANGE: neighbor 3FFE:B00:C18:2:1::1 Up
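The lengths in the debug output above (optional parameter type 2 with len 6, capability code 1 with length 4, afi/safi 2/1) line up with the multiprotocol capability layout in RFC 4760: a 2-byte AFI, one reserved byte, and a 1-byte SAFI. A small sketch of that encoding, with helper names invented for the example:

```python
import struct


def mp_capability(afi, safi):
    """Capability code 1, length 4: AFI (2 bytes), reserved (1 byte), SAFI (1 byte)."""
    return struct.pack("!BBHBB", 1, 4, afi, 0, safi)


def as_optional_parameter(capability):
    """Wrap a capability in an OPEN optional parameter of type 2 (Capability)."""
    return struct.pack("!BB", 2, len(capability)) + capability


cap = mp_capability(afi=2, safi=1)        # AFI 2 / SAFI 1 = IPv6 unicast
print(cap.hex())                          # 010400020001
print(as_optional_parameter(cap).hex())   # 0206010400020001
```

The capability is 6 bytes in total (2 bytes of header plus 4 of value), which is where the "len 6" on the optional parameter comes from.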

AS 123

AS 321

Load Balancing

BGP isn’t inherently designed to load-balance traffic

By default, BGP chooses, installs, and advertises one “best” route

Attempting to balance traffic comes in two parts

Inbound traffic

Outbound traffic

Load balancing is relatively trivial in some topologies

A pair of eBGP peers connected via multiple links

Two connections from one router to the same AS

…but not others

Multi-homed to more than one provider

Single Path – eBGP Multihop

Router A configuration:

interface loopback 0

ip address


router bgp 100

neighbor remote-as 200

neighbor update-source loopback0

neighbor ebgp-multihop


ip route serial 0

ip route serial 1


  • A must do a recursive lookup for

  • A has two equal cost paths to

  • A will load balance traffic over these two links

  • B must be configured similarly for bidirectional load balancing

Loopback 0





eBGP Multipath Support

A peers with multiple routers in the same neighbor AS

Install multiple routes in IP routing table

Use ‘maximum-paths ebgp’ command

Routes must be identical in terms of LOCAL_PREF, AS_PATH, MED, etc… (probably true if coming from the same AS)

Outbound traffic will be split over these two links

A still advertises one best path to peers

Next-hop is set to self (using loopback interface)




Multi-Homed AS

Very common topology for many customers

Customer wants to split traffic between AS 100 and AS 300

Misconception: “I’ll make half of my routes preferred via AS 100 and the other half through AS 300. Then I’ll have load-balancing!!”…no, you’ll have prefix splitting!

AS 100

AS 300





AS 200

Multi-Homed AS

Huge difference between “load balancing” and “prefix splitting”

Traffic may be balanced perfectly…until traffic patterns change

Some customers use this method but they are forced to change their policies to accommodate for changes in traffic patterns

For outbound balancing use


LOCAL_PREF (recommended)

For inbound balancing use


AS_PATH prepending (may not work)

MEDs (may not work)

Communities and LOCAL_PREF (recommended…but requires upstream coordination!)

BGP Multipath

Multiple eBGP paths can be flagged as multipath as long as the paths are “similar”

“Similar” means that all relevant BGP attributes are a tie (up to next-hop metric)

If paths 1 and 2 both have a local-pref of 200, MED of 300, etc… but the Router-IDs are different then paths 1 and 2 are eligible for multipath

These paths are installed in the RIB/FIB to load-balance outbound traffic

Multipath is the correct approach to a difficult problem but not terribly useful because it can only be used in one specific topology

iBGP multipath and Link-BW will help correct this

iBGP: Without Multipath

R1 has two paths for

Both paths are identical in terms of localpref, med, IGP cost to next-hop, etc

Router-ID, peer-address, etc are different but these are arbitrary in terms of selecting a best path

R1 will select one path as best and send all traffic for towards one of the exit points




AS 200

AS 100



iBGP Multipath

Flag multiple iBGP paths as ‘multipath’

Each path must have a unique NEXT_HOP

All multipaths are inserted into the RIB/FIB

Number of multipaths can be controlled

maximum-paths ibgp <1-6>

Still advertise a single bestpath

Each BGP next-hop is resolved and mapped to available IGP paths (not next-hop-self unless routing follows forwarding)

Supported on all IOS versions in past ~10 yrs

iBGP: With Multipath

R1 has two paths for

Both paths are flagged as “multipath”




AS 200

AS 100



R1#sh ip bgp

200 from (

Origin IGP, metric 0, localpref 100, valid, internal, multipath

200 from (

Origin IGP, metric 0, localpref 100, valid, internal, multipath, best

iBGP Multipath

These two paths are installed in the RIB/FIB

Traffic is load-balanced across the two paths/exit points based on per-packet hash

Depending on platform/version, there may or may not be multiple levels of load balancing (IGP + BGP)

R1#sh ip route

Routing entry for

*, from, 00:00:09 ago

Route metric is 0, traffic share count is 1

AS Hops 1, from, 00:00:09 ago

Route metric is 0, traffic share count is 1

AS Hops 1

R1#show ip cef, version 237, per-destination sharing

0 packets, 0 bytes

via, 0 dependencies, recursive

traffic share 1

next hop, FastEthernet0/0 via

valid adjacency

via, 0 dependencies, recursive

traffic share 1

next hop, FastEthernet0/0 via

valid adjacency

eiBGP Multipath

The traffic destined to a site may be load shared between all entry points.

From the MPLS/VPNs provider’s point of view, these entry points may not all correspond to internal or external peers.

The intent is for the MPLS/VPN network to be transparent to the customers.

The ability to consider both iBGP and eBGP paths, when using multipath, is needed.

Paths must match up to MED attribute

*Applies Only to the MPLS/VPN Case*

eiBGP Multipath

PE-2 has two possible paths into Site-1

eiBGP Multipath allows both paths to be used.









maximum-paths eibgp <num>

Full Mesh iBGP

“If a particular AS has multiple BGP speakers and is providing transit service for other ASes, then care must be taken to ensure a consistent view of routing within the AS. A consistent view of the interior routes of the AS is provided by the IGP used within the AS. For the purpose of this document, it is assumed that a consistent view of the routes exterior to the AS is provided by having all BGP speakers within the AS maintain IBGP sessions with each other.”

RFC 4271

Full Mesh iBGP

If a router learns a route from an iBGP peer, it will not re-advertise that route to another iBGP peer


Because BGP relies on the AS Path to prevent loops

iBGP peers are in the same AS, so they do not add anything to the AS Path


There’s no way to tell if a route advertised through several iBGP speakers is a loop!


Learns eBGP

Advertises iBGP

Learns iBGP

Do not advertise iBGP








Full Mesh iBGP

How scalable is using a full mesh of iBGP speakers?

2 speakers == 1 session

3 speakers == 3 sessions

4 speakers == 6 sessions

5 speakers == 10 sessions

n(n-1)/2 = O(n^2) sessions

(n-1) sessions per speaker
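The growth is easy to check numerically. A minimal sketch; the function name is chosen for this example only:

```python
def full_mesh_sessions(n):
    """Number of iBGP sessions needed to fully mesh n speakers."""
    return n * (n - 1) // 2


# Columns: speakers, total sessions, sessions per speaker.
for n in (2, 3, 4, 5, 9):
    print(n, full_mesh_sessions(n), n - 1)
```

Adding one more speaker to a mesh of n routers means configuring n new sessions, one on every existing router, which is the operational side of the scaling problem.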

How can we better handle scale?

Confederations (yuck)

Route Reflectors (hooray!)














Confederation 100

BGP Route Reflectors

Route Reflector Basics

Hierarchical Route Reflectors

Deploying Route Reflectors

Route Reflector Redundancy

Route Reflector Basics

A route reflector is an iBGP speaker that reflects routes learned from iBGP peers to other iBGP peers

Route reflectors are designated by configuring some of their iBGP peers as route reflector clients

Route reflectors



neighbor <A> route-reflector-client

neighbor <B> route-reflector-client

Route Reflector Basics

A route reflector client is just an iBGP speaker

There is no special configuration for a route reflector client

Route reflectors



Route reflector client

neighbor <A> route-reflector-client

neighbor <B> route-reflector-client

Route Reflector Basics

A cluster is a route reflector and its clients

Route reflector clusters may overlap

Route reflectors




Route reflector client

neighbor <A> route-reflector-client

neighbor <B> route-reflector-client

Route Reflector Basics

A non-client is any route reflector iBGP peer that is not a route reflector client

Each route reflector is also a non-client of each other route reflector in this network

Route reflectors must be fully iBGP meshed with non-clients

Route reflectors





Route reflector client

neighbor <A> route-reflector-client

neighbor <B> route-reflector-client

Hierarchical Route Reflectors - Motivation

All of the route reflectors will need to be fully meshed

Reflectors still follow the normal rules of iBGP route propagation between themselves

This full iBGP mesh between reflectors can still contain so many routers that it presents a scaling problem as well

Full iBGP mesh between reflectors



Hierarchical Route Reflectors

To resolve this, route reflectors can be deployed in a hierarchy

A single router can be a reflector client and a reflector

Client and reflector




Hierarchical Route Reflectors

An unlimited number of tiers can be used

But very rare to see more than 3 levels

Edges of route reflector tiers are a natural place to reduce the amount of routing information being carried in the lower tiers

RRs would be ABRs in “textbook” network design

The same topology rule applies: The reflector topology should follow the physical topology to prevent loops and black holes

RRs can lead to suboptimal routing because they can hide full path information from clients (RRs can advertise a single best path).

Route Reflector Basics

If a Route Reflector Receives a Route from an eBGP Peer:

Send the route to all clients

Send the route to all non-clients


iBGP peer

eBGP peer


iBGP peer



Route Reflector Basics

If a Route Reflector Receives a Route from a Client:

Reflect the route to all clients

Unless “no client-to-client reflection”

Reflect the route to all non-clients

Send the route to all eBGP peers


iBGP peer

eBGP peer


iBGP peer



Route Reflector Basics

If a Route Reflector Receives a Route from a Non-Client:

Reflect the route to all clients

Send the route to all eBGP peers


iBGP peer

eBGP peer


iBGP peer
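The three cases (route received from an eBGP peer, from a client, from a non-client) can be condensed into one lookup. This is only a sketch of the rules as stated on these slides: the function name and the peer-type labels are assumptions, and the eBGP case also lists other eBGP peers, following the general rule that eBGP-learned routes are sent to all eBGP and iBGP peers.

```python
def advertise_to(learned_from, client_to_client=True):
    """Return the peer types a route reflector passes a route on to."""
    if learned_from == "ebgp":
        return {"client", "non-client", "ebgp"}
    if learned_from == "client":
        targets = {"non-client", "ebgp"}
        if client_to_client:
            # Skipped when client-to-client reflection is turned off.
            targets.add("client")
        return targets
    if learned_from == "non-client":
        # Ordinary iBGP rule: never pass to other non-clients.
        return {"client", "ebgp"}
    raise ValueError(f"unknown peer type: {learned_from}")


for source in ("ebgp", "client", "non-client"):
    print(source, "->", sorted(advertise_to(source)))
```

The only case that never reaches non-clients is a route learned from a non-client, which is why route reflectors still need a full iBGP mesh among themselves.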



Route Reflector Basics

What we need is a mechanism to prevent loops within the AS!

RFC2796 defines two BGP attributes to provide loop detection within an AS

Originator ID

Set to the router ID of the router injecting the route into the AS

Cluster List

Each route reflector the route passes through adds their cluster ID to this list. Cluster-id = Router ID by default

Route Reflector Basics

When reflecting a route, a route reflector always:

Creates a cluster list if one doesn’t exist and adds its router ID (or configured cluster ID)

Sets the Originator ID to the router ID of the peer it received the route from, if the route does not already carry an Originator ID
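How the two attributes stop loops can be sketched with plain dictionaries. This is an illustrative model under assumptions: the function names, the dictionary keys, and the router IDs are invented, and an existing Originator ID is preserved when a route is reflected a second time, as in the route reflection RFCs.

```python
def reflect(route, peer_router_id, cluster_id):
    """Return a copy of the route as a reflector would pass it on."""
    reflected = dict(route)
    # Originator ID is set once, by the first reflector, and then preserved.
    reflected.setdefault("originator_id", peer_router_id)
    # Each reflector prepends its own cluster ID.
    reflected["cluster_list"] = [cluster_id] + list(route.get("cluster_list", []))
    return reflected


def accept(route, my_router_id, my_cluster_id=None):
    """Loop check on receipt: reject our own route or our own cluster ID."""
    if route.get("originator_id") == my_router_id:
        return False
    if my_cluster_id is not None and my_cluster_id in route.get("cluster_list", []):
        return False
    return True


once = reflect({"prefix": "203.0.113.0/24"}, "1.1.1.1", "10.0.0.1")
twice = reflect(once, "10.0.0.1", "10.0.0.2")
print(twice)
print(accept(twice, "1.1.1.1"))              # back at the originator: rejected
print(accept(twice, "3.3.3.3", "10.0.0.1"))  # back at the first reflector: rejected
```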

Deploying Route Reflectors

Use the divide and conquer approach to convert from a full iBGP mesh to route reflectors

Divide network into multiple clusters, using the physical topology as a guide to the logical divisions

Pick out one router to act as the reflector in each cluster, making certain reflection follows the physical topology

Remove redundant iBGP sessions as you configure reflectors in each cluster

Deploying Route Reflectors

This small network has nine routers, and 36 iBGP sessions

First, choose clusters using the physical topology as a guide

Next choose reflectors based on the physical topology











Physical links

iBGP sessions

Deploying Route Reflectors

Configure each client in a single cluster

Remove extra iBGP sessions

Start with B










neighbor <f> route-reflector-client

neighbor <h> route-reflector-client

neighbor <d> route-reflector-client

Physical links

iBGP sessions

Deploying Route Reflectors

Next, configure G, E, and J as route reflector clients of C

Remove extra iBGP sessions










neighbor <g> route-reflector-client

neighbor <e> route-reflector-client

neighbor <j> route-reflector-client

Physical links

iBGP sessions

Deploying Route Reflectors

The resulting network has nine iBGP sessions along physical links










Physical links

iBGP sessions

Route Reflector Design and Redundancy

A client may peer with more than one reflector, in different clusters

A client that peers to only one reflector has a single point of failure

Clients should peer to at least two reflectors to provide redundancy

How many reflectors should a single client be peered to?

Where should the RRs be placed in the network?

How many RRs are needed?

Route Reflector Design and Redundancy

Redundancy is needed but….

Too much burns memory on RRCs because the client learns the same information from each RR

Also burns memory on the RRs because they learn multiple paths for each route introduced by a RRC

Two route reflectors per client should be plenty…

…but this is not a hard and fast rule

As with everything else… “it depends”

PEs, RRs, SLAs, network size, network topology, etc.

Other sessions dedicated to this topic…

Scaling BGP Updates


Peer Groups

Input Queue Tuning

Path MTU Discovery


Why aggregate?

Reduce number of Internet prefixes

Advertise only your CIDR block

According to some studies, about 50% of the current Internet routing table represents “leakage past aggregates”

Increase stability

If you aggregate properly, the aggregate will remain stable even if specific components of the aggregate come and go

Perhaps your upstream provider will not allow the more specifics (filter long prefixes, dampening)


One of the easiest ways to scale eBGP is to aggregate routing information

To configure aggregation in BGP, use the aggregate-address command

Aggregated route is created if we have at least one component:

Components are the longer length prefixes that fall within the aggregate’s range

By default:

The aggregate-address command only creates an aggregate

For the newly created aggregate route, AS-PATH=NULL and other attributes are default for local routes

“65001” “65001 65002” “”


AS65101 “65100 65001” “65100 65002” “65100”



Adding the keyword summary-only causes BGP to suppress the components of the aggregate

Suppressed route = use it, but do not advertise it to any peer

“65001” “65001 65002” “”


AS65101 “65100”

aggregate-address summary-only


Adding the keyword as-set causes BGP to:

Set the aggregate’s AS-PATH to an AS_SET built by merging all the ASes of all the components

Additionally (not shown):

Merge all the communities and extended communities of all the components

[Figure: components with AS paths “65001”, “65001 65002”, and “”; AS65101 receives “65100 {65001 65002}”]

aggregate-address summary-only as-set


Use a route map to set the aggregate’s other attributes.

[Figure: components with AS paths “65001”, “65001 65002”, and “”; aggregate labeled LP=200; AS65101 receives “65100 {65001 65002}”]

aggregate-address summary-only as-set route-map foo

route-map foo

set local-preference 200


Other aggregate commands

advertise-map <route-map>: to select which components are considered as part of the aggregate

suppress-map <route-map>: to select which components we want to suppress

neighbor … unsuppressed-map <route-map>: to unsuppress (advertise) a suppressed component towards a particular peer


Creating an aggregate with an aggregate command

Adds AGGREGATOR attribute (troubleshooting info with the IP and AS of the router that did the aggregation)

If the as-set keyword is NOT used: the ATOMIC_AGGREGATE attribute is also added (troubleshooting info that indicates loss of AS-path information)
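The aggregation rules above can be sketched in a few lines of Python. This is an illustrative model only, not router code: the function name `aggregate`, the dictionary layout, and the 10.x prefixes are hypothetical, while the AS numbers are the ones used on the slides.

```python
import ipaddress

def aggregate(prefix, routes, summary_only=False, as_set=False):
    """Model of aggregate-address. routes is a list of (prefix, as_path)
    pairs; as_path is a list of AS numbers (empty for a local route)."""
    agg = ipaddress.ip_network(prefix)
    # Components are the longer prefixes that fall within the aggregate's range.
    components = [
        (p, path) for p, path in routes
        if ipaddress.ip_network(p).subnet_of(agg)
        and ipaddress.ip_network(p).prefixlen > agg.prefixlen
    ]
    if not components:
        return None  # no component, so no aggregate is created

    # as-set: merge every AS seen in the components into one set.
    merged = sorted({asn for _, path in components for asn in path}) if as_set else []
    # summary-only: components are still used locally but not advertised.
    suppressed = {p for p, _ in components} if summary_only else set()
    return {
        "aggregate": prefix,
        "as_set": merged,
        "atomic_aggregate": not as_set,  # added when AS-path detail is lost
        "advertised_routes": [(p, path) for p, path in routes if p not in suppressed],
    }

routes = [
    ("10.1.0.0/16", [65001]),
    ("10.2.0.0/16", [65001, 65002]),
    ("10.3.0.0/16", []),
]
result = aggregate("10.0.0.0/8", routes, summary_only=True, as_set=True)
```

With both keywords set, the aggregate carries the merged set {65001 65002} and none of the three components is advertised; with neither keyword, all three components are advertised alongside an aggregate whose AS path is empty.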

Peer Groups

What is it?

A way to group peers with similar configuration

Configuration of neighbor is now done in 2 steps:

Define a peer-group like a neighbor

It has associated neighbor commands, policies, etc.

Define individual neighbors as a member of that peer-group

All the configuration of the peer-group applies to the member

Reasons for using peer-groups:

Ease of administration


Peer Groups

Ease of administration:

Offering customers a few options in the number of routes they receive, rather than filtering per customer

Classifying peering arrangements with other providers so you only manage two or three types of connections

Example for customer types:

cust-default—send default route only

cust-cust—send customer routes only

cust-full—send full Internet routes

Peer Groups

[Figure: peer groups in your AS. Labels: Full Routes Peer Group, Customer Routes Peer Group, Core Peer Group, Client Peer Group, Route Reflector, Aggregation Router (RR Client), CIDR Block]

Peer Groups


router bgp 65000

neighbor remote-as 65001

neighbor route-map cust-receive in

neighbor route-map cust-default out

neighbor send-community

neighbor remote-as 65002

neighbor route-map cust-receive in

neighbor route-map cust-default out

neighbor send-community

neighbor remote-as 65003

neighbor route-map cust-receive in

neighbor route-map cust-default out

neighbor send-community


router bgp 65000

! Defining peer-groups

neighbor cust-default route-map cust-receive in

neighbor cust-default route-map cust-default out

neighbor cust-default send-community

! Applying peer-groups to neighbors

neighbor remote-as 65001

neighbor peer-group cust-default

neighbor remote-as 65002

neighbor peer-group cust-default

neighbor remote-as 65003

neighbor peer-group cust-default

Peer Groups

Peer groups also improve scaling

Advertising 100,000+ routes to hundreds of peers is a big challenge from a scalability point of view

(1) Each packet to each peer must be individually formatted

(2) Each packet to each peer must be individually transmitted

Peer groups make it possible to do (1) only once for all the members of the peer group

GOLDEN RULE of peer-groups

Outbound policy MUST be identical for all members of the peer group

Individual peers cannot be configured with their own outbound policy

Peer Groups

Update generation without peer groups

BGP table is walked for every peer

Updates are generated and sent to each peer

Update generation with peer groups

A peer-group leader is elected for each peer group

The BGP table is walked for the leader only

Updates are generated once for the peer-group leader, then replicated and transmitted to the rest of the peer-group members
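A rough Python sketch of the difference, counting how often an update is formatted versus transmitted. The function name `send_updates`, the string used as an "update", and the peer names are all hypothetical; leader election is not modelled, only the fact that the table is walked once per group.

```python
def send_updates(bgp_table, peers, groups=None):
    """Return (formatted, transmitted, outbox).
    groups maps a group name to its member peers; peers that belong
    to no group are treated as single-member groups."""
    groups = dict(groups or {})
    grouped = {p for members in groups.values() for p in members}
    for p in peers:
        if p not in grouped:
            groups[p] = [p]

    formatted = transmitted = 0
    outbox = {}
    for members in groups.values():
        for prefix in bgp_table:            # table walked once per group
            update = f"UPDATE {prefix}"     # formatting happens once
            formatted += 1
            for peer in members:            # replicated to every member
                outbox.setdefault(peer, []).append(update)
                transmitted += 1
    return formatted, transmitted, outbox

table = ["10.0.0.0/8", "192.0.2.0/24"]
peers = ["a", "b", "c"]
without_groups = send_updates(table, peers)
with_group = send_updates(table, peers, {"cust-default": ["a", "b", "c"]})
```

Transmission still happens once per peer in both cases; only the formatting work drops from peers × routes to groups × routes.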

Peer Groups

For the same amount of convergence time

Beyond Peer Groups

Today peer-groups are no longer needed, but they live on in spirit

Peer-groups can still be configured

But their two functions have been decoupled:

Scalability: update-groups

Administration: peer templates

Beyond Peer Groups


Software automatically groups neighbors that can be included in the same update-group

Basically, all the neighbors that share outbound policy

Only one update is formatted for each update-group

To check how many update-groups and members are created:

show ip bgp [<af>] update-group
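As a minimal sketch of the idea, automatic update-group formation amounts to bucketing neighbors by their complete outbound policy. The function name `build_update_groups`, the neighbor names, and the policy keys below are assumptions made for illustration; real software compares many more attributes.

```python
from collections import defaultdict

def build_update_groups(neighbors):
    """neighbors maps a neighbor name to its outbound policy, e.g.
    {"route-map": "cust-default", "send-community": True}.
    Neighbors with identical outbound policy land in the same group."""
    groups = defaultdict(list)
    for name, policy in neighbors.items():
        key = tuple(sorted(policy.items()))  # hashable, order-independent
        groups[key].append(name)
    return list(groups.values())

neighbors = {
    "n1": {"route-map": "cust-default", "send-community": True},
    "n2": {"send-community": True, "route-map": "cust-default"},
    "n3": {"route-map": "cust-full", "send-community": True},
}
update_groups = build_update_groups(neighbors)
```

Here n1 and n2 share one update-group, so one update is formatted for both, while n3 gets a group of its own.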

Beyond Peer Groups


Configuration is similar to peer-groups

Define a peer-template with configuration commands

Individual neighbor is configured to inherit commands from peer-template

And additionally:

multiple peer-templates can be applied to a neighbor

peer-templates can be applied to another peer-template

No GOLDEN rule: individual peers can be configured with outbound policy

Two types of peer templates

peer-session: defines session commands (update independent)

Remote-as, update-source …

peer-policy: defines policy commands (associated to updates)

Route-map inbound, route-map outbound, remove-private-as, …

Neighbors can still be grouped in update-groups

If “total” outbound policy is the same

Beyond Peer Groups

Peer-template (peer-policy) example

router bgp 1

template peer-policy ppol1

route-map map1 out

filter-list 1 in

inherit peer-policy ppol2 10

inherit peer-policy ppol3 5

template peer-policy ppol2

filter-list 2 in

distribute-list 2 in

route-reflector-client

template peer-policy ppol3

distribute-list 3 in

address-family ipv4

neighbor route-map map0 out

neighbor inherit peer-policy ppol1

Neighbor for IPv4 uses:

Route-map out = map0

Distribute-list in = 2

Filter-list in = 1

It’s a route-reflector client

Uses next-hop self
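The inheritance in this example can be traced with a small Python model. The precedence rules coded here are an assumption chosen to reproduce the outcome listed on the slide: inherited templates are applied in ascending sequence order, a template's own commands override what it inherits, and per-neighbor commands override everything. The function names and dictionary layout are hypothetical. The next-hop-self result has no matching command in the configuration as transcribed, so it is not modelled.

```python
def resolve_policy(name, templates):
    """Flatten a peer-policy template into a {command: value} dict."""
    template = templates[name]
    resolved = {}
    # Assumption: lower sequence numbers are evaluated first, so a
    # higher sequence number overrides a lower one on conflict.
    for _, parent in sorted(template.get("inherit", [])):
        resolved.update(resolve_policy(parent, templates))
    # The template's own commands override anything it inherited.
    resolved.update(template.get("commands", {}))
    return resolved

def neighbor_policy(neighbor_commands, inherited, templates):
    policy = resolve_policy(inherited, templates)
    policy.update(neighbor_commands)  # per-neighbor commands win
    return policy

templates = {
    "ppol1": {"commands": {"route-map out": "map1", "filter-list in": 1},
              "inherit": [(10, "ppol2"), (5, "ppol3")]},
    "ppol2": {"commands": {"filter-list in": 2, "distribute-list in": 2,
                           "route-reflector-client": True}},
    "ppol3": {"commands": {"distribute-list in": 3}},
}
effective = neighbor_policy({"route-map out": "map0"}, "ppol1", templates)
```

Under these assumptions the neighbor ends up with route-map map0 outbound, distribute-list 2 and filter-list 1 inbound, and route-reflector-client set.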

Input Queue Tuning

Large bursts of input packets may overflow the input hold-queue and produce input queue drops

BGP packets may be dropped when many BGP peers are reached via the same interface (usually an Ethernet interface)

The net effect is that the usable throughput is lower than the available bandwidth (the TCP congestion window is reduced)


Increase input hold-queue:

hold-queue <1 – 4096> in (default is only 75!)

Give extra buffer for BGP packets (marked with precedence 6):

spd headroom <0-65535> (the default in recent versions, 2000, is good)

show [ip] spd to verify

Larger Input Queues

For the same amount of convergence time

Results from increasing the interface input queue depth from 75 (default) to 1000

TCP Path MTU Discovery

MSS (Max Segment Size)

Largest segment that can traverse a TCP session

Does not include IP or TCP headers

MSS is 536 bytes by default (in multihop sessions)

Anything larger must be fragmented & re-assembled

536 bytes is inefficient for Ethernet (1500) & POS (4470)

Increases the number of IP packets

Makes TCP work harder

Slows BGP convergence and reduces scalability

Solution: ip tcp path-mtu-discovery

Another helpful command:

show ip bgp neighbors | include max data

TCP Path MTU Discovery

Configuring path MTU discovery between BGP peers can provide dramatic results in the speed of convergence

MSS increased from 536 to 1460 bytes (GE)

[Chart: supported peers]

MSS = lowest MTU - IP overhead (20 bytes) - TCP overhead (20 bytes)
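The formula is easy to check with a few lines of Python. The helper names and the 1,000,000-byte payload are hypothetical; the header sizes assume no IP or TCP options, as on the slide.

```python
import math

IP_HEADER = 20   # bytes, no IP options
TCP_HEADER = 20  # bytes, no TCP options

def mss_for_path(mtus):
    """MSS = lowest MTU on the path minus IP and TCP overhead."""
    return min(mtus) - IP_HEADER - TCP_HEADER

def segments_needed(payload_bytes, mss):
    """Number of TCP segments needed to carry payload_bytes."""
    return math.ceil(payload_bytes / mss)

ethernet_mss = mss_for_path([1500, 4470, 1500])      # Ethernet is the bottleneck
pos_mss = mss_for_path([4470])                       # POS end to end
default_segments = segments_needed(1_000_000, 536)
tuned_segments = segments_needed(1_000_000, ethernet_mss)
```

For the same amount of BGP data, the 536-byte default needs roughly 2.7 times as many segments as a 1460-byte MSS.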

Faster Convergence

Increased focus on faster BGP convergence

Critical for traffic (e.g., voice, video)

VPN customers want IGP-like convergence

Several factors influence BGP convergence





Faster Convergence

Typically two scenarios where we need faster convergence

Single route convergence

A bestpath change occurs for one prefix

How quickly can BGP propagate the change throughout the network?

How quickly can the entire BGP network converge?

Key for VPNs and voice networks

Bootup or “clear ip bgp *” convergence

Most stressful scenario for BGP

CPU may be busy for several minutes

Limiting factor in terms of scalability

Key for any router with a full Internet table and many peers

Convergence Basics – BGP Scanner

BGP Scanner plays a key role in convergence

Full BGP table scan happens every 60 seconds

bgp scan-time X

Affects only some AF-dependent tasks; most tasks are still performed every 60 seconds

Full scan performs multiple housekeeping tasks

Validate nexthop reachability

Validate bestpath selection

Route redistribution and network statements

Conditional advertisement

Route dampening

BGP Database cleanup

Import scanner runs once every 15 seconds

Imports VPNv4 routes into vrfs

bgp scan-time import X

Convergence Basics – BGP Nexthops

Every 60 seconds the BGP scanner recalculates best path for all prefixes

Changes to the IGP cost of a BGP nexthop will go unnoticed until scanner’s next run

IGP may converge in less than a second

BGP may not react for as long as 60 seconds 

Need to change from a polling model to an event driven model to improve convergence

Polling model – Check each BGP nexthop’s IGP cost every 60 seconds

Event driven model – BGP is informed by a 3rd party process when the IGP cost to a BGP nexthop changes

ATF – Address Tracking Filter

ATF is a middle man between the RIB and RIB clients

RIB clients: BGP, OSPF, EIGRP, etc.

ATF and client interaction

Client tells ATF to register a given IP address (ex: an IP address that is used as a BGP next-hop)

RIB notifies ATF of any route modification/creation/deletion

ATF notifies client if the lookup route associated to any registered IP address changes/switches/appears/disappears
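The register/notify interaction can be sketched as a toy Python class. This is only a model of the behaviour described above: the class and method names, the string used as route information, and the addresses are all hypothetical, and the longest-match lookup is a simple linear scan.

```python
import ipaddress

class AddressTrackingFilter:
    """Sits between the RIB and its clients; notifies a client only when
    the route that resolves a registered address changes."""

    def __init__(self):
        self.rib = {}      # network -> route info
        self.watched = {}  # address -> [callback, last resolved route]

    def _lookup(self, addr):
        # Longest-prefix match over the toy RIB.
        matches = [(net, info) for net, info in self.rib.items() if addr in net]
        return max(matches, key=lambda m: m[0].prefixlen) if matches else None

    def register(self, address, callback):
        addr = ipaddress.ip_address(address)
        self.watched[addr] = [callback, self._lookup(addr)]

    def rib_update(self, prefix, info):
        """Called by the RIB on any creation, modification, or deletion
        (info=None means the route was deleted)."""
        net = ipaddress.ip_network(prefix)
        if info is None:
            self.rib.pop(net, None)
        else:
            self.rib[net] = info
        for addr, entry in self.watched.items():
            resolved = self._lookup(addr)
            if resolved != entry[1]:     # changed, appeared, or disappeared
                entry[1] = resolved
                entry[0](addr, resolved)

events = []
atf = AddressTrackingFilter()
atf.register("10.0.0.1", lambda addr, route: events.append((str(addr), route)))
atf.rib_update("10.0.0.0/8", "via A")       # resolves the next hop: notify
atf.rib_update("192.168.0.0/16", "via B")   # unrelated: filtered out
atf.rib_update("10.0.0.0/24", "via C")      # more specific route: notify
atf.rib_update("10.0.0.0/24", None)         # falls back to the /8: notify
```

Of the four RIB changes, only the three that affect the registered address reach the client.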

ATF – Address Tracking Filter

BGP tells ATF to let us know about any changes to and

Changes to and are passed along to BGP


BGP Nexthops


ATF filters out any changes for,, and


NHT – Next Hop Tracking

BGP Next Hop Tracking

Enabled by default

[no] bgp nexthop trigger enable

BGP registers all nexthops with ATF

Hidden command will let you see a list of nexthops

show ip bgp attr nexthop

ATF will let BGP know when a route change occurs (if of interest for a BGP nexthop)

ATF notification will trigger a “lightweight BGP Scanner” run

Bestpaths will be calculated

The rest of the “Full Scan” work will NOT happen

NHT – Next Hop Tracking

Once an ATF notification is received BGP waits 5 seconds (default) before triggering NHT scan

bgp nexthop trigger delay <0-100>

Configured value should be the maximum time it takes for the IGP to converge

Event driven model allows BGP to react quickly to IGP changes

No longer need to wait as long as 60 seconds for BGP to scan the table and recalculate bestpaths

Tuning your IGP for fast convergence is recommended

NHT – Next Hop Tracking

Dampening is used to reduce frequency of triggered scans

It prevents “lightweight BGP scanner” runs from happening too frequently

show ip bgp internal

Displays data on when the last NHT scan occurred

Time until the next NHT may occur

New commands

bgp nexthop trigger enable

bgp nexthop trigger delay <0-100>

show ip bgp attr next-hop ribfilter

debug ip bgp events nexthop

debug ip bgp rib-filter

Fast External Fallover

Objective: Tear down the session if the interface to reach the peering address goes down

No need to wait for the hold timer to expire!

When does it work?

Only when peering address is directly connected

Only for eBGP peers

ebgp-multihop OR disable-connected-check can NOT be configured


ON by default

Under router bgp:

[no] bgp fast-external-fallover

Under interface (priority over router configuration):

[no] ip bgp fast-external-fallover {permit|deny}

Recommended if interface goes down during failure

Fast Session Deactivation (FSD)

Objective: Tear down the session if the route to reach the peering address disappears

No need to wait for the hold timer to expire!

How does it work?

BGP registers the peering address with ATF (similar to NHT)

It’s triggered immediately (trigger-delay = 0 and cannot be configured)


OFF by default

Under router bgp: neighbor <neighbor-ip> fall-over

Recommended for multihop eBGP peers known via IGP

Very dangerous for iBGP peers

If we lose the route for a split second, we bring the peer down!

iBGP sessions usually re-route!

Scalability Update – Overview

Bootup convergence (or convergence after “clear ip bgp *”) is the biggest challenge

Must receive updates from all peers

Must compute all best-paths

Must format updates for all peers

Must transmit updates for all peers

To improve the process:

Make sure that you don’t start computing best-paths till you have received updates from all peers

Each peer will send you a keepalive (KA) or an End-of-RIB (EOR) marker when it has finished sending you its updates

Maximum timer: bgp update-delay <1-3600> (default 120)

Increase it if your network takes a long time to converge

Depends on number of routes, number of peers and on specific platform

NSR – Non Stop Routing

NSR and NSF (Non Stop Forwarding) are not the same

Both allow a restarting speaker to continue forwarding

Usually, FIB is distributed and not affected while the main RP is restarting

NSF in a nutshell

Needs support of (NSF aware) peers

Peers are aware that the restarting speaker keeps forwarding while it restarts, so they don’t delete the routes towards it.

BGP extensions required: GR (Graceful Restart)

Not a challenge within an AS

PE ↔ CE is a problem

Upgrading CEs is a huge deployment challenge

NSR – Non Stop Routing

NSR in a nutshell

Provides forwarding and preserves routing during Active RP failover to Standby RP

This is called an SSO (stateful switchover)

BGP peers’ TCP sessions are maintained

BGP extensions: NOT required

CEs do not need to be upgraded!

NSR – Non Stop Routing

Deployment challenges:

NSF is easy to implement inside an AS

All the routers can be upgraded to support GR

The problem is the CEs (upgrading them to support GR can be a huge deployment challenge)

NSR is easier to implement

No need to upgrade CEs

PE uses NSR with CEs that are not NSF-aware

PE uses NSF with NSF-aware CEs

PE uses NSF with RRs (NSF-aware)

NSR – Non Stop Routing

Simplified deployment for service providers

Only PEs need to be upgraded to support NSR (incremental deployment)

CEs are not touched! (i.e., no software upgrade required)

NSR – Related Commands

show ip bgp vpnv4 all sso summary

Used to display the number of BGP peers that support Cisco BGP NSR

Router# show ip bgp vpnv4 all sso summary

Stateful switchover support enabled for 40 neighbors

Route Flap Dampening

Defined in RFC 2439

Route flap: The bouncing of a path or a change in its characteristics

A flap ripples through the entire Internet

Consumes CPU cycles, causes instability

Solution: Reduce scope of route flap propagation

Suppress oscillating routes (history predicts future behavior)

Only eBGP routes are dampened

Route Flap Dampening

Flap: every time we receive a withdrawal or a change of attributes for a given route

Withdrawal: we increase the penalty by 1000

Change of attributes: we increase the penalty by 500

To suppress (dampen a route):

Penalty accumulated must be greater than the suppress-limit

To reuse a route (“undampen” a route):

Penalty decreases exponentially

When it reaches reuse-limit, we use it again
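The penalty mechanics can be sketched in Python as follows. The penalty increments (1000 and 500) come from the slide; the 15-minute half-life, the suppress limit of 2000, and the reuse limit of 750 are assumed values picked for illustration, and the class name `DampenedRoute` is hypothetical.

```python
PENALTY = {"withdrawal": 1000, "attribute_change": 500}

class DampenedRoute:
    def __init__(self, half_life=900.0, suppress_limit=2000.0, reuse_limit=750.0):
        self.half_life = half_life          # seconds (assumed value)
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # The penalty halves every half_life seconds (exponential decay).
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now
        if self.suppressed and self.penalty <= self.reuse_limit:
            self.suppressed = False         # reuse limit reached: use it again

    def flap(self, now, kind):
        self._decay(now)
        self.penalty += PENALTY[kind]
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        return self.suppressed

route = DampenedRoute()
route.flap(0, "withdrawal")
route.flap(60, "withdrawal")
after_two_flaps = route.is_suppressed(60)       # penalty still below 2000
route.flap(120, "withdrawal")
after_three_flaps = route.is_suppressed(120)    # penalty now above 2000
one_half_life_later = route.is_suppressed(120 + 900)
two_half_lives_later = route.is_suppressed(120 + 1800)
```

With these assumed parameters, three withdrawals a minute apart suppress the route, and it stays suppressed until roughly two half-lives later, when the penalty has decayed below the reuse limit.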

Route Flap Dampening


[Figure: penalty curve. Labels: Suppress limit, Reuse limit, Not Announced]


Route Flap Dampening


Reduces CPU load

Does not propagate local flaps to the whole internet

Troubleshooting PLUS: Makes all these local flaps (routes that have been suppressed) visible

Route Flap Dampening

Guidelines RIPE-229:

“Progressive” dampening: more aggressive for longer prefixes

Needs to be coordinated

Some parameters recommended

“golden” networks (like gTLD name servers) should not be dampened

Apply as close as possible to the prefix being advertised

Peering, upstream, customer boundaries

No need to dampen routes from customers that use Provider Aggregated addresses

Route Flap Dampening

Guidelines RIPE-378:

Internet today:

A single “normal” withdrawal/update can be amplified into many withdrawals/updates a few hops away

Route dampening would keep these prefixes unreachable unnecessarily

Routers today:

More processing power makes them more tolerant of route flapping


Do NOT implement route dampening

[Figure: topology with AS 1, AS 3, and AS 4; 10.201-249 /16]
