Management of Routing Protocols in IP Networks

Ph.D. Defense

Aman Shaikh

Computer Engineering, UCSC

November 18, 2003

Introduction

  • Internet connects millions of computers

    • Internet is packet-switched:

      • Each packet travels independently of the rest

  • Routers provide connectivity

    • Routers forward packets so that they reach their ultimate destination

  • Forwarding is destination-based and hop-by-hop

    • Router decides next-hop (i.e., neighbor router) for each packet based on its destination address

  • Routing protocols allow routers to determine next-hop(s) for every destination
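A minimal sketch of destination-based next-hop selection follows. It assumes a forwarding table that maps IP prefixes to neighbor routers. It also assumes longest-prefix match, the standard IP lookup rule, which the slide does not spell out. The table contents and router names are hypothetical.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next-hop (neighbor) router.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "R1",      # default route
    ipaddress.ip_network("10.0.0.0/8"): "R2",
    ipaddress.ip_network("10.10.4.0/24"): "R3",
}


def next_hop(fib, destination):
    """Pick the next hop for a destination address by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None  # no matching route: the packet cannot be forwarded
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


for dst in ("10.10.4.7", "10.20.1.1", "192.0.2.1"):
    print(dst, "->", next_hop(FIB, dst))  # R3, R2, R1 respectively
```

Each router makes this decision independently for every packet. That independence is what makes forwarding hop-by-hop.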

Management of Routing Infrastructure

  • Management of routing infrastructure is a nightmare

    • “Simple core (= routing infrastructure), smart edge (= end hosts)” design paradigm

      • Internet only provides a best-effort, connectionless, unreliable service

      • Routing is not designed with manageability in mind

    • Large distributed system

      • Hundreds of routers and thousands of links in big service provider networks

      • Variety of routing protocols

    • The infrastructure is evolving

      • New services require new protocols and devices

Dissertation Contribution

  • Focuses on management of Open Shortest Path First (OSPF) protocol

    • OSPF is widely used to control routing within service provider and enterprise networks

  • Three areas of focus

    • Monitoring

    • Characterization

    • Maintenance

Monitoring

  • Motivation:

    • Effective management requires sound monitoring systems

  • Contribution:

    • Design and implementation of an OSPF monitor

    • Deployment in two commercial networks

      • Has proved valuable for trouble-shooting and identifying impending problems at an early stage

      • Collection and archiving of OSPF data that is used for performance improvement, post-mortem analysis and further research

Characterization

  • Motivation:

    • Need sound simulation and analytical models for scalability studies, addition of new features, etc.

      • How do we parameterize these models?

    • Need vendor-independent benchmarking methods

  • Contribution:

    • Black-box techniques for estimating OSPF processing delays within a router

      • Has become basis for OSPF benchmarking standardization efforts

    • Case study of OSPF dynamics in an enterprise network

Maintenance

  • Motivation:

    • Maintenance of routers occurs fairly frequently

      • Protocol enhancements, bug fixes, hardware/software upgrades

    • During maintenance, operators have to withdraw the router undergoing maintenance from forwarding service

      • Leads to route flapping and instability

    • How to perform seamless maintenance?

  • Contribution:

    • I’ll Be Back (IBB) capability for OSPF

      • Allows “router-under-maintenance” to be used for forwarding

Outline

  • Background

    • Routing and OSPF overview

    • Design of an IP router

  • Monitoring

    • OSPF Monitor

  • Characterization

    • Black-box measurements for OSPF

    • Case study of OSPF dynamics

  • Maintenance

    • I’ll Be Back (IBB) Capability for OSPF

  • Conclusions and future work

Routing in the Internet

[Figure: five autonomous systems, AS1 to AS5, interconnected by BGP; inside each AS an IGP (OSPF, IS-IS or RIP) handles routing]

  • Internet is a collection of Autonomous Systems (ASes)

  • Two classes of routing protocols

    • IGP (Interior Gateway Protocols)

      • Used within an AS

      • Examples: OSPF, IS-IS, RIP, EIGRP

    • EGP (Exterior Gateway Protocols)

      • Used across ASes

      • Example: BGP

Overview of OSPF

  • OSPF is a link-state protocol

    • Every router learns entire network topology

      • Topology is represented as graph

        • Routers are vertices, links are edges

        • Every link is assigned weight through configuration

    • Every router uses Dijkstra’s single source shortest path algorithm to build its forwarding table

      • Router builds Shortest Path Tree (SPT) with itself as root

      • This is called the Shortest Path First (SPF) calculation

    • Packets are forwarded along shortest paths defined by link weights
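The SPF calculation above can be sketched as Dijkstra's algorithm over a link-state database, represented here as a weighted adjacency map. This is an illustrative sketch, not any router's implementation. The graph format, router names and weights are hypothetical, and the sketch keeps only one next hop per destination (no equal-cost multipath).

```python
import heapq


def spf(graph, root):
    """Dijkstra's shortest-path-first calculation rooted at `root`.

    graph maps each router to {neighbor: link weight}. Returns two dicts:
    the distance to every reachable router and the first hop (the neighbor
    of `root`) on the chosen shortest path, i.e. a forwarding-table view.
    """
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, weight in graph.get(u, {}).items():
            new_dist = d + weight
            if v not in dist or new_dist < dist[v]:
                dist[v] = new_dist
                # A neighbor of the root is its own first hop; others inherit u's.
                first_hop[v] = v if u == root else first_hop[u]
                heapq.heappush(heap, (new_dist, v))
    return dist, first_hop


# Hypothetical four-router topology with symmetric link weights.
G = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, first_hop = spf(G, "A")
print(dist)       # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
print(first_hop)  # {'A': None, 'B': 'B', 'C': 'B', 'D': 'B'}
```

The distances describe router A's shortest path tree. The first hops are what A would install as next hops in its forwarding table.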


Areas in OSPF

[Figure: an OSPF domain divided into Area 0, Area 1 and Area 2, with border routers connecting Areas 1 and 2 to Area 0]

  • OSPF allows domain to be divided into areas for scalability

    • Areas are numbered 0, 1, 2 …

    • Hub-and-spoke with area 0 as hub

    • Every link is assigned to exactly one area

    • Routers with links in multiple areas are called border routers


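Summarization with Areas

[Figure: left, an OSPF domain with routers R1, R2 and R3 in Area 0 and border routers B1 and B2 leading to Area 1, which contains routers C1 and C2 and subnets 10.10.4.0/24 and 10.10.5.0/24, with a cost on every link; right, R1's view, in which Area 1's internal topology is replaced by the two subnets and their distances from B1 and B2]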

  • Each router learns

    • Entire topology of its attached areas

    • Information about subnets in remote areas and their distance from the border routers

      • Distance = sum of link costs from border router to subnet
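The sketch below shows how a router in Area 0 might combine its own SPF distances with the distances that border routers advertise. The function name and all numbers are hypothetical. The sketch simply picks the border router that minimizes (distance to border router) + (border router's distance to the subnet).

```python
def inter_area_distance(dist_to_border, border_to_subnet):
    """Best distance from a router to a subnet in a remote area.

    dist_to_border: the router's SPF distance to each border router.
    border_to_subnet: distance each border router advertises for the subnet
    (the sum of link costs from that border router to the subnet).
    Returns (distance, border router used), or (None, None) if unreachable.
    """
    candidates = [
        (dist_to_border[b] + cost, b)
        for b, cost in border_to_subnet.items()
        if b in dist_to_border
    ]
    if not candidates:
        return None, None
    return min(candidates)


# Hypothetical numbers: R1 is 300 from B1 and 200 from B2.
print(inter_area_distance({"B1": 300, "B2": 200}, {"B1": 20, "B2": 70}))
# (270, 'B2')
```

The router never sees Area 1's internal links. It sees only one distance per subnet per border router, which is what keeps areas scalable.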

Link State Advertisements (LSAs)

  • Every router describes its local connectivity in Link State Advertisements (LSAs)

  • Router originates an LSA due to…

    • Change in network topology

      • Example: link goes down or comes up

    • Periodic soft-state refresh

      • Recommended value of interval is 30 minutes

  • LSA is flooded to other routers in the domain

    • Flooding is reliable and hop-by-hop

    • Both change and refresh LSAs are flooded

    • Flooding leads to duplicate copies of LSAs being received

  • Every router stores LSAs (self-originated + received) in link-state database (= topology graph)
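A small simulation shows why flooding produces duplicates. It assumes a simplified model, and the function and topologies are hypothetical. A router that receives a new LSA installs it and forwards it on every other link. A copy that the router already holds is counted as a duplicate and dropped. Acknowledgments, retransmissions and LAN-specific flooding are not modeled.

```python
from collections import deque


def flood(graph, origin):
    """Flood one LSA from `origin` over an undirected topology.

    graph maps each router to a list of its neighbors. A router receiving a
    copy it does not yet hold installs it and forwards it on every link except
    the one it arrived on; a copy it already holds is counted as a duplicate
    and dropped. Returns (routers holding the LSA, duplicates per router).
    """
    installed = {origin}
    duplicates = {router: 0 for router in graph}
    queue = deque((nbr, origin) for nbr in graph[origin])
    while queue:
        router, sender = queue.popleft()
        if router in installed:
            duplicates[router] += 1
            continue
        installed.add(router)
        for nbr in graph[router]:
            if nbr != sender:
                queue.append((nbr, router))
    return installed, duplicates


# Hypothetical topologies: a triangle has redundant paths, a chain does not.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(flood(triangle, "A")[1])  # {'A': 0, 'B': 1, 'C': 1}
print(flood(chain, "A")[1])     # {'A': 0, 'B': 0, 'C': 0}
```

Redundant paths make flooding robust, but they also make routers receive the same LSA more than once.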

Adjacency

  • Neighbor routers (i.e., routers connected by a physical link) form an adjacency

  • The purpose is to make sure

    • Link is operational and routers can communicate with each other

    • Neighbor routers have consistent view of network topology

      • To avoid loops and black holes

  • Link gets used for data forwarding only after adjacency is established

  • Use of periodic Hellos to monitor the status of link and adjacency
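Hello-based liveness can be sketched as one timer per neighbor. The class is hypothetical. The 10-second Hello interval and 40-second dead interval are the sample values suggested in the OSPF specification, and are used here only as assumptions.

```python
HELLO_INTERVAL = 10                 # seconds (assumed sample value)
DEAD_INTERVAL = 4 * HELLO_INTERVAL  # 40 seconds (assumed sample value)


class NeighborMonitor:
    """Track the last Hello heard from each neighbor (hypothetical sketch)."""

    def __init__(self, dead_interval=DEAD_INTERVAL):
        self.dead_interval = dead_interval
        self.last_hello = {}

    def hello_received(self, neighbor, now):
        self.last_hello[neighbor] = now

    def dead_neighbors(self, now):
        """Neighbors not heard from for at least the dead interval."""
        return sorted(n for n, t in self.last_hello.items()
                      if now - t >= self.dead_interval)


monitor = NeighborMonitor()
monitor.hello_received("R2", now=0)
monitor.hello_received("R3", now=30)
print(monitor.dead_neighbors(now=39))  # []
print(monitor.dead_neighbors(now=45))  # ['R2']
```

A neighbor declared dead takes its adjacency down, so the link stops being used for data forwarding.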


Design of an IP Router

[Figure: control plane and data plane of a router. In the control plane, the route processor (CPU) runs OSPF, BGP and RIP processes, each performing its own routing calculation and feeding a route manager that populates the Forwarding Information Base (FIB). In the data plane, interface cards forward data packets across the switching fabric using the FIB.]

Outline

  • Background

  • Monitoring

    • Motivation:

      • Effective management requires sound monitoring systems

    • Contribution: OSPF monitor

      • Design

        • Three components and their functionality

      • Deployment in two commercial networks

        • How OSPF Monitor is being used

        • Lessons learnt through deployment

  • Characterization

  • Maintenance

  • Conclusions and future work

OSPF Monitor: Objectives

  • Real-time analysis of OSPF behavior

    • Trouble-shooting, alerting

    • Real-time snapshots of OSPF network topology

  • Off-line analysis

    • Post-mortem analysis of recurring problems

    • Identify anomaly signatures and use them to predict impending problems

    • Allow operators to tune configurable parameters

    • Improve maintenance procedures

    • Analyze OSPF behavior in commercial networks

Related Work

  • Route monitoring

    • Commercial IP monitors

      • Route Dynamics (IPSUM), Route Explorer (PacketDesign)

    • IPMON project at Sprint

      • IS-IS and BGP listeners

    • RouteViews and RIPE

      • Collects BGP updates from several networks

  • Topology tracking

    • OSPF topology server [shaikh:jsac02]

      • Evaluation and comparison of LSA-based versus SNMP-based approaches

    • Rocketfuel project at UW Seattle

      • Inference of intra-domain topologies from end-to-end measurements

Components

  • Data collection: LSA Reflector (LSAR)

    • Passively collects OSPF LSAs from network

    • “Reflects” streams of LSAs to LSAG

    • Archives LSAs for analysis by OSPFScan

  • Real-time analysis: LSA aGgregator (LSAG)

    • Monitors network for topology changes, LSA storms, node flaps and anomalies

  • Off-line analysis: OSPFScan

    • Tools for analysis of LSA archives

      • Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics


Example

[Figure: LSAR 1 and LSAR 2 receive LSAs from an OSPF network with Areas 0, 1 and 2; they “reflect” the LSAs to LSAG for real-time monitoring and write LSA archives, which are replicated for off-line analysis by OSPFScan]

How LSAR attaches to Network

  • Host mode

    • Join multicast group

    • Adv: completely passive

    • Disadv: not reliable, delayed initialization of LSDB

  • Full adjacency mode

    • Form full adjacency with a router

    • Adv: reliable, immediate initialization of LSDB

    • Disadv: LSAR’s instability can impact entire network

  • Partial adjacency mode

    • Keep adjacency in a state that allows LSAR to receive LSAs, but does not allow data forwarding over link

    • Adv: reliable, LSAR’s instability does not impact entire network, immediate initialization of LSDB

    • Disadv: can raise alarms on the router

LSA aGgregator (LSAG)

  • Analyzes “reflected” LSAs from LSARs over TCP connections in real-time

  • Generates console messages:

    • Changes in OSPF network topology

      • ADJACENY COST CHANGE: rtr 10.0.0.1 (intf 10.0.0.2)  rtr 10.0.0.5 old_cost 1000 new_cost 50000 area 0.0.0.0

    • Node flaps

      • RTR FLAP: rtr 10.0.0.12 no_flaps 7 flap_window 570 sec

    • LSA storms

      • LSA STORM: lstype 3 lsid 10.1.0.0 advrt 10.0.0.3 area 0.0.0.0 no_lsas 7 storm_window 470 sec

    • Anomalous behavior

      • TYPE-3 ROUTE FROM NON-BORDER RTR: ntw 10.3.0.0/24 rtr 10.0.0.6 area 0.0.0.0
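Flap and storm messages like the ones above can be produced by counting events per key in a sliding time window. This is a hypothetical sketch, not LSAG's actual logic. The threshold, the window length and the class name are all assumptions. The same detector, keyed by (LS type, LS ID, advertising router), would flag LSA storms.

```python
from collections import defaultdict, deque


class FlapDetector:
    """Flag a key that sees at least `threshold` events within `window` seconds.

    Hypothetical sketch: the key could be a router ID (node flaps) or an
    (LS type, LS ID, advertising router) tuple (LSA storms). Timestamps are
    assumed to arrive in non-decreasing order.
    """

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)

    def record(self, key, timestamp):
        """Record one event; return (key, count, span) if it crosses the threshold."""
        q = self.events[key]
        q.append(timestamp)
        while timestamp - q[0] > self.window:
            q.popleft()  # forget events that fell out of the window
        if len(q) >= self.threshold:
            return key, len(q), q[-1] - q[0]
        return None


detector = FlapDetector(threshold=3, window=600)
for t in (0, 100, 250):
    alert = detector.record("10.0.0.12", t)
print(alert)  # ('10.0.0.12', 3, 250)
```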

OSPFScan

  • Tools for off-line analysis of LSA archives

    • Parse, select (based on queries), and analyze

  • Derivation and analysis of auxiliary information from LSA archives

    • LSAs indicating network topology changes

    • Routing table entries

      • How OSPF routing tables evolved in response to network changes

      • What the end-to-end path within the OSPF domain looked like at any instant

    • Topology changes as graph-based abstraction

      • Vertex addition/deletion and link addition/deletion/change_weight

  • Playback of topology change events

    • Essentially an LSAG playback
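Deriving graph-level topology changes from two consecutive snapshots can be sketched as a set difference. The snapshot format is hypothetical: a set of router IDs plus a map from (router, router) links to weights. The event names are also hypothetical, chosen to mirror the vertex and link events listed above.

```python
def topology_changes(old_vertices, old_links, new_vertices, new_links):
    """Diff two topology snapshots into vertex and link events.

    *_vertices: sets of router IDs; *_links: dicts mapping a (router, router)
    link to its weight. Returns a list of event tuples.
    """
    events = []
    for v in sorted(new_vertices - old_vertices):
        events.append(("vertex_add", v))
    for v in sorted(old_vertices - new_vertices):
        events.append(("vertex_delete", v))
    for link in sorted(new_links.keys() - old_links.keys()):
        events.append(("link_add", link, new_links[link]))
    for link in sorted(old_links.keys() - new_links.keys()):
        events.append(("link_delete", link))
    for link in sorted(old_links.keys() & new_links.keys()):
        if old_links[link] != new_links[link]:
            events.append(("link_change_weight", link,
                           old_links[link], new_links[link]))
    return events


old_v, old_l = {"R1", "R2", "R3"}, {("R1", "R2"): 10, ("R2", "R3"): 5}
new_v, new_l = {"R1", "R2", "R4"}, {("R1", "R2"): 50, ("R2", "R4"): 5}
for event in topology_changes(old_v, old_l, new_v, new_l):
    print(event)
```

Replaying such events in timestamp order gives a playback of topology changes.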

Deployment

  • Deployed in two commercial networks

    • Enterprise network

      • 15 areas, 500+ routers; Ethernet-based LANs

      • Deployed since February, 2002

      • LSA archive size: 10 MB/day

      • LSAR connection: host mode

    • ISP network

      • Area 0, 100+ routers; Point-to-point links

      • Deployed since January, 2003

      • LSA archive size: 8 MB/day

      • LSAR connection: partial adjacency mode

LSAG in Day-to-day Operations

  • Generation of alarms by feeding messages into higher layer network management systems

    • Correlation and grouping of messages into a single alarm

    • Prioritization of messages

  • Validation of maintenance steps and monitoring the impact of these steps on network-wide OSPF behavior

    • Example:

      • Operators change link weights to carry out maintenance activities

      • A “link-audit” web-page allows operators to keep track of link weights in real-time

Problems Caught by LSAG

  • Equipment problem

    • Detected internal problems in a crucial router in enterprise network

      • Problem manifested as episodes of OSPF adjacency flapping

  • Configuration problem

    • Identified the same router ID assigned to two routers in the enterprise network

  • OSPF implementation bug

    • Caught a bug in refresh algorithm of routers from a particular vendor in ISP network

      • Bug resulted in a much faster refresh of LSAs than standards-mandated rate

Long Term Analysis by OSPFScan

  • LSA traffic analysis

    • Identified excessive duplicate LSA traffic in some areas of the enterprise network

      • Led to root-cause analysis and preventative steps

  • Generation of statistics

    • Inter-arrival time of change LSAs in the ISP network

      • Fine-tuning configurable timers related to SPF calculation

    • Mean down-time and up-time for links and routers in the ISP network

      • Assessment of reliability and availability as ISP network gears for deployment of new services
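The mean up-time and down-time statistics can be sketched from a time-ordered list of state changes for one link or router. The input format is a hypothetical stand-in for events extracted from an LSA archive. Only completed periods are counted.

```python
def mean_up_down(events):
    """Mean duration of completed up and down periods.

    events: time-ordered (timestamp_seconds, state) pairs, state "up" or "down".
    Each pair opens a period that lasts until the next pair; the last period is
    still open and is ignored. Returns (mean_up, mean_down), None if no data.
    """
    durations = {"up": [], "down": []}
    for (t0, state), (t1, _) in zip(events, events[1:]):
        durations[state].append(t1 - t0)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(durations["up"]), mean(durations["down"])


# Hypothetical link history: two outages of 60 s and 120 s.
history = [(0, "up"), (3600, "down"), (3660, "up"), (10860, "down"), (10980, "up")]
print(mean_up_down(history))  # (5400.0, 90.0)
```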

Lessons Learnt through Deployment

  • New tools reveal new failure modes

  • Real networks exhibit significant activity

    • Maintenance and genuine problems

  • Archive all LSAs

    • LSA volume is manageable

  • Stability and reliability of monitor is extremely important

  • Keep data collection separate from its analysis

    • Keep data collector as simple as possible

  • Add functionality incrementally and through interaction with users

Summary

  • Three component architecture

    • LSAR: LSA capture from the network

    • LSAG: real-time analysis of LSA stream

      • Detection and trouble-shooting of problems

    • OSPFScan: off-line analysis tools for LSA archives

      • Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics

  • Deployed in two commercial networks

    • Has proven a valuable network management tool

    • “OSPF Monitor was a lifesaver”

      • VP of Networking, Enterprise network 

        • When monitor caught an impending failure in an early stage

Outline

  • Background

  • Monitoring

  • Characterization

    • Motivation:

      • Simulation and analytical models, benchmarking

    • Contributions:

      • Black-box techniques for estimating OSPF processing delays on a router

        • Tasks we measure, methodology, results for Cisco and GateD

      • Case study of OSPF dynamics in an enterprise network

  • Maintenance

  • Conclusions and future work

Black-box Measurements for OSPF

  • OSPF processing delays within a router matter!

    • Together they impact convergence and stability

    • Knowing them guides tuning of configurable parameters, head-to-head vendor comparisons, and simulation models

  • Instrumenting routing code for measuring delays is challenging

    • Commercial implementations are proprietary

    • May involve grappling with

      • Numerous code versions, hardware platforms, and developers

  • Use black-box measurements

    • Measure the timing delays using external observations

    • Applied to Cisco and GateD OSPF implementations

Related Work

  • White-box measurements for IS-IS [alaettinoglu]

    • SPF delays reported are comparable to results obtained by us

  • Empirical analysis of router behavior under large BGP routing tables [chang:imw02]

    • Cisco and Juniper routers

  • Benchmarking Methodology working group (bmwg) at IETF

    • Drafts related to OSPF benchmarking

      • Our black-box methods are basis for some benchmark tests


What tasks did we measure?

[Figure: tasks inside the OSPF process on the route processor (CPU): LSA processing of incoming LSAs (answered with an LS Ack), LSA flooding, SPF calculation over the topology view, and FIB update; interface cards then forward data packets across the switching fabric using the FIB]


Methodology

[Figure: testbed in which TopTracker sends LSAs describing an emulated topology to the target router]

  • Load emulated topology on target router

  • Initiate task of interest

  • Measure the time for task


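Measuring Task Time

  • Use a black-box method to bracket task start and finish times

  • Subtract out the intervals that precede the task start and follow the task finish

[Figure: timeline from the top bracket event to the bottom bracket event (interval A). B runs from the top bracket event to the task start time, X from the task start time to the task finish time, and C from the task finish time to the bottom bracket event.]

X = A - (B + C)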


Measuring SPF Calculation

[Figure: TopTracker loads the desired topology into the target router, sends an initiator LSA that triggers an SPF calculation, and then sends a duplicate LSA. The bracket A runs from sending the initiator LSA until the ack for the duplicate LSA arrives; X is the SPF calculation, and B, C, D and E are the remaining intervals inside the bracket.]

  • X = A - (B + C + D + E)

  • Estimate the overhead B + C + D + E in a separate experiment


Estimating the Overhead

  • Remove SPF calculation from bracket

    • spf_delay = 60 seconds

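[Figure: timeline with SPF removed from the bracket. TopTracker sends the initiator LSA and then the duplicate LSA; the target router finishes processing the initiator LSA, processes the duplicate LSA and sends the ack, which arrives back at TopTracker before the delayed SPF calculation starts. The bracket therefore contains only the intervals B, C, D and E.]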

overhead = B + C + D + E
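The subtraction behind the black-box estimate can be sketched as follows. It assumes (hypothetically) that the bracket A and the overhead B + C + D + E are measured in separate runs, so their means are subtracted rather than individual samples. The sample values are invented for illustration.

```python
from statistics import mean


def estimate_task_time(bracket_samples, overhead_samples):
    """Black-box estimate X = A - (B + C + D + E).

    bracket_samples: measurements of A with the task inside the bracket.
    overhead_samples: measurements of B + C + D + E with the task pushed out
    of the bracket (for SPF, by setting a long spf_delay).
    """
    return mean(bracket_samples) - mean(overhead_samples)


# Invented sample values in milliseconds.
spf_ms = estimate_task_time([41.0, 43.0, 42.0], [12.0, 11.0, 13.0])
print(round(spf_ms, 3))  # 30.0
```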

Results

  • Results for Cisco GSR, 7513 and GateD

    • For GateD, comparison of black-box results with those obtained using instrumentation (white-box)

    • Route processors

      • Cisco: 200 MHz R5000 processor

      • GateD: 500 MHz AMD-K6 processor

  • Topology: full n × n mesh with random OSPF edge weights

    • n in range 10, 20, …, 100

Results for Cisco Routers

  • Observations

    • Similar results for two models

    • SPF calculation time is O(n²), consistent with the number of links in a full mesh growing as n²

Results for GateD

  • Observations:

    • Black-box method over-estimates the white-box measurements

    • Black-box captures the characteristics very well

Summary

  • Black-box methods for estimating OSPF processing delays

    • Work across wide range of time delays

    • Work for purely CPU-bound tasks

    • Effective in capturing scaling

    • Match white-box measurements

  • Applied methods to Cisco GSR and 7513

    • LSA Processing: 100-800 microseconds

    • LSA flooding: 30-40 milliseconds

      • Pacing timer is the determining factor

    • SPF calculation: 1-40 milliseconds

      • O(n²) behavior for a full n × n mesh

    • FIB update time: 100-300 milliseconds

      • No dependence on topology size

Outline

  • Background

  • Monitoring

  • Characterization

    • Motivation:

      • Simulation and analytical models, benchmarking

    • Contributions:

      • Black-box techniques for estimating OSPF processing delays on a router

      • Case study of OSPF dynamics in an enterprise network

        • Enterprise network topology, categorization of LSA traffic, results

  • Maintenance

  • Conclusions and future work

Case Study of OSPF Dynamics

  • OSPF behavior in commercial networks is not well understood

  • Understanding dynamics of LSA traffic is key to better understanding of OSPF

    • Bulk of OSPF processing is due to LSAs

    • Big impact on OSPF convergence, (in)stability

  • Analysis of LSA archives collected by OSPF monitor in enterprise network

    • Focus on April, 2002 data

Related Work

  • Several studies focusing on BGP dynamics in the Internet

    • Relatively easy to collect BGP data

    • BGP is more complicated

  • OSPF dynamics in a regional service provider network (MichNet) [watson:icdcs03]

    • One year worth of data

    • Several findings are similar to our observations

  • Analysis of OSPF stability through simulations [basu:sigcomm01]

Enterprise Network

  • Provides customers with connectivity to applications and databases residing in data center

  • OSPF network

    • 15 areas, 500 routers

      • This case study covers 8 areas, 250 routers

      • One month: April, 2002

    • Ethernet-based LANs

  • Customers are connected via leased lines

    • Customer routes are learned via EIGRP and injected into OSPF

      • The routes are propagated via external LSAs


Enterprise Network Topology

[Figure: customers connect over EIGRP to the OSPF domain, which contains Area 0 and Areas A, B and C and reaches the servers and database applications in the data center; in Area A, the monitor sits on a LAN (LAN 1 and LAN 2, with border routers B1 and B2 to Area 0) and uses host mode to receive LSAs]

Categorizing LSA Traffic

  • Refresh LSA traffic

    • Originated due to periodic soft-state refresh

    • Forms base-line LSA traffic

    • Can be predicted using configuration information

  • Change LSA traffic

    • Originated due to changes in network topology

      • E.g., a link goes down or comes up

    • Allows detection of anomalies and problems

  • Duplicate LSA traffic

    • Received due to redundancy in flooding

    • Overhead -- wastes resources
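A simplified classifier for the three categories can be sketched by comparing each received LSA with the last instance seen for the same (LS type, LS ID, advertising router) key. The record format is hypothetical. The sketch treats any instance that is not newer as a duplicate, and it compares LSA contents without the age field. Real OSPF handles both of these cases in more detail.

```python
def classify(lsdb, lsa):
    """Classify a received LSA as 'change', 'refresh' or 'duplicate'.

    lsdb maps (ls_type, ls_id, adv_router) -> (sequence number, contents) of the
    latest instance seen; lsa is a dict with 'key', 'seq' and 'contents'
    (contents excluding the age field). Updates lsdb for new instances.
    """
    key = lsa["key"]
    previous = lsdb.get(key)
    if previous is not None and lsa["seq"] <= previous[0]:
        return "duplicate"  # not newer than what we hold (simplification)
    lsdb[key] = (lsa["seq"], lsa["contents"])
    if previous is not None and previous[1] == lsa["contents"]:
        return "refresh"    # new instance, same contents
    return "change"         # first sighting or different contents


lsdb = {}
key = ("router", "10.0.0.1", "10.0.0.1")
for seq, contents in [(1, "links-v1"), (1, "links-v1"), (2, "links-v1"), (3, "links-v2")]:
    print(classify(lsdb, {"key": key, "seq": seq, "contents": contents}))
# change, duplicate, refresh, change (one per line)
```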


LSA Traffic in Different Areas

[Figure: refresh, change and duplicate LSAs per day in Areas 0, 2, 3 and 4; two spikes are marked as genuine anomalies, and an artifact caused by the 23-hour day on April 7 is also marked]

Baseline LSA Traffic: Refresh LSAs

  • Refresh LSA traffic can be reliably predicted using router configuration files

    • Important for workload generation
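Predicting the baseline from configuration reduces to counting LSAs and multiplying by the number of refresh intervals per day. The sketch below assumes the 30-minute refresh interval noted earlier, one received copy per refresh, and hypothetical per-router LSA counts.

```python
LS_REFRESH_SECONDS = 30 * 60  # 30-minute refresh interval
SECONDS_PER_DAY = 24 * 60 * 60


def expected_refresh_lsas(lsa_counts, seconds_in_day=SECONDS_PER_DAY):
    """Expected refresh LSAs per day seen by one observer in an area.

    lsa_counts maps each router to the number of LSAs it originates into the
    area (hypothetically derived from its configuration). Assumes every LSA is
    refreshed once per interval and each refresh is received exactly once.
    """
    intervals = seconds_in_day // LS_REFRESH_SECONDS  # 48 for a 24-hour day
    return intervals * sum(lsa_counts.values())


counts = {"R1": 3, "R2": 5, "B1": 12}  # hypothetical
print(expected_refresh_lsas(counts))                            # 960
print(expected_refresh_lsas(counts, seconds_in_day=23 * 3600))  # 920
```

A 23-hour day has only 46 refresh intervals instead of 48. This is consistent with the April 7 artifact marked in the per-area plots.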

[Figure: refresh LSAs per day in Areas 2 and 3]

Refresh process is not synchronized

  • No evidence of synchronization

    • Contrary to simulation-based study [basu:sigcomm01]

  • Reasons

    • Changes in the topology help break synchronization

    • LSA refresh at one router is not coupled with LSA refresh at other routers

    • Drift in the refresh interval of different routers

Change LSAs


  • Internal to OSPF domain versus external

    • Change LSAs due to external events dominated

    • Not surprising due to large number of leased lines and import of customer routes into OSPF

      • Customer volatility → network volatility

Root Causes of Change LSAs

  • Persistent problem → flapping → numerous change LSAs

    • Internal LSA spikes → hardware router problems

      • OSPF monitor identified a problem (not visible to other network management tools) early, which led to preventive maintenance

    • External LSA spikes → customer route volatility

      • Overload of an external link to a customer between 9 PM and 3 AM caused the EIGRP session to flap

[Figure: link flaps]

Overhead: Duplicate LSAs

  • Why do some areas witness substantial duplicate LSA traffic, while other areas do not witness any?

    • OSPF flooding over LANs leads to control plane asymmetries and to imbalances in duplicate LSA traffic


Summary

  • Refresh LSAs: constituted bulk of overall LSA traffic

    • No evidence of synchronization between different routers

    • Refresh LSA traffic predictable from configuration information

  • Change LSAs: mostly indicated persistent yet partial failure modes

    • Internal LSA spikes → hardware router problems → preventive router maintenance

    • External LSA spikes → customer congestion problems → “preventive” customer care

  • Duplicate LSAs: arose from control plane asymmetries

    • Simple configuration changes could eliminate duplicate LSAs and improve performance

Outline

  • Background

  • Monitoring

  • Characterization

  • Maintenance

    • Motivation:

      • Seamless maintenance and upgrades of routers

        • Minimal instability and flaps

    • Contribution:

      • I’ll Be Back (IBB) capability for OSPF

        • What IBB capability provides, how capability is implemented, performance analysis

  • Conclusions and future work

Maintenance is a Pain

  • Maintenance of routers is a way of life in commercial networks

    • Extensions to routing protocols, new functionality, hardware and software upgrades, bug fixes

  • Maintenance is a painful exercise

    • During maintenance, operators withdraw “router-under-maintenance” from forwarding service

      • Leads to route flaps, traffic disruption and instability

    • Operators have to carefully schedule maintenance

      • Schedule it at night when load is moderate

      • Stagger maintenance of different routers across time

We can do better

  • Observation: router can continue forwarding even while its routing process is inactive, at least for a while

    • Current routers have separate routing and forwarding paths

      • Routing in software (CPU)

      • Forwarding in hardware (switching)

  • Need to extend routing protocols, since they always try to route around an inactive router

    • Our proposal: IBB (I’ll Be Back) extensions to OSPF

IBB Proposal in a Nutshell

  • OSPF process on router R needs to be shut down

  • Before shutdown, R informs other routers that it is going to be inactive for a while

  • R specifies a time period (IBB Timeout) by which it expects to become operational again

  • Other routers continue using R for forwarding during the IBB Timeout period

  • If R comes back within the IBB Timeout period, there is no routing instability or flapping

  • Else other routers start forwarding packets around R
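The timeout rule can be sketched as a predicate that the other routers evaluate. This hypothetical function models only the IBB Timeout. It does not model loop or black-hole handling, and it does not model R rejoining after it misses the deadline.

```python
def forward_via_r(now, announced_at, ibb_timeout, back_at=None):
    """Whether other routers still forward through R at time `now`.

    announced_at: when R announced it was going inactive; ibb_timeout: the
    period R promised to be back within; back_at: when R's OSPF process came
    back, or None if it has not. Hypothetical sketch of the timeout rule only.
    """
    deadline = announced_at + ibb_timeout
    if back_at is not None and back_at <= deadline:
        return True  # R came back in time: no rerouting, no flaps
    return now <= deadline  # otherwise R is used only until the timeout expires


print(forward_via_r(now=100, announced_at=0, ibb_timeout=300))               # True
print(forward_via_r(now=400, announced_at=0, ibb_timeout=300))               # False
print(forward_via_r(now=400, announced_at=0, ibb_timeout=300, back_at=250))  # True
```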

Related Work

  • Graceful restart proposals for various routing protocols at IETF

    • Graceful restart proposal for OSPF by John Moy

  • Alex Zinin’s proposal to avoid flaps upon restart of OSPF process

    • Process has to come up before other routers notice it was shut down

    • Provides only a small window of opportunity

  • Use of redundant route processors and seamless transfer of control

    • NSR (Avici), High Availability Initiative (Cisco)


What if the topology changes?

[Figure: (a) topology with routers A, B and R when R went down, and (b) the same topology after link costs change while R is inactive]

  • R cannot update its forwarding table to reflect the change

    • Can lead to loops or black holes

Handling Changes: Three Options

  • Don’t do anything

  • Stop using R: John Moy’s proposal

    • Inadvertent changes during upgrade are likely

      • Example: flapping due to a bad interface somewhere

    • But not all changes are bad

      • They do not always lead to loops or black holes

  • Stop using R only when a loop or black hole gets formed (our approach)

    • And only for destinations for which there is a problem

Roadmap of Algorithm

  • Single area, single inactive router case

    • Loop formation

    • Black hole formation

  • Single area, multiple inactive routers case

    • Loop formation

  • Multiple areas

    • Black hole formation and area partitions

Single Area, Single Inactive Router

  • Problem Formulation

    • Inactive Router = R

    • All routers other than R have the same image of the topology graph

    • R’s image is from the past: the topology at the time R went down

    • Source = S, Destination = D

    • Next hop(R, D) = Y

    • Actual path a packet takes from S to D = P(SD)


Loop Detection

[Figure: topology when R went down and topology after changes while R is inactive, with source S, destination D and R’s next hop Y; after the changes, S and Y both have R on their paths to D in their SPTs]

P(SD) has a loop

iff S and Y have R on their paths to D in their SPTs

If there is a loop, neighbor can always detect it
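The detection condition can be sketched with a Dijkstra helper that returns an explicit path. Y knows that next hop(R, D) = Y, so Y checks whether R lies on its own current shortest path to D. The same helper with R excluded gives the path that loop prevention calls for. Function names and the three-router topology are hypothetical.

```python
import heapq


def shortest_path(graph, src, dst, exclude=None):
    """One shortest path from src to dst (list of routers), or None.

    graph maps router -> {neighbor: weight}. If `exclude` is given, the path
    never passes through that router.
    """
    dist, prev, done = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if v == exclude:
                continue
            if v not in dist or d + w < dist[v]:
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def y_detects_loop(graph, r, y, d):
    """Y (= next hop(R, D) in R's stale table) sees a loop iff R is on its path to D."""
    path = shortest_path(graph, y, d)
    return path is not None and r in path[1:-1]


# Hypothetical changed topology: link Y-D went from cost 1 to cost 10 while R
# was inactive, so R still forwards D-bound traffic to Y.
changed = {"R": {"Y": 1, "D": 3}, "Y": {"R": 1, "D": 10}, "D": {"R": 3, "Y": 10}}
print(y_detects_loop(changed, "R", "Y", "D"))         # True: Y -> R -> D
print(shortest_path(changed, "Y", "D", exclude="R"))  # ['Y', 'D']
```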


Loop Prevention

[Figure: changed topology while R is inactive; S and Y calculate paths to D without R on them]

Every router needs to calculate a path to D such that R does not appear on it

Loop Avoidance Procedure

  • R sends its forwarding table to its neighbors before shutdown

    • Thus, Y knows that next hop(R, D) is Y

  • Detection: during SPF calculation, neighbors detect loops

    • Y checks whether R lies on its path to D

  • Upon detection, neighbors send avoid messages to other routers in the domain

    • avoid(R, D) = avoid using R for reaching D

  • Prevention: upon receiving an avoid(R, D) message, other routers calculate a new path to D without R on it

Performance

  • IBB has its largest effect on the SPF calculation

    • Quantify overhead

    • Impact of topology size

  • Prototype Implementation

    • IBB extensions incorporated into GateD 4.0.7


Testbed Setup

[Figure: physical topology, in which the System Under Test (SUT), where the IBB overhead is measured, and TopTracker share a LAN; and the SUT’s view of the topology, in which the SUT reaches the router under maintenance R and an emulated topology (a complete graph with n nodes, including M1) whose LSAs TopTracker injects]


Experiment Sequence

[Figure: experiment timeline in minutes, run with GateD and with IBB-GateD on the SUT]

  • T = 0: bring R down (in IBB mode for IBB-GateD); Case A: inactive router

  • T = 4: send avoid(R, Mj) messages to the SUT for 1 ≤ j ≤ n; Case B: inactive router, avoid it

  • T = 8: bring R up

  • Overhead = mean SPF time in Case B / mean SPF time in Case A

Result

  • Overhead remains constant at roughly 2.0 as n increases

  • Sources of overhead:

    • Second SPF calculation

    • Graph in case B is larger than graph in case A

Summary

  • IBB proposal: extend OSPF so that a router can be used for forwarding even while its OSPF process is inactive

  • Main contribution: an algorithm that gracefully handles topology changes

    • Stops using the inactive router for a destination if using the router can lead to loops or black holes

    • Overhead of the algorithm is modest

      • Shows good scaling behavior in terms of topology size

Outline

  • Background

  • Monitoring

  • Characterization

  • Maintenance

  • Conclusions and future work

Conclusions

  • Monitoring

    • Design and implementation of an OSPF monitor

    • Deployment in two commercial networks

  • Characterization

    • Black-box techniques for estimating OSPF processing delays within a router

    • Case study of OSPF dynamics in an enterprise network

  • Maintenance

    • I’ll Be Back (IBB) capability for OSPF that allows a “router-under-maintenance” to be used for forwarding

Future Work

  • Three principal directions for future work

    • Application of this work to other routing protocols

      • IS-IS is very similar to OSPF

      • EIGRP, RIP and BGP bring their own sets of challenges

        • Distance-vector nature of the protocols

        • BGP also brings scalability issues

    • Other areas related to routing and network management

      • Security, network design, configuration management, simulation & modeling

      • How the performance of the routing infrastructure affects user-perceived performance

    • More work in each of the three focus areas

Future Work for Monitoring

  • Real-time analysis

    • More meaningful alerting

      • Correlation with other fault and performance data

      • Learn from past events

    • Prioritization of alerts

  • Off-line analysis

    • Correlation with other data sources

      • Work already underway: BGP, fault, performance

    • Identification of problem signatures and feeding them into the real-time component for problem prediction

Future Work for Characterization

  • Expand measurements to cover other router vendors and commercial networks

  • Use results to build simulation and analytical models

    • Validation of models

Future Work for Maintenance

  • Improvements to IBB scheme

    • Incremental deployment

    • Reduction in overhead

  • How to use IBB-like schemes in conjunction with other approaches

    • Routing software that can be upgraded without bringing the process down

    • Use of redundant route processors and seamless transfer of control

    • Scheduling maintenance tasks so that they have minimal impact

Holy Grail

Networks that manage themselves!

Grill me ...

Probably your last chance… :-)

Q and A

Backups



Partial Adjacency for LSAR

[Figure: router R and LSAR in partial state; R needs LSA L from LSAR, shown as "Please send me LSA L" / "I have LSA L" exchanges]

  • Router R does not advertise a link to LSAR

    • Routers other than R are not aware of the presence of LSAR

    • LSAR's presence does not trigger SPF calculations in the network

    • LSAR going up or down does not impact the network

  • LSAR does not originate any LSAs

    • The link between LSAR and R is not used for data forwarding

  • LSAR does not install any routes in its forwarding table

Multiple Inactive Routers for IBB

  • Loop Avoidance

    • Change in loop detection conditions

    • Simplification for loop prevention

  • No change in black-hole detection

Loop Avoidance

  • Set of inactive routers: R1, R2, …, Rn

  • The loop avoidance procedure applies to each inactive router

    • Detection

      • A router detects loops for all of its inactive neighbors

    • Prevention

      • A router can receive avoid(Ri, D) messages for j inactive routers (j ≤ n)

      • The router avoids these j forbidden routers on its path to D

  • Problem: the set of forbidden routers can differ from one destination to another

    • Up to O(V) shortest-path calculations, one per destination in the worst case

      • V = number of vertices
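Because the forbidden set can differ per destination, a router that honors every avoid(Ri, D) message exactly needs one SPF run per distinct forbidden set, which in the worst case is one per destination. A minimal sketch that caches SPTs by forbidden set and counts the runs; the topology (router A, inactive routers R1 and R2), the avoid sets, and the function names are all hypothetical.

```python
import heapq

def spt(graph, src, excluded=frozenset()):
    """Dijkstra from src over {u: {v: cost}}, skipping `excluded`; returns predecessors."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in excluded and d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return prev

def next_hop(prev, src, dst):
    """Walk the SPT back from dst to find src's next hop (None if unreachable)."""
    if dst not in prev:
        return None
    while prev[dst] != src:
        dst = prev[dst]
    return dst

def per_destination_next_hops(graph, src, forbidden_by_dest):
    """One SPF per distinct forbidden set; returns (next hops, number of SPF runs)."""
    spts, hops = {}, {}
    for dst, forbidden in forbidden_by_dest.items():
        key = frozenset(forbidden)
        if key not in spts:
            spts[key] = spt(graph, src, key)  # new forbidden set: new SPF run
        hops[dst] = next_hop(spts[key], src, dst)
    return hops, len(spts)

graph = {
    "A": {"R1": 1, "R2": 1, "B": 2},
    "R1": {"A": 1, "D1": 1, "D3": 1},
    "R2": {"A": 1, "D2": 1, "D4": 1},
    "B": {"A": 2, "D1": 1, "D2": 1, "D3": 1},
    "D1": {"R1": 1, "B": 1},
    "D2": {"R2": 1, "B": 1},
    "D3": {"R1": 1, "B": 1},
    "D4": {"R2": 1},
}
forbidden = {"D1": {"R1"}, "D2": {"R2"}, "D3": {"R1", "R2"}, "D4": {"R1"}}
hops, runs = per_destination_next_hops(graph, "A", forbidden)
print(hops)  # {'D1': 'B', 'D2': 'B', 'D3': 'B', 'D4': 'R2'}
print(runs)  # 3
```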

Simplification

  • A router avoids all inactive routers if its path to D contains any forbidden router

    • Calculate two SPTs:

      • SPT with all inactive routers on it

      • SPT without any inactive router on it

    • If the path to D in the first SPT contains no forbidden router,

      • pick the next hop for D from the first SPT

    • Else,

      • pick the next hop for D from the second SPT
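The simplification replaces the per-destination SPF runs with exactly two. A minimal sketch under assumptions: a hypothetical topology seen from router A with inactive routers R1 and R2, hypothetical avoid sets, single-path Dijkstra, and illustrative function names. Note how D4, whose only forbidden router (R1) is not on its path, keeps the path through R2, while D1 avoids both inactive routers even though only R1 is forbidden for it.

```python
import heapq

def spt(graph, src, excluded=frozenset()):
    """Dijkstra from src over {u: {v: cost}}, skipping `excluded`; returns predecessors."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in excluded and d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return prev

def path_to(prev, src, dst):
    """Vertex list of src's SPT path to dst, or None if dst is unreachable."""
    if dst not in prev:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def two_spt_next_hops(graph, src, inactive, forbidden_by_dest):
    with_all = spt(graph, src)                           # SPT with all inactive routers
    without_any = spt(graph, src, frozenset(inactive))   # SPT w/o any inactive router
    hops = {}
    for dst in graph:
        if dst == src:
            continue
        path = path_to(with_all, src, dst)
        forbidden = set(forbidden_by_dest.get(dst, ()))
        if path is not None and forbidden.isdisjoint(path):
            hops[dst] = path[1]  # no forbidden router on the path: first SPT
        else:
            alt = path_to(without_any, src, dst)
            hops[dst] = alt[1] if alt else None  # otherwise: second SPT
    return hops

graph = {
    "A": {"R1": 1, "R2": 1, "B": 2},
    "R1": {"A": 1, "D1": 1, "D3": 1},
    "R2": {"A": 1, "D2": 1, "D4": 1},
    "B": {"A": 2, "D1": 1, "D2": 1, "D3": 1},
    "D1": {"R1": 1, "B": 1},
    "D2": {"R2": 1, "B": 1},
    "D3": {"R1": 1, "B": 1},
    "D4": {"R2": 1},
}
forbidden = {"D1": {"R1"}, "D2": {"R2"}, "D3": {"R1", "R2"}, "D4": {"R1"}}
hops = two_spt_next_hops(graph, "A", {"R1", "R2"}, forbidden)
print({d: hops[d] for d in ("D1", "D2", "D3", "D4")})
# {'D1': 'B', 'D2': 'B', 'D3': 'B', 'D4': 'R2'}
```

The cost is fixed at two SPF runs no matter how many distinct forbidden sets exist; the trade-off is that some destinations may give up paths through inactive routers they could have used safely.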

Multiple Inactive Routers: Loop Detection

  • The loop detection condition for a single inactive router cannot detect all loops when multiple routers are inactive

  • Two new conditions for loop detection by neighbors

    • Generalization of loop detection for a single inactive router

  • The conditions can result in false positives

  • Evaluation using realistic OSPF topology graphs with two inactive routers

    • Using the two conditions together eliminates most false positives (90% hit rate), but not all

Publications

  • Aman Shaikh, Mukul Goyal, Albert Greenberg, Raju Rajan and K.K. Ramakrishnan, An OSPF Topology Server: Design and Evaluation, IEEE J-SAC, 20(4), May 2002.

  • Aman Shaikh and Albert Greenberg, OSPF Monitoring: Architecture, Design, and Deployment Experience, submitted to NSDI, 2004.

  • Aman Shaikh and Albert Greenberg, Experience in Black-box OSPF Measurement, In Proc. ACM SIGCOMM IMW, pp. 113-125, November 2001.

  • Aman Shaikh, Chris Isett, Albert Greenberg, Matthew Roughan and Joel Gottlieb, A Case Study of OSPF Behavior in a Large Enterprise Network, In Proc. ACM SIGCOMM IMW, pp. 217-230, November 2002.

  • Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of OSPF, In Proc. IEEE INFOCOM, June 2002.

  • Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of Multiple OSPF Routers, submitted to IEEE/ACM Transactions on Networking (ToN).
