Protocol implementation
  • Next-hop resolution
  • Reliability and graceful restart
What is a next-hop
  • The destination of the packets I am sending
    • Not the same as the interface
    • An Ethernet interface will have many nodes behind it
    • Directly connected next hop is 1 hop away
  • E.g. RSVP sends a PATH message to the next downstream node
    • Next hop may be directly connected (strict ERO)
    • Or not (loose ERO)
  • OSPF sends an LS update to the other end of a link or to a neighbor on an Ethernet
    • Always directly connected
  • BGP has an iBGP next hop for each of its paths
    • Not directly connected
Next hop
  • If the next hop is not directly connected, the way to reach it depends on the IGP
    • May change when IGP routing changes
    • Will have to use a different interface to reach it
    • Need to keep track of these changes
      • Next hop resolution
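
As a rough illustration of what resolving a next hop involves, here is a minimal Python sketch. It assumes a toy routing table held in a dict; the table layout, the addresses, and the names `longest_match` and `resolve` are made up for this example and are not taken from any real router code. It looks up a next hop by longest-prefix match and follows gateways until it reaches a directly connected one.

```python
import ipaddress

# Hypothetical routing table: prefix -> (gateway, interface).
# gateway is None when the prefix is directly connected.
ROUTES = {
    "10.0.1.0/24": (None, "eth0"),          # connected subnet
    "192.168.0.0/16": ("10.0.1.2", None),   # IGP route via a neighbor
}

def longest_match(table, addr):
    """Return the most specific (network, entry) covering addr, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best

def resolve(table, next_hop, max_depth=8):
    """Follow routes until a directly connected hop is found.

    Returns (first_hop_address, interface), or None if unresolvable.
    max_depth guards against loops in the table.
    """
    current = next_hop
    for _ in range(max_depth):
        match = longest_match(table, current)
        if match is None:
            return None
        gateway, interface = match[1]
        if gateway is None:
            return (current, interface)
        current = gateway
    return None
```

When the IGP replaces the route that covers the next hop, calling `resolve` again can return a different first hop and interface, which is the change the protocol has to keep track of.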
Next hop resolution
  • Periodic resolution
    • may take a bit more time
      • But next-hops will not be too many
      • Or will they? Tunnels, VLANs …
    • Quagga uses this approach
      • Through the ZEBRA_IPV4_NEXTHOP_LOOKUP command
  • Registration/notification
    • RSVP would tell zebra which nexthops it is interested in
    • Zebra will notify RSVP when something changes in the IGP path to it
      • Better scaling for RSVP
      • Difficult to ensure good scaling inside zebra
        • Various protocols may register 1000s of next hops
      • More complex code in zebra
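
The registration/notification approach can be sketched as follows. This is a toy model, not zebra's actual interface: the class `NextHopTracker`, its methods, and the `resolver` callable are all hypothetical names chosen for illustration.

```python
class NextHopTracker:
    """Toy stand-in for the zebra side of registration/notification."""

    def __init__(self, resolver):
        self._resolver = resolver   # callable: next hop -> current resolution
        self._watchers = {}         # next hop -> list of callbacks
        self._last = {}             # next hop -> last resolution reported

    def register(self, next_hop, callback):
        """A protocol (e.g. RSVP) declares interest in one next hop."""
        self._watchers.setdefault(next_hop, []).append(callback)
        if next_hop not in self._last:
            self._last[next_hop] = self._resolver(next_hop)
        return self._last[next_hop]

    def routes_changed(self):
        """Call after an IGP update; notify only where the resolution moved.

        Returns the number of next hops whose resolution changed.
        """
        changed = 0
        for next_hop, callbacks in self._watchers.items():
            new = self._resolver(next_hop)
            if new != self._last[next_hop]:
                self._last[next_hop] = new
                for callback in callbacks:
                    callback(next_hop, new)
                changed += 1
        return changed
```

This sketch re-resolves every registered next hop on every routing change, which is the scaling concern noted above once protocols register thousands of next hops; a real implementation would need to index registrations by the prefixes that cover them.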
Network Reliability
  • Availability: How many nines?
    • 99.999% is 5.26 min down time/year
    • 99.9999% is 31.5 sec down time/year
  • Telephone networks are between 5 and 6 nines
    • Internet will have to get there
    • Currently at 4 nines? (vendors claim 5)
    • Very important with the new types of traffic
      • VoIP, IPTV
  • What can go wrong (% of failures for US telephone network ca. 1992):

    • Hardware failures (19%)
    • Software failures (14%)
    • Human errors (49%)
    • Vandalism/Terrorism
    • Acts of nature (11%)
    • Overload (6% but had the largest impact on customers)
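
The downtime figures quoted for five and six nines follow from simple arithmetic, shown here as a small Python check. The function name `downtime_seconds` is made up for this example, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 365-day year assumed

def downtime_seconds(nines):
    """Allowed downtime per year for an availability of N nines.

    N nines means an unavailability of 10**-N, e.g. 5 nines = 99.999%.
    """
    return SECONDS_PER_YEAR * 10.0 ** -nines

for nines in (4, 5, 6):
    seconds = downtime_seconds(nines)
    print(f"{nines} nines: {seconds:.1f} s/year ({seconds / 60:.2f} min/year)")
```

Each extra nine cuts the allowed downtime by a factor of ten.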
Hardware failures
  • Link failures
    • Protocols can cope with that
      • Re-route, may be slow
      • More aggressive repair methods
        • we will see them later
  • Router failures
    • Cannot do much except add redundancy
      • Power supplies, fans, disks, etc
    • Line-card failure is similar to a link failure
    • Control processor failure is more serious
      • Always have two of them
      • Primary and backup
Modern Router architectures
  • Dual controllers
    • For running the control plane
  • Multiple line-cards
    • Can operate without the controllers
    • Router can forward traffic even when the control plane crashes
    • Called non-stop forwarding or head-less operation
Software failures
  • When primary fails start using backup
    • Switchover
  • Must be as fast as possible
    • Things in the network change in the meanwhile
    • Need to minimize this window
  • What happens with the control software
    • Need to keep primary and backup instance in sync
    • How tight is this synchronization?
Tight synchronization
  • Both primary and backup are active, keep them in sync by:
  • Send them both the same input (i.e. duplicate control packets)
    • Fastest possible switchover
    • Expensive, may need to duplicate packets
    • Does not work for TCP based protocols
  • The primary keeps sending state updates to the backup
    • May need to send too many messages
  • Being totally in-sync is not easy
    • Needs transactional communication
Loose synchronization
  • Backup is idle
    • But we keep configuration up to date
    • Each configuration change on the primary is mirrored on the backup
  • Backup instance is started when the primary fails
    • Switchover will take longer
  • Much, much simpler
    • Configuration changes are far less frequent than protocol state changes
  • Variation:
    • Keep only the RIB process in sync in both primary and backup
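
A minimal sketch of the loose scheme, under the assumption that configuration is a simple key/value store. The classes `Controller` and `LooseSyncPair` are hypothetical and only illustrate what is mirrored and what is not.

```python
class Controller:
    """One control processor (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.config = {}          # mirrored on every change
        self.protocol_state = {}  # learned at run time, NOT mirrored
        self.running = False      # is the control software active?

class LooseSyncPair:
    """Primary runs the protocols; backup is idle but holds the config."""

    def __init__(self):
        self.primary = Controller("A")
        self.backup = Controller("B")
        self.primary.running = True

    def configure(self, key, value):
        """Apply a configuration change and mirror it to the backup."""
        self.primary.config[key] = value
        self.backup.config[key] = value

    def switchover(self):
        """Primary failed: start the backup from its mirrored config."""
        self.primary.running = False
        self.primary, self.backup = self.backup, self.primary
        self.primary.running = True  # starts cold, must relearn state
        return self.primary
```

After a switchover the new primary has the full configuration but an empty `protocol_state`, which is why switchover takes longer here than with tight synchronization.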
Non-stop forwarding
  • Key concept
    • forwarding happens in the line cards
    • Even if control processor fails forwarding can continue
    • Non stop forwarding, head-less operation
  • Old common sense: when a router's s/w crashes, do not use the router
    • But with head-less operation it is ok to continue using routers whose s/w has crashed
    • Assuming their s/w will be operational again soon
Special Case
  • Planned restart
    • For s/w upgrade
      • These are a significant percentage of downtime
    • For refresh
      • Memory is leaking but s/w still operational
      • Restart to get a clean start
  • I can use graceful restart
Graceful Restart
  • Other routers in the network will keep using a neighbor router
    • Even if it looks like its control plane has failed
    • Assuming it will come back soon
  • Needs coordination
    • The failed router needs to do some special processing when it comes back
    • It has to tell its neighbors first that it supports graceful restart
  • Zero impact on the network
    • The failed router will have the chance to restart its s/w and come back
    • Nobody in the rest of the network will know that something happened
How does it work
  • Used for all protocols by now
  • The neighbor will discover that the router is dead or it has restarted
    • HELLO timeout, different information in the HELLOs etc…
    • But will ignore it for a certain time period
  • If the failed router comes back within this period
    • It will re-sync its state (database exchange for OSPF, resend all the LSPs for RSVP, …)
    • And all is back to normal
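
The neighbor's side of this behavior can be sketched as a small state machine. This is an illustrative model only: the class `HelperNeighbor`, its state names, and its return values are invented for the example, and time is passed in explicitly instead of being read from a clock.

```python
class HelperNeighbor:
    """Neighbor-side view of one adjacent router (hypothetical names)."""

    def __init__(self, grace_period, supports_graceful_restart=True):
        self.grace_period = grace_period
        # The router must have announced this capability beforehand.
        self.supports_graceful_restart = supports_graceful_restart
        self.state = "UP"
        self.down_since = None

    def hello_timeout(self, now):
        """HELLOs stopped (or changed): the control plane looks dead."""
        if self.state == "UP":
            self.down_since = now
            if self.supports_graceful_restart:
                self.state = "RESTARTING"  # ignore the failure for now
            else:
                self.state = "DOWN"

    def tick(self, now):
        """Expire the grace period if the router has not come back."""
        if self.state == "RESTARTING" and now - self.down_since > self.grace_period:
            self.state = "DOWN"
        return self.state

    def routes_usable(self, now):
        """Keep forwarding through the router unless it is declared down."""
        return self.tick(now) != "DOWN"

    def hello_received(self, now):
        """Router is heard again; report how the adjacency is rebuilt."""
        previous = self.tick(now)
        self.state = "UP"
        self.down_since = None
        if previous == "RESTARTING":
            return "resync"        # back in time: re-sync state, routes kept
        if previous == "DOWN":
            return "full_restart"  # too late: treated as a normal failure
        return "none"
```

While the state is `RESTARTING`, the neighbor keeps its routes through the router, so the rest of the network sees no change.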
Example RSVP
  • Use HELLOs
  • Special recovery label messages
  • Restarting router needs to remember the labels it allocated before the crash
    • Where?
      • Shared memory
      • recover them from the forwarding plane
    • Why?
      • Must use the same labels again
      • Must make sure it does not use an allocated label for some other LSP
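
The two reasons for remembering labels can be illustrated with a toy label allocator. The class `LabelAllocator` and its methods are hypothetical; the only outside fact used is that MPLS labels 0 to 15 are reserved and the label field is 20 bits wide. The recovered entries are assumed to come from shared memory or from the forwarding plane, as listed above.

```python
class LabelAllocator:
    """Toy MPLS label allocator that survives a restart (illustrative)."""

    def __init__(self, first=16, last=1048575):
        # Labels 0-15 are reserved; the label field is 20 bits wide.
        self.first = first
        self.last = last
        self.in_use = {}  # label -> LSP id

    def recover(self, forwarding_entries):
        """Re-adopt labels read back after a restart (label -> LSP id)."""
        for label, lsp in forwarding_entries.items():
            self.in_use[label] = lsp

    def allocate(self, lsp):
        """Return the LSP's existing label, or the lowest free one."""
        for label, owner in self.in_use.items():
            if owner == lsp:
                return label  # same label as before the crash
        label = self.first
        while label in self.in_use:
            label += 1
        if label > self.last:
            raise RuntimeError("label space exhausted")
        self.in_use[label] = lsp
        return label
```

Without the `recover` step, the restarted router could hand label 16 to a new LSP while the line cards still forward the old LSP's traffic on it.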
Example OSPF
  • Trick is to re-establish the adjacencies after a failure
  • Remember the set of neighbors
    • Shared memory or in the backup controller
  • After restart do not originate any LSAs
  • Just re-establish adjacencies and re-sync database
Graceful restart catches
  • All routers in the network must implement it for this to work
  • Mostly for planned restarts:
    • S/w upgrades
    • Refreshes (if a router runs low on memory)
    • But it is possible to use for crashes too!
  • It cannot work if something changes in the network while the restart is going on
    • There may be routing loops
Router self-monitoring
  • Automatically restart failed or stuck processes
  • A separate monitor process
    • Keeps an eye on other processes
    • If there is a failure the failed process is restarted
      • Of course it may fail again
    • Heart-beats to determine liveness
    • Failure may not necessarily be a crash
      • Could be a software bug that causes an infinite loop or very, very slow processing
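
A heart-beat based monitor can be sketched in a few lines. The class `ProcessMonitor` and the `restart` hook are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow; a real monitor would read a clock and actually kill and respawn the process.

```python
class ProcessMonitor:
    """Restart processes whose heart-beats stop (illustrative sketch)."""

    def __init__(self, timeout, restart):
        self.timeout = timeout  # max silence allowed between heart-beats
        self.restart = restart  # callable(name): hypothetical restart hook
        self.last_beat = {}     # process name -> time of last heart-beat
        self.restarts = {}      # process name -> number of restarts so far

    def heartbeat(self, name, now):
        """Called by a monitored process to show it is still making progress."""
        self.last_beat[name] = now

    def check(self, now):
        """Restart every process that has been silent for too long."""
        restarted = []
        for name, seen in self.last_beat.items():
            if now - seen > self.timeout:
                self.restart(name)
                self.restarts[name] = self.restarts.get(name, 0) + 1
                self.last_beat[name] = now  # new instance gets a full timeout
                restarted.append(name)
        return restarted
```

Because liveness is judged by heart-beats and not by process exit, a process stuck in an infinite loop is caught the same way as one that crashed. The `restarts` counter gives a place to stop retrying a process that keeps failing.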
Why is it important
  • Remember the PoP structure
    • Need dual routers for reliability
    • If I had a single router that was extra-reliable I could save a lot of money
  • Strict Isolation
    • VMs
    • Other methods
  • Global resource coordination
    • For example memory