Enterprise Network Troubleshooting. Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar). Three Disjoint Views of the Network. Error Checking and Deployment. Generation. Policy: The operator’s “wish list”


Enterprise Network Troubleshooting

Nick Feamster, Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)

Three Disjoint Views of the Network

Error Checking and Deployment


  • Policy: The operator’s “wish list”
  • Static: What the configurations say
    • rancid/rcc
    • FIREMAN/Lumeta
  • Dynamic: The behavior that users witness
    • ping
    • traceroute
    • …

Independent analyses!

A Closer Look
  • Proactive analysis
    • Fault avoidance
    • Policy conformance
  • Reactive diagnosis
    • Correcting network faults
      • Detection
      • Localization
    • Active and passive measurements
    • Need user’s perspective
  • Two studies
    • Routing
    • Firewalls

Idea: These analyses should inform each other

Catastrophic Configuration Faults

“…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.” -- news.com, April 25, 1997

“Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes.” -- wired.com, January 25, 2001

“WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to ‘a route table issue.’” -- cnn.com, October 3, 2002

“A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration).” -- dslreports.com, February 23, 2004

Case 1: Network-Wide Routing Analysis
  • Proactive routing configuration analysis
  • Idea: Analyze configuration before deployment

Many faults can be detected with static analysis.
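As a minimal sketch of what analyzing a configuration before deployment can look like, the Python below flags two kinds of static faults: a route-map that is referenced but never defined, and a missing session in what is assumed to be a full iBGP mesh (no route reflectors). The dictionary layout, router names, and function names are hypothetical and chosen only for illustration; this is not the format or logic of rcc or any other specific tool.

```python
from itertools import combinations

# Hypothetical, simplified view of already-parsed router configurations.
CONFIGS = {
    "r1": {"ibgp_neighbors": {"r2", "r3"},
           "route_maps": {"CUST-IN"},
           "referenced_route_maps": {"CUST-IN"}},
    "r2": {"ibgp_neighbors": {"r1"},
           "route_maps": set(),
           "referenced_route_maps": {"PEER-OUT"}},
    "r3": {"ibgp_neighbors": {"r1", "r2"},
           "route_maps": set(),
           "referenced_route_maps": set()},
}

def undefined_route_maps(configs):
    """Return (router, name) pairs where a route-map is referenced but not defined."""
    return sorted(
        (router, name)
        for router, cfg in configs.items()
        for name in cfg["referenced_route_maps"] - cfg["route_maps"]
    )

def missing_ibgp_sessions(configs):
    """Return router pairs that lack a session configured on both ends.

    Assumes a full iBGP mesh is intended (no route reflectors).
    """
    missing = []
    for a, b in combinations(sorted(configs), 2):
        if (b not in configs[a]["ibgp_neighbors"]
                or a not in configs[b]["ibgp_neighbors"]):
            missing.append((a, b))
    return missing

if __name__ == "__main__":
    print(undefined_route_maps(CONFIGS))
    print(missing_ibgp_sessions(CONFIGS))
```

Both checks run on configuration text alone, so they can be applied before any change reaches a live router.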

Operators Find Static Analysis Useful

“That’s wicked!” -- Nicolas Strina, ip-man.net

“Thanks again for a great tool.” -- Paul Piecuch, IT Manager

“...good to finally see more coverage of routing as distributed programming. From my experience, the principles of software engineering eliminate a vast majority of errors.” -- Joe Provo, rcn.com

“I find your approach useful, it is really not fun (but critical for the health of the network) to keep track of the inconsistencies among different routers…a configuration verifier like yours can give the operator a degree of confidence that the sky won't fall on his head real soon now.” -- Arnaud Le Tallanter, clara.net

Yes, but Surprises Happen!
  • Link failures
  • Node failures
  • Traffic volumes shift
  • Network devices “wedged”
  • Two problems
    • Detection
    • Localization
Detection: Analyze Routing Dynamics
  • Idea: Routers exhibit correlated behavior

Blips across signals may be more operationally interesting than any spike in one.
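To illustrate why a small blip seen on many routers at once can matter more than a large spike on one, here is a minimal sketch that compares per-router thresholding with a simple network-wide aggregate. It standardizes each router's routing-update counts against a baseline window and then sums the z-scores across routers, scaled by the square root of the router count (which assumes routers behave independently under normal conditions). The data, names, and the aggregate itself are assumptions for illustration and are not necessarily the method used in the talk.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical routing-update counts per time bin, one series per router.
# Bin 7 has a small blip on every router; bin 8 has a spike on r1 only.
SIGNALS = {
    "r1": [9, 11, 10, 10, 9, 11, 10, 12, 14],
    "r2": [11, 9, 10, 10, 11, 9, 10, 12, 10],
    "r3": [10, 10, 9, 11, 9, 11, 10, 12, 10],
    "r4": [10, 9, 11, 10, 11, 9, 10, 12, 10],
}

def zscores(series, baseline_len):
    """Standardize a series against the mean/std of its first baseline_len bins."""
    base = series[:baseline_len]
    mu, sigma = mean(base), pstdev(base)
    return [(x - mu) / sigma if sigma else 0.0 for x in series]

def detect(signals, baseline_len, threshold=3.0):
    """Return (per-router alarms, network-wide alarm bins)."""
    z = {router: zscores(s, baseline_len) for router, s in signals.items()}
    # Per-router thresholding: each signal is examined in isolation.
    per_router = sorted(
        (router, t)
        for router, zs in z.items()
        for t, value in enumerate(zs)
        if abs(value) > threshold
    )
    # Network-wide view: sum z-scores across routers in each time bin.
    n_bins = len(next(iter(z.values())))
    network = [
        t for t in range(n_bins)
        if abs(sum(zs[t] for zs in z.values())) / sqrt(len(z)) > threshold
    ]
    return per_router, network

if __name__ == "__main__":
    per_router, network = detect(SIGNALS, baseline_len=6)
    print("per-router alarms:", per_router)
    print("network-wide alarm bins:", network)
```

On this sample data, per-router thresholding flags only the isolated spike, while the aggregate flags only the bin in which all four routers blip together.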

Detection: Three Types of Events
  • Single-router bursts
  • Correlated bursts
  • Multi-router bursts
  • Common
  • Commonly missed using thresholds
Localization: Joint Dynamic/Static
  • Which routers are “border routers” for that burst?
  • Topological properties of routers in the burst
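One way to combine the two views is to take the set of routers involved in a burst (dynamic) and look them up in the topology derived from the configurations (static). The sketch below makes two assumptions that the slides do not spell out: that a "border router" is one with an eBGP session in its configuration, and that a useful topological property is whether the burst routers are adjacent to each other. All names and the sample topology are hypothetical.

```python
# Hypothetical static view: adjacency and eBGP speakers taken from configs.
ADJACENCY = {
    "r1": {"r2"},
    "r2": {"r1", "r3"},
    "r3": {"r2", "r4"},
    "r4": {"r3", "r5"},
    "r5": {"r4"},
}
EBGP_ROUTERS = {"r1", "r5"}

def components(nodes, adjacency):
    """Group the given routers into connected components of the topology."""
    nodes, seen, comps = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nbr in adjacency.get(cur, ()):
                if nbr in nodes and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        comps.append(sorted(comp))
    return comps

def localize(burst, adjacency, ebgp_routers):
    """Summarize a burst: which routers are border routers, and how they cluster."""
    burst = set(burst)
    return {
        "border_routers": sorted(burst & set(ebgp_routers)),
        "components": components(burst, adjacency),
    }

if __name__ == "__main__":
    print(localize({"r1", "r2", "r3"}, ADJACENCY, EBGP_ROUTERS))
    print(localize({"r1", "r2", "r5"}, ADJACENCY, EBGP_ROUTERS))
```

A burst confined to one connected cluster behind a single border router suggests a different root cause than one spread across disconnected parts of the network.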

Proactive Analysis





Reactive Detection

Case 2: Firewalls
  • Georgia Tech Campus Network
    • Research and Administrative Network
    • 180 buildings
    • 130+ firewalls
    • 1700+ switches
    • 55000+ ports
  • Problem: Availability/Reachability
    • Flux in firewall, router, switch configurations
    • No common authority over changes made
Specific Focus: Firewall Configuration
  • Difficult to understand and audit configs
  • Subject to continual modifications
    • Roughly 1-2 touches per day
  • Federated policy, distributed dependencies
    • Each department has independent policies
    • Local changes may affect global behavior
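The point that a local change can affect global behavior can be sketched with a path of firewalls evaluated in sequence: a flow is reachable only if every firewall on the path allows it. The rule format (action, source network, destination network, port), the first-match semantics, the default-deny policy, and all addresses below are assumptions for illustration, not the campus network's actual configuration model.

```python
from ipaddress import ip_address, ip_network

def allows(rules, src, dst, port, default="deny"):
    """First-match evaluation of one firewall's ordered rule list."""
    for action, src_net, dst_net, rule_port in rules:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and rule_port in (port, "any")):
            return action == "allow"
    return default == "allow"

def reachable(path, src, dst, port):
    """A flow gets through only if every firewall on the path allows it."""
    return all(allows(rules, src, dst, port) for rules in path)

# Hypothetical rule sets: a campus-wide firewall and one department firewall.
CAMPUS = [("allow", "0.0.0.0/0", "10.1.0.0/16", 443)]
DEPT_BEFORE = [("allow", "0.0.0.0/0", "10.1.2.0/24", 443)]
# A local change: the department blocks one external prefix ahead of its allow rule.
DEPT_AFTER = [("deny", "192.0.2.0/24", "10.1.2.0/24", "any")] + DEPT_BEFORE

if __name__ == "__main__":
    flow = ("192.0.2.7", "10.1.2.10", 443)
    print(reachable([CAMPUS, DEPT_BEFORE], *flow))
    print(reachable([CAMPUS, DEPT_AFTER], *flow))
```

The campus rules are identical in both runs; only the department's list changes, yet end-to-end reachability for the flow flips.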
(Immediate) Open Issues
  • Reachability and reliability of controller
  • Service-level probes
    • Diagnostic tools != Service-level “Happiness”
  • Policy conformance
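The gap between diagnostic tools and service-level "happiness" can be illustrated with a probe that goes one step beyond connectivity: it connects, sends an application-level request, and checks the reply. This is a minimal sketch using only the standard library; the function name, the request bytes, and the expected reply prefix are hypothetical and would depend on the service being probed.

```python
import socket
import time

def probe_service(host, port, request, expect, timeout=3.0):
    """Connect, send request bytes, and check that the reply starts with expect.

    Returns (ok, elapsed_seconds). A successful TCP connection alone is not
    enough: the service must also answer as expected.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            reply = sock.recv(4096)
            ok = reply.startswith(expect)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start
```

A host that accepts the connection but replies with an error status would pass a connectivity check and still fail this probe, which is the distinction the slide draws.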