Troubleshooting Mesh Networks

Troubleshooting Mesh Networks. Lili Qiu, joint work with Victor Bahl, Ananth Rao, and Lidong Zhou, Microsoft Research. Mesh Networking Summit 2004. Motivation: Why is it so slow? Cordless phone interference? Neighbors drop traffic? MAC misbehavior? Too much user traffic? Routing problems?

Presentation Transcript


  1. Troubleshooting Mesh Networks. Lili Qiu. Joint work with Victor Bahl, Ananth Rao, and Lidong Zhou. Microsoft Research. Mesh Networking Summit 2004.

  2. Motivation
     Why is it so slow? Cordless phone interference? Neighbors drop traffic? MAC misbehavior? Too much user traffic? Routing problems? TCP problems? …
     (Diagram label: Internet)

  3. Research Challenges
     Just knowing link statistics is insufficient
     Complicated interactions
     • Between different network elements
     • Between different network protocols
     • Between different faults
     • Signature-based schemes may not capture all the interactions
     Need to apply to a wide range of networks
     Multi-hop wireless networks
     • Unpredictable physical medium and dynamic topology
     • Limited resources
     • Scale to hundreds of nodes

  4. Our Approach
     Framework: online trace-driven simulation
     • Create a real network inside a simulator
     • Identify the root cause by searching for the faults that reproduce the same faulty symptom
     Advantages
     • Applicable to a large class of networks
     • Captures complicated interactions
     • Extensible to diagnose new faults
     • Facilitates what-if analysis

  5. Troubleshooting Framework
     (Diagram; labels: Data Collection, Raw Data, Data Cleaning, Routes, Link Loads, Trace-Driven Simulation, Simulated Performance, Measured Performance, Fault Diagnosis, Candidate Faults, Root Causes, Root cause analysis module)

  6. Common Concerns and Our Approaches for Simulation-Based Diagnosis
     1. Simulation accuracy
        - Trace-driven simulation
        - Remove erroneous data from the trace
     2. Too expensive to simulate
        - Advances in network simulators
        - Focus on long-term faults
        - Compression, spatial scoping, adaptive monitoring, multicast
     3. Fault space too large
        - Develop an efficient search heuristic

  7. Simulator Accuracy: Good RF

  8. Simulator Accuracy: Poor RF

  9. Data Gathering
     What data to collect?
     • Network topology
     • Traffic statistics
     • Physical medium
     • Link performance
     Data sources: SNMP, WRAPI, packet sniffers, NativeWiFi
     Dealing with imperfect data
     • Neighbor monitoring
     • Using history information
     • Find the smallest number of misbehaving nodes that explains the inconsistencies in traffic reports
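
The last bullet on this slide (finding the smallest set of misbehaving nodes that explains inconsistent traffic reports) can be illustrated with a small sketch. It assumes, as a framing not stated on the slide, that each inconsistency is a pair of nodes whose reports about their shared link disagree, so that the problem becomes a minimum vertex cover over those pairs. The function name `smallest_suspect_set` and the node names are hypothetical, and the greedy rule used here is only an approximation that is not guaranteed to find the true minimum.

```python
from collections import defaultdict

def smallest_suspect_set(inconsistent_pairs):
    """Greedy vertex cover: repeatedly blame the node involved in the most
    still-unexplained inconsistencies. Approximate, not guaranteed minimal."""
    remaining = {frozenset(pair) for pair in inconsistent_pairs}
    suspects = set()
    while remaining:
        # Count how many unexplained inconsistencies each node takes part in.
        degree = defaultdict(int)
        for pair in remaining:
            for node in pair:
                degree[node] += 1
        # Sort first so ties are broken deterministically by node name.
        worst = max(sorted(degree), key=lambda n: degree[n])
        suspects.add(worst)
        # Every inconsistency touching the chosen node is now explained.
        remaining = {pair for pair in remaining if worst not in pair}
    return suspects

# Hypothetical reports: node A disagrees with three neighbors, E disagrees with F.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
print(smallest_suspect_set(pairs))
```

In this toy input, blaming A explains three inconsistencies at once, which is why a smallest-explanation rule prefers it over blaming B, C, and D separately.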

  10. Root Cause Analysis

  11. Fault Diagnosis Algorithm
      Challenge
      • Large fault space → brute-force search is infeasible
      1. Initialization: diagnosed fault set F = { }
      2. while (diff(MeasuredPerf, SimulatedPerf(F)) > threshold) {
             foreach f in F:
                 Adjust f's magnitude if necessary
                 Delete f if its magnitude is too small
             Add a new candidate fault if necessary
             Simulate
         }
      3. Report F
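
The iterative search on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `simulate` is a hypothetical toy stand-in for the network simulator (each fault simply lowers one link's delivery rate), and the threshold values, the per-link fault model, and the rule for picking the next candidate are all assumptions made for the example.

```python
def simulate(faults, links):
    """Toy stand-in for the simulator: a fault of magnitude m on a link
    lowers that link's delivery rate from 1.0 to 1.0 - m."""
    return {link: 1.0 - faults.get(link, 0.0) for link in links}

def diff(measured, simulated):
    """Largest per-link gap between measured and simulated performance."""
    return max(abs(measured[link] - simulated[link]) for link in measured)

def diagnose(measured, threshold=0.05, min_magnitude=0.02, max_rounds=100):
    links = list(measured)
    faults = {}  # diagnosed fault set F: link -> magnitude
    for _ in range(max_rounds):
        simulated = simulate(faults, links)
        if diff(measured, simulated) <= threshold:
            break
        # Adjust each existing fault; delete it if its magnitude is too small.
        for link in list(faults):
            faults[link] += simulated[link] - measured[link]
            if faults[link] < min_magnitude:
                del faults[link]
        # Add one new candidate: the link with the largest unexplained shortfall.
        gaps = {l: simulated[l] - measured[l] for l in links if l not in faults}
        if gaps:
            worst = max(gaps, key=gaps.get)
            if gaps[worst] > threshold:
                faults[worst] = gaps[worst]
    return faults

# Hypothetical measured delivery rates per link.
measured = {"a-b": 0.6, "b-c": 0.97, "c-d": 0.8}
print(diagnose(measured))
```

Adding one candidate per round keeps each simulation step cheap and avoids enumerating every combination of faults, which is the point of replacing brute-force search with a heuristic.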

  12. Performance Evaluation
      Effectiveness of data cleaning
      • Detects >80% of misbehaving nodes with <15% false positives
      Effectiveness of fault diagnosis
      • Accuracy of detecting combinations of packet dropping, MAC misbehavior, and external noise in a 25-node random topology
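
To make the two figures on this slide concrete, the sketch below computes a detection rate and a false-positive rate from sets of node names. The slide does not say how the authors define "false positive", so the definition used here (the fraction of reported nodes that are not actually misbehaving) is an assumption, and the function name `detection_metrics` and the node names are hypothetical.

```python
def detection_metrics(true_faulty, reported):
    """Return (coverage, false_positive_rate) under one assumed definition:
    coverage = detected faulty nodes / all faulty nodes,
    false_positive_rate = wrongly reported nodes / all reported nodes."""
    true_faulty, reported = set(true_faulty), set(reported)
    detected = true_faulty & reported
    false_alarms = reported - true_faulty
    coverage = len(detected) / len(true_faulty) if true_faulty else 1.0
    false_positive_rate = len(false_alarms) / len(reported) if reported else 0.0
    return coverage, false_positive_rate

# Hypothetical run: five misbehaving nodes, five reported, one of them wrongly.
truth = {"n1", "n2", "n3", "n4", "n5"}
report = {"n1", "n2", "n3", "n4", "n9"}
print(detection_metrics(truth, report))
```

Other definitions, such as dividing false alarms by the number of truly faulty nodes, would give different numbers for the same run.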

  13. Performance Evaluation
      Testbed
      • Implemented the technique in a small multi-hop IEEE 802.11a mesh testbed
      • Detected network congestion and random packet dropping

  14. What-if Analysis

  15. Conclusion & Future Work
      Propose online trace-driven simulation
      • Diagnose faults
      • Test alternative network configurations
      • Our evaluation results show it is promising
      Future work
      • Validate it in a larger-scale testbed
      • Extend it to handle mobility
      • Apply it to handle other types of faults

  16. Thank you!

  17. Related Work
      Protocols for wireless network management
      • Ad Hoc Network Management Protocol (ANMP)
      • Guerrilla Management Architecture
      • Complementary to our work
      Fault management for wireless infrastructure networks
      • AirWave, AirDefense, UniCenter, WNMS, IBM WSA, Wibhu SpectraMon …
      • Different from multihop wireless networks
      Detect specific faults in multihop wireless networks
      • Routing misbehavior
      • MAC misbehavior, …

  18. Trace-driven Simulation
      (Diagram; labels: Candidate Faults, Fault Injection, Simulated Performance, Routing Updates, Route Simulation, Link Loads, Traffic Simulation)
