
Diagnosing and Debugging Wireless Sensor Networks


Presentation Transcript


  1. Diagnosing and Debugging Wireless Sensor Networks Eric Osterweil Nithya Ramanathan

  2. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  3. What do apples, oranges, and peaches have in common? Well, they are all fruits, they all grow in groves of trees, etc. However, grapes are also fruits, but they grow on vines! ;)

  4. Defining the Problem • Debugging – an iterative process of detecting and discovering the root cause of faults • Distinct debugging phases • Pre-deployment • During deployment • Post-deployment • Ongoing Maintenance / Performance Analysis – How does this differ from debugging?

  5. Characteristic Failures1,2 • Pre-Deployment • Bugs characteristic of wireless, embedded, and distributed platforms • During Deployment • Not receiving data at the sink • Neighbor density (or lack thereof) • Badly placed nodes • Flaky/variable link connectivity 1 R. Szewczyk, J. Polastre, A. Mainwaring, D. Culler, “Lessons from a Sensor Network Expedition”. In EWSN, 2004. 2 A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, “Wireless Sensor Networks for Habitat Monitoring”. In ACM International Workshop on Wireless Sensor Networks and Applications (WSNA), 2002.

  6. Characteristic Failures (continued) • Post-Deployment • Failed/rebooted nodes • “Funny” nodes/sensors • Batteries with low voltage levels • Uncalibrated sensors • Ongoing Maintenance / Performance • Low bandwidth / dropped data from certain regions • High power consumption • Poor load-balancing, or high re-transmission rate

  7. Scenarios • You have just deployed a sensor network in the forest, and are not getting data from any node – what do you do? • You are getting wildly fluctuating averages from a region – is this caused by • Actual environmental fluctuations • Bad sensors • Data randomly dropped • Calculation / algorithmic errors • Tampered nodes

  8. Challenges • Existing tools fall short for sensor networks • Limited visibility • Resource-constrained nodes (can’t run “gdb”) • Bugs characteristic of embedded, distributed, and wireless platforms • Can’t always use existing Internet fault-tolerance techniques (e.g. rebooting) • Extracting Debugging Information • With minimal disturbance to the network • Identifying information used to infer internal state • Minimizing central processing • Minimizing resource consumption

  9. Challenges (continued) • Applications behave differently in the field • Testing configuration changes • Can’t easily log on to nodes • Identifying performance-blocking bugs • Can’t continually manually monitor the network (often physically impossible depending on deployment environment)

  10. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  11. What is Network Management? I don’t have to know anything about my neighbor to count on them…

  12. Network Management • Observing and tracking nodes • Routers • Switches • Hosts • Ensuring that nodes are providing connectivity • i.e. doing their jobs

  13. Problem • Connectivity failures versus device failures • Correlating outages with their cause(s)

  14. Outage Example (diagram of hosts, switches, core switches, and routers)

  15. Approach • Polling • ICMP • SNMP • “Downstream event suppression” • If routing has failed, ignore events about downstream nodes • Modeling
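
The “downstream event suppression” idea above can be made concrete with a short sketch. This is a minimal illustration, assuming a tree-shaped topology stored as a child-to-parent map; all device names are hypothetical and the logic is not taken from any particular management product.

    # Sketch of downstream event suppression: report only the failed device
    # closest to the monitor, and suppress alarms for anything behind it.
    parent = {                    # child -> upstream device toward the monitor
        "host-a": "switch-1",
        "host-b": "switch-1",
        "switch-1": "core-1",
        "core-1": "router-1",
        "router-1": None,         # router-1 connects directly to the monitor
    }

    def is_shadowed(node, down):
        """Return True if some device upstream of `node` is already down."""
        hop = parent.get(node)
        while hop is not None:
            if hop in down:
                return True
            hop = parent.get(hop)
        return False

    def filter_alarms(unreachable):
        """Keep only root-cause candidates; drop alarms they explain."""
        down = set(unreachable)
        return [n for n in unreachable if not is_shadowed(n, down)]

    # If switch-1 fails, only switch-1 is reported, not the hosts behind it.
    print(filter_alarms(["host-a", "host-b", "switch-1"]))   # ['switch-1']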

  16. Outage Example (2)

  17. How does this area differ from WSNs?

  18. Applied to WSNs • Similarities • Similar topologies • Intersecting operations • Network forwarding, routing, etc. • Connectivity vs. device failures • Differences • Network links • Topology dynamism

  19. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  20. What is Parallel Processing? If one car is fast, are 1,000 cars 1,000 times faster?

  21. Parallel Processing • Coordinating large sets of nodes • Cluster sizes can range to the order of 10^4 nodes • Knowing nodes’ states • Efficient resource allocation • Low communication overhead

  22. Problem • Detecting faults • Recovery of faults • Reducing communication overhead • Maintenance • Software distributions, upgrades, etc.

  23. Approach • Low-overhead state checks • ICMP • UDP-based protocols and topology sensitivity • Ganglia • Process recovery • Process checkpoints • Condor
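
A minimal sketch of the low-overhead, UDP-based state check mentioned above: sweep a host list with one small datagram each and collect the nodes that fail to answer. The port number and the assumption that nodes echo such probes are illustrative only (Ganglia and Condor use their own richer protocols).

    # One-packet UDP liveness probe; hosts that do not answer are reported.
    import socket

    PROBE_PORT = 8649          # hypothetical status port
    TIMEOUT_S = 0.5

    def probe(host):
        """Send one datagram and wait briefly for any reply."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(TIMEOUT_S)
        try:
            s.sendto(b"ping", (host, PROBE_PORT))
            s.recvfrom(64)
            return True
        except (socket.timeout, OSError):
            return False
        finally:
            s.close()

    def sweep(hosts):
        """Return the subset of hosts that failed the probe."""
        return [h for h in hosts if not probe(h)]

    if __name__ == "__main__":
        print(sweep(["node001", "node002", "node003"]))   # hypothetical names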

  24. How does this area differ from WSNs?

  25. Applied to WSNs • Similarities • Potentially large sets of nodes • Tracking node state is difficult (due to resource constraints) • Communication overheads are limiting

  26. Applied to WSNs (continued) • Differences • Topology is more dynamic in WSNs • Communications are more constrained • Deployment is not structured around computation • Energy is limiting rather than computation overhead • WSNs are much less latency sensitive

  27. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  28. What is Distributed Fault Tolerance? Put me in coach… PUT ME IN!

  29. Distributed Fault Tolerance • High Availability is a broad category • Hot backups (failover) • Load balancing • etc.

  30. Problem(s) • HA • Track status of nodes • Keeping access to critical resources available as much as possible • Sacrifice hardware for low latency • Load balancing • Track status of nodes • Keeping load even

  31. Approach • HA • High-frequency/low-latency heartbeats • Failover techniques • Virtual interfaces • Shared volume mounting • Load balancing • Selection metric (round robin, least connections, etc.)
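
The two load-balancing metrics named above are simple to state in code. This is a toy sketch with hypothetical backends and connection counts, not the configuration syntax of any real balancer.

    # Round robin vs. least connections, side by side.
    from itertools import cycle

    backends = ["srv-a", "srv-b", "srv-c"]
    active = {"srv-a": 4, "srv-b": 1, "srv-c": 7}   # current open connections

    rr = cycle(backends)

    def pick_round_robin():
        """Hand out backends in a fixed rotation, ignoring load."""
        return next(rr)

    def pick_least_connections():
        """Send the next request to the backend with the fewest open connections."""
        return min(backends, key=lambda b: active[b])

    print(pick_round_robin())         # srv-a, then srv-b, srv-c, srv-a, ...
    print(pick_least_connections())   # srv-b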

  32. How does this area differ from WSNs?

  33. Applied to WSNs • HA / Load balancing • Similarities • Redundant resources • Differences • Where to begin…MANY

  34. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  35. What are WSNs? Warning, any semblance of an orderly system is purely coincidental…

  36. BluSH1 • Shell interface for Intel’s IMotes • Enables interactive debugging – can walk up to a mote and access internal state 1 Tom Schoellhammer

  37. Sympathy1,2 • Aids in debugging • pre, during, and post-deployment • Nodes collect metrics & periodically broadcast to the sink • Sink ensures “good qualities” specified by programmer • based on metrics and other gathered information • Faults are identified and categorized by metrics and tests • Spatial-temporal correlation of distributed events to root-cause failures • Test Injection • Proactively injects network probes to validate a fault hypothesis • Triggers self-tests (internal actuation) 1 N. Ramanathan, E. Kohler, D. Estrin, "Towards a Debugging System for Sensor Networks", International Journal for Network Management, 2005. 2 N. Ramanathan, E. Kohler, L. Girod, D. Estrin. "Sympathy: A Debugging System for Sensor Networks". in Proceedings of The First IEEE Workshop on Embedded Networked Sensors, Tampa, Florida, USA, November 16, 2004
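
To make the sink-side checking concrete, here is an illustrative sketch in the spirit of the description above: the sink flags a node whose periodically reported metrics violate an expected quality. The metric names and thresholds are invented for the example and do not reproduce Sympathy’s actual metric set or tests.

    # Toy sink-side fault check driven by node-reported metrics.
    EXPECTED_PERIOD_S = 60      # nodes are assumed to report once a minute
    MIN_NEIGHBORS = 2           # below this, suspect isolation or bad placement

    def check_node(node_id, last_report_age_s, neighbor_count, pkts_rx, pkts_tx):
        faults = []
        if last_report_age_s > 3 * EXPECTED_PERIOD_S:
            faults.append("no data at sink: crash, reboot, or lost route")
        if neighbor_count < MIN_NEIGHBORS:
            faults.append("low neighbor density: bad placement or flaky links")
        if pkts_tx and pkts_rx / pkts_tx < 0.5:
            faults.append("high loss on the path to the sink")
        return faults

    print(check_node("node-17", last_report_age_s=400, neighbor_count=1,
                     pkts_rx=10, pkts_tx=40))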

  38. SNMS1 • Enables interactive health monitoring of a WSN in the field • Three pieces • Parallel dissemination and collection • Query system for exported attributes • Logging system for asynchronous events • Small footprint / low overhead • Introduces overhead only with human querying 1 Gilman Tolle, David Culler, “Design of an Application-Cooperative Management System for WSN”. Second EWSN, Istanbul, Turkey, January 31 – February 2, 2005
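
The query-system idea can be illustrated in a few lines: nodes export named attributes, and a query asks for one attribute’s current value. This mimics the concept only; it is not SNMS’s actual interface, and the attribute names and values are made up.

    # Toy attribute registry: export named getters, answer queries by name.
    attributes = {}                      # name -> function producing the value

    def export(name, getter):
        """Register an attribute that a management query can ask for."""
        attributes[name] = getter

    def handle_query(name):
        """Answer a query for one exported attribute, or None if unknown."""
        getter = attributes.get(name)
        return getter() if getter else None

    export("voltage_mv", lambda: 2870)   # faked battery voltage
    export("uptime_s", lambda: 91234)    # faked uptime
    print(handle_query("voltage_mv"))    # 2870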

  39. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  40. What is Calibration and Modeling? Hey, if you and I both think the answer is true, then whose to say we’re wrong? ;)

  41. Modeling1,2,3 • “Root-cause Localization” in large-scale systems • Process of “identifying the source of problems in a system using purely external observations” • Identify “anomalous” behavior based on externally observed metrics • Statistical analysis and Bayesian networks used to identify faults 1 E. Kiciman, A. Fox, “Detecting application-level failures in component-based internet services”. In IEEE Transactions on Neural Networks, Spring 2004. 2 A. Fox, E. Kiciman, D. Patterson, M. Jordan, R. Katz, “Combining statistical monitoring and predictable recovery for self-management”. In Proceedings of the Workshop on Self-Managed Systems, Oct 2004. 3 E. Kiciman, L. Subramanian, “Root cause localization in large scale systems”
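
As a stand-in for the statistical machinery cited above, here is a toy anomaly check: flag any component whose externally observed metric sits far from the population mean. Real root-cause localization uses far richer models (e.g. Bayesian networks); the component names and latencies below are hypothetical.

    # Flag components whose metric lies more than `threshold` stddevs from the mean.
    import statistics

    def anomalous(metrics, threshold=3.0):
        values = list(metrics.values())
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        if stdev == 0:
            return []
        return [k for k, v in metrics.items() if abs(v - mean) / stdev > threshold]

    # Per-component request latency in ms (hypothetical observations).
    latency = {"comp-1": 12, "comp-2": 11, "comp-3": 13, "comp-4": 12,
               "comp-5": 14, "comp-6": 11, "comp-7": 95, "comp-8": 13}
    print(anomalous(latency, threshold=2.0))   # ['comp-7']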

  42. Calibration1,2 • Model physical phenomena in order to predict which sensors are faulty • Model can be based on: • The environment that is monitored – e.g. assume that the majority of sensors are providing correct data and then identify sensors that make this model inconsistent1 • Assumptions about the environment – e.g. in a densely sampled area, values of neighboring sensors should be “similar”2 • Debugging can be viewed as sensor network system calibration • Use system metrics instead of sensor data • Based on a model of what metrics in a properly behaving system should look like, one can identify faulty behavior from inconsistent metrics • Locating and using ground truth • In situ deployments • Low communication/energy budgets • Bias • Noise 1 Jessica Feng, S. Megerian, M. Potkonjak, “Model-based calibration for Sensor Networks”. IEEE International Conference on Sensors, Oct 2003. 2 V. Bychkovskiy, S. Megerian, et al., “A Collaborative Approach to In-Place Sensor Calibration”
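
The “neighboring sensors should be similar” assumption lends itself to a short sketch: flag a sensor whose reading disagrees with the median of its neighbors by more than a tolerance. Readings, neighbor lists, and the tolerance are hypothetical, and this is only one simple instance of the model-based approach described above.

    # Flag sensors that disagree with the median of their neighbors' readings.
    import statistics

    readings = {"s1": 21.2, "s2": 21.5, "s3": 21.1, "s4": 29.8, "s5": 21.4}
    neighbors = {
        "s1": ["s2", "s3"],
        "s2": ["s1", "s3", "s4"],
        "s3": ["s1", "s2", "s5"],
        "s4": ["s2", "s3", "s5"],
        "s5": ["s2", "s3", "s4"],
    }

    def suspect_sensors(readings, neighbors, tolerance=2.0):
        flagged = []
        for sensor, value in readings.items():
            local = [readings[n] for n in neighbors[sensor]]
            if local and abs(value - statistics.median(local)) > tolerance:
                flagged.append(sensor)
        return flagged

    print(suspect_sensors(readings, neighbors))   # ['s4']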

  43. Contents • Introduction • Network Management • Parallel Processing • Distributed Fault Tolerance • WSNs • Calibration / Model Based • Conclusion

  44. Promising Ideas • Management by Delegation • Naturally supports heterogeneous architectures by distributing control over the network • Dynamically tasks/empowers less-capable nodes using mobile code • AINs • A node can monitor its own behavior, and detect, diagnose, and repair issues • Model-based fault detection • Models of the physical environment • Bayesian inference engines

  45. Comparison • Network Management • Close, but includes some inflexible assumptions • Parallel Processing • Many similarities, but divergent constraints • Distributed Fault Tolerance • Almost totally different • WSNs • New techniques emerging • Calibration • WSN-related work becoming available

  46. Conclusion • Distributed debugging is as distributed debugging does1 • WSNs are a particular class of distributed system • There are numerous techniques for distributed debugging • Different conditions warrant different approaches • OR different spins to existing techniques 1 F. Gump et al

  47. References • Todd Tannenbaum, Derek Wright, Karen Miller, and Miron Livny, "Condor – A Distributed Job Scheduler", in Thomas Sterling, editor, Beowulf Cluster Computing with Linux, The MIT Press, 2002. ISBN 0-262-69274-0 • http://www.open.com/pdfs/alarmsuppression.pdf • http://www.top500.org/ • D. E. Culler and J. P. Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1999. ISBN 1-55860-343-3 • Matthew L. Massie, Brent N. Chun, and David E. Culler, "The Ganglia Distributed Monitoring System: Design, Implementation, and Experience", Parallel Computing, Vol. 30, Issue 7, July 2004 • "HA-OSCAR Release 1.0 Beta: Unleashing HA-Beowulf", 2nd Annual OSCAR Symposium, Winnipeg, Manitoba, Canada, May 2004

  48. Questions? No? Great! ;)
