Improving Data Quality in Wireless Sensing Systems

Matthias Keller, Jan Beutel, Lothar Thiele. PermaSense Seminar, 10.08.2011.

Presentation Transcript


  1. Improving Data Quality in Wireless Sensing Systems Matthias Keller, Jan Beutel, Lothar Thiele PermaSense Seminar, 10.08.2011

  2. PermaSense Matterhorn Deployment • August 2008 – today • Single base station • Up to 24 sensor nodes • TinyOS/Dozer [Burri2007] • Constant rate • < 0.1 MByte/node/day

  3. Sensor Data Outlier Filtering • A. Hasler: Threshold-based removal of bogus data, downsampling from a 2-minute to a 10-minute sampling interval • Tolle et al.: Temperature measurements, outlier rejection based on battery voltage level • E. Elnahrawy et al.: Bayesian approach for cleaning noisy sensor data • H. Jeung et al.: Data cleaning with model-based anomaly detector • Necessary step to mitigate artifacts of faulty sensors • Usually done by scientific data user/domain expert • Must assume a certain input data quality

  4. Currently Untouched Artifacts • We can observe • Packet duplicates • Node restarts • Order inconsistencies • Temporal vs. logical

  5. The Observed Phenomena … • Modify results derived from the data • Statistics, observed sequences of states, … • Are unacceptable when data quality is key • Scientific modeling, early warning, … • Are difficult to avoid in real sensor networks • Resource scarcity, dynamics, multi-hop routing, … • Problem statement: Data cleaning and system validation on a higher layer • Removal of artifacts threatening data utility • Guarantees on data quality and data ordering

  6. Goals of the Data Analysis • Validate packets based on a model of the real system • For valid packets • Add extra packet ordering information • Provide guarantees on time information • Mark other packets as non-conforming

  7. Related Work • Logical notion of time • Lamport’s clock, vector clocks • Network time synchronization • NTP, FTSP, gradient clock sync, … • Data-driven time synchronization • Using microseismics [Lukac2009] • Using sunlight measurements [Gupchup2009] • Offline time reconstruction with Phoenix [Gupchup2010] • Sensor nodes exchange clock information during runtime

  8. Overview • System Model • Data Analysis • Bounds on packet generation time • Duplicate filtering • Epoch assignment • Forward and backward reasoning • Case Study with Model Validation • Recent Results & Usage Example

  10. Model of Multi-hop Data Collection • Periodic sampling • Sampling period T • Sequencing • Increasing sequence number • Resets on arithmetic overflow • Elapsed time on arrival • Sensor nodes measure packet sojourn time • Base station annotates packets with UTC timestamps • Example: 2011/04/14 10:03:31 (arrival) – 7 sec (total sojourn time) = 2011/04/14 10:03:24 (generation) [Figure: multi-hop path over nodes a, b, c, annotated with sojourn times in seconds]
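The timestamp arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of the example shown on the slide; the function name `estimated_generation_time` is hypothetical and not part of the PermaSense tooling.

```python
from datetime import datetime, timedelta, timezone


def estimated_generation_time(arrival_utc: datetime, sojourn_seconds: float) -> datetime:
    """Subtract the accumulated sojourn time from the base-station arrival timestamp."""
    return arrival_utc - timedelta(seconds=sojourn_seconds)


# Values taken from the slide: arrival at 10:03:31 UTC, total sojourn time of 7 seconds.
arrival = datetime(2011, 4, 14, 10, 3, 31, tzinfo=timezone.utc)
generated = estimated_generation_time(arrival, 7)
print(generated.isoformat())  # 2011-04-14T10:03:24+00:00
```

The result is only an estimate: the sojourn time is measured with drifting node clocks, which is why the later slides work with bounds rather than a single value.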

  11. Error Model • Clock drift • Affects measurement of • Sampling period T • Packet sojourn time ts • Indirectly leading to ordering inconsistencies • Temporal vs. logical • Node restarts • Cold restart: Power cycle • Soft restart: Watchdog reset • Shortens sampling period (< T) • Packet loss • Packet duplicates [Figure: packet loss and duplicate scenarios, labeled: lost 1-hop ACK, retransmission, node restart, queue reset, empty queue]

  12. Formal System Model (1/2) Considering a single sensor node with source address o: • Abstract sequence counter: i • – at last cold restart: • Packet sequence number: • Sampling period: T • Clock drift and resolution: • Packet generation time:

  13. Formal System Model (2/2) • Estimated sojourn time on node N: • Estimated total sojourn time: • Arrival time at base station: • Estimated generation time: • Maximum network diameter: • Error bounds on generation time calculation:

  14. Data Processing • Input format: • Origin o, Sequence number s, total sojourn time , payload p, arrival time tb • Output format: • Unique packet identifier id reflects temporal order of generation • Bounds on packet generation time

  15. Analysis Concepts • Remove uncertainty caused by sequence number (problems: arithmetic overflow, node restarts) • Assign packets to epochs • Determine unique packet id • Determine upper and lower bounds on generation time (problems: clock drift, node restarts) • Use forward and backward reasoning • Remove non-compliant packets • Duplicated packets • Incorrect time information • Behavior not covered by formal model

  17. Bounds on Packet Generation Time • Worst-case bounds for a single packet • Forward and backward reasoning is applied to tighten these bounds • Requirement: Exact ordering information

  19. Duplicate Filtering • We consider packets with • the same source address o • the same sequence number s • an equal payload p • We construct a graph G = (V, E) • Duplicate-free data set is achieved by only considering packets that are within the maximum independent set of G
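A minimal sketch of the filtering idea follows. The transcript does not preserve how the edges of G are defined, so this sketch assumes that two packets with the same (o, s, p) are connected when their generation-time intervals overlap, meaning they could be copies of one packet. Under that assumption each group forms an interval graph, for which the greedy earliest-upper-bound rule yields a maximum independent set. The `Packet` type and `filter_duplicates` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    origin: int      # source address o
    seq: int         # sequence number s
    payload: bytes   # payload p
    t_lower: float   # lower bound on generation time (seconds)
    t_upper: float   # upper bound on generation time (seconds)


def filter_duplicates(packets):
    """Keep one packet per set of mutually indistinguishable copies.

    Assumption: packets with equal (origin, seq, payload) are duplicates
    if their generation-time intervals overlap.
    """
    groups = defaultdict(list)
    for p in packets:
        groups[(p.origin, p.seq, p.payload)].append(p)

    kept = []
    for group in groups.values():
        last_upper = float("-inf")
        # Greedy interval scheduling: earliest upper bound first.
        for p in sorted(group, key=lambda q: q.t_upper):
            if p.t_lower > last_upper:   # no overlap with the last kept packet
                kept.append(p)
                last_upper = p.t_upper
    return kept


a = Packet(1, 5, b"x", 100.0, 110.0)
b = Packet(1, 5, b"x", 105.0, 115.0)     # overlaps a: treated as a duplicate
c = Packet(1, 5, b"x", 5000.0, 5010.0)   # same key, but generated much later
d = Packet(1, 6, b"y", 220.0, 230.0)
kept = filter_duplicates([a, b, c, d])
```

Packet `c` survives because the sequence number can legitimately repeat after an overflow or restart; only the time information separates it from a true duplicate.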

  21. Separate Data into Epochs • Observation: Sequence number s(i) resets to zero • Every smax packets due to arithmetic overflow • After a cold restart due to loss of state • After epoch assignment: • k “generated before” l ⇒ id(k) < id(l) [Figure: s(i) plotted over i, with maximum smax and epochs e = 1 to e = 5]

  22. Epoch Assignment (1/3) • For each packet, calculate a reference point • Ideal case: Perfect clocks, absence of node restarts • Packets belonging to the same epoch have an equal reference point TC • Real case: Imperfect clocks, node restarts • Epoch assignment based on bound
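The reference-point idea can be sketched as follows. The formula for TC is not preserved in the transcript, so this sketch assumes TC = (estimated generation time) − s · T, which is equal for all packets of an epoch under perfect clocks. The grouping threshold `delta_tc` stands in for the bound from the theorems and is set to an arbitrary illustrative value; all names and numbers here are assumptions.

```python
def reference_point(t_gen: float, seq: int, period: float) -> float:
    """Assumed reference point: generation time projected back to sequence number 0."""
    return t_gen - seq * period


def assign_epochs(packets, period, delta_tc):
    """Assign an epoch number (starting at 1) to each (t_gen, seq) pair.

    Packets whose reference points lie within delta_tc of the first
    reference point of the current epoch share that epoch.
    """
    refs = [reference_point(t, s, period) for t, s in packets]
    order = sorted(range(len(packets)), key=lambda k: refs[k])
    epochs = [0] * len(packets)
    epoch = 0
    epoch_start = None
    for k in order:
        if epoch_start is None or refs[k] - epoch_start > delta_tc:
            epoch += 1               # reference point jumped: new epoch
            epoch_start = refs[k]
        epochs[k] = epoch
    return epochs


# Illustrative values: T = 120 s, slightly drifting clocks, one restart.
packets = [(1000, 0), (1121, 1), (1239, 2), (5000, 0), (5120, 1)]
epochs = assign_epochs(packets, period=120, delta_tc=10)
print(epochs)  # [1, 1, 1, 2, 2]
```

Once every packet has an epoch e, a pair such as (e, s) orders packets of one node by generation, which is one possible way to derive the unique packet id.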

  23. Epoch Assignment (2/3) Theorem 1 All packets k, l that belong to the same epoch, i.e., e(k) = e(l), satisfy where where is an upper bound on the network sojourn time, i.e., and

  24. Epoch Assignment (3/3) Theorem 2 Suppose that the generation period T satisfies Then all packets k, l that belong to different epochs, e.g., e(k) < e(l), satisfy Where is defined in Theorem 1.

  26. Forward and Backward Reasoning • Initially set worst-case bounds are often too pessimistic • Given the correct order of packet generation, initially set bounds can be improved by using information from temporally adjacent packets • Example: A packet cannot be generated earlier than its predecessor [Figure: packets i-1 and i on a time axis t]
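A minimal sketch of the tightening step, using only the ordering constraint stated on the slide: a packet is generated no earlier than its predecessor and no later than its successor. The full method may use further constraints (for example the sampling period T) that are not modeled here; `tighten_bounds` is a hypothetical name.

```python
def tighten_bounds(lower, upper):
    """Tighten per-packet [lower, upper] generation-time bounds.

    Inputs are lists ordered by packet generation (by packet id).
    Forward pass: a lower bound is at least the predecessor's lower bound.
    Backward pass: an upper bound is at most the successor's upper bound.
    """
    lo = list(lower)
    hi = list(upper)
    for i in range(1, len(lo)):
        lo[i] = max(lo[i], lo[i - 1])
    for i in range(len(hi) - 2, -1, -1):
        hi[i] = min(hi[i], hi[i + 1])
    return lo, hi


# Illustrative bounds in seconds for three consecutive packets.
lo, hi = tighten_bounds([100, 90, 130], [140, 125, 150])
print(lo)  # [100, 100, 130]
print(hi)  # [125, 125, 150]
```

In this example the second packet's lower bound rises from 90 to 100 and the first packet's upper bound drops from 140 to 125, without touching any measured value.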

  28. Matterhorn Deployment Data • Three phases of system operation • Initial difficulties with hardware and software • Non-conforming system operation • Sensor nodes subject to a high number of restarts • Daily shut down of base station due to insufficient energy

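  29. Model Validation (1/2) [Figure: two processing paths compared by number of sequence violations. I) Unfiltered data checked against the model. II) Model-based approach: unfiltered data passes through duplicate filtering and epoch assignment, violating packets are separated, and the resulting verified data is checked against the model.]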

  30. Model Validation (2/2) • Previously “dirty” data set has been restored for use • Appropriate method for continuous system validation

  31. Conclusions • Data integrity testing and order reconstruction based on a system model of a real system • Give guarantees on data quality • Duplicate-free data • Correct temporal order of generation • Correct logical ordering • Proposed intermediate packet filtering step facilitates the usage of wireless sensor networks for applications that require the highest data quality Matthias Keller, Lothar Thiele, Jan Beutel: Reconstruction of the Correct Temporal Order of Sensor Network Data, IPSN 2011, April 2011, pp. 282-293

  33. Case Study Deployments

  34. Long-term Data Quality

  35. PermaDozer Performance Analysis • The received signal strength indicator (RSSI) is measured for every successful reception of a packet • Ratio between signal strength and noise floor • Higher ratio of duplicates in the more challenging environments at Matterhorn and Jungfraujoch Matthias Keller, Matthias Woehrle, Roman Lim, Jan Beutel, Lothar Thiele: Comparative Performance Analysis of the PermaDozer Protocol in Diverse Deployments, SenseApp 2011, October 2011, accepted for publication

  36. Sequence Meta-Data Usage Example • Query for unfiltered data: SELECT d.GENERATION_TIME, d.TEMPERATURE FROM nodehealth AS d ORDER BY d.GENERATION_TIME ASC • Query for filtered, ordered data with timestamp guarantees: SELECT d.GENERATION_TIME, d.TEMPERATURE, (s.GENERATION_TIME_UPPER - s.GENERATION_TIME_LOWER) AS Q FROM nodehealth AS d JOIN nodehealth_sequence AS s USING (PK) ORDER BY s.ID ASC • Annotations on the second query: Q is the timestamp uncertainty; the inner table join includes only valid data; s.ID is a discrete index
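The effect of the inner join can be tried out with an in-memory SQLite database. The table and column names come from the slide; the column types, the sample rows, and the use of SQLite are assumptions made for this sketch and say nothing about the actual PermaSense database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodehealth (
    PK INTEGER PRIMARY KEY, GENERATION_TIME INTEGER, TEMPERATURE REAL);
CREATE TABLE nodehealth_sequence (
    PK INTEGER PRIMARY KEY, ID INTEGER,
    GENERATION_TIME_LOWER INTEGER, GENERATION_TIME_UPPER INTEGER);
-- Row 3 is a duplicate of row 2 and therefore has no sequence entry.
INSERT INTO nodehealth VALUES (1, 1000, -3.5), (2, 1120, -3.4), (3, 1120, -3.4);
INSERT INTO nodehealth_sequence VALUES (1, 1, 995, 1002), (2, 2, 1115, 1121);
""")

rows = conn.execute("""
SELECT d.GENERATION_TIME, d.TEMPERATURE,
       (s.GENERATION_TIME_UPPER - s.GENERATION_TIME_LOWER) AS Q
FROM nodehealth AS d JOIN nodehealth_sequence AS s USING (PK)
ORDER BY s.ID ASC
""").fetchall()

for row in rows:
    print(row)
```

Only packets that passed the analysis have a row in `nodehealth_sequence`, so the non-conforming row 3 drops out of the result without any explicit filter condition.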

  37. Outlook • Data cleaning operations within GSN virtual sensors • Visualization with SensorViz plot application • Now live on http://data.permasense.ch Matthias Keller, Jan Beutel: Efficient Data Retrieval for Interactive Browsing of Large Sensor Network Data Sets (Demo), IPSN 2011, April 2011, pp. 139-140

  38. BACKUP

  39. Forward/Backward Reasoning Results • Intervals are tightened for 90% of the packets • Mean interval width is reduced by a factor of almost three

  40. Proof Idea for ΔTC • Calculating TC(i) assumes packet generation every T • In practice, the mean distance over smax packets is • < T in the presence of node restarts and a faster clock • > T in the absence of node restarts and a slower clock • We need to bound • the minimal inter-arrival time of a warm restart • the maximum sojourn time of a packet

  41. Validation Results • Only the model-based approach is able to clean data from the first phase A) of non-conforming system operation
