
TACO: Tunable Approximate Computation of Outliers in Wireless Sensor Networks







  1. TACO: Tunable Approximate Computation of Outliers in Wireless Sensor Networks HDMS 2010, Ayia Napa, Cyprus

  2. ΠΑΟ: Approximate Computation of Outliers in Wireless Sensor Network Environments (the Greek rendering of TACO) HDMS 2010, Ayia Napa, Cyprus

  3. Outline • Introduction • Why outlier detection is important • Definition of outlier • The TACO Framework • Compression of measurements at the sensor level (LSH) • Outlier detection within and amongst clusters • Optimizations: Boosting Accuracy & Load Balancing • Experimental Evaluation • Related Work • Conclusions

  4. Introduction • Wireless Sensor Network utility • Place inexpensive, tiny motes in areas of interest • Perform continuous querying operations • Periodically obtain reports of the quantities under study • Support sampling procedures, monitoring/surveillance applications, etc. • Constraints • Limited Power Supply • Low Processing Capabilities • Constrained Memory Capacity • Remark: data communication is the main factor of energy drain

  5. Why Outlier Detection is Useful • Outliers may denote malfunctioning sensors • sensor measurements are often unreliable • dirty readings affect computations/decisions [Deligiannakis et al, ICDE '09] • Outliers may also represent interesting events detected by few sensors • e.g., a fire detected by a single sensor • Take into consideration • the recent history of samples acquired by single motes • correlations with measurements of other motes! (figure: example mote readings 16, 19, 24, 30, 32, 40, 39)

  6. Outlier Definition • Let ui denote the latest W measurements obtained by mote Si • Given a similarity metric sim: R^W → [0, 1] and a similarity threshold Φ, sensors Si, Sj are considered similar if: sim(ui, uj) > Φ • Minimum Support Requirement • a mote is classified as an outlier if its latest W measurements are not found to be similar with at least minSup other motes
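The minimum-support rule on this slide can be sketched as a small centralized illustration (a hypothetical sketch, not the authors' distributed implementation; `cosine_sim` stands in for any similarity metric sim: R^W → [0, 1]):

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two measurement windows of length W."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def find_outliers(windows, phi, min_sup):
    """Return ids of motes whose latest window is similar to fewer
    than min_sup other motes' windows (minimum-support rule)."""
    ids = list(windows)
    support = {i: 0 for i in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            i, j = ids[a], ids[b]
            if cosine_sim(windows[i], windows[j]) > phi:
                support[i] += 1
                support[j] += 1
    return {i for i in ids if support[i] < min_sup}
```

A mote with readings pointing in a very different direction than its peers (e.g. `[-5, 0, 4]` among near-collinear windows) fails to collect support and is flagged.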

  7. TACO Framework – General Idea • Network organization into clusters [(Younis et al, INFOCOM '04), (Qin et al, J.UCS '07)] • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W (figure: clusterhead and regular sensors; a W-dimensional vector of readings is encoded into a d-bit bitmap)

  8. TACO Framework – General Idea • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W • Step 2: Intra-cluster Processing • Encodings are transmitted to clusterheads • Clusterheads perform similarity tests based on a given similarity measure and a similarity threshold Φ • … and calculate support values: if Sim(ui, uj) > Φ { supportSi++; supportSj++; } (figure: clusterhead and regular sensors)

  9. TACO Framework – General Idea • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W • Step 2: Intra-cluster Processing • Encodings are transmitted to clusterheads • Clusterheads perform similarity tests based on a given similarity measure and a similarity threshold Φ • … and calculate support values • Step 3: Inter-cluster Processing • An approximate TSP problem is solved and lists of potential outliers are exchanged • Additional load-balancing mechanisms and accuracy improvements are devised (figure: clusterhead and regular sensors)

  10. TACO Framework • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W (figure: clusterhead and regular sensors)

  11. Data Encoding and Reduction • Desired Properties • Dimensionality Reduction: reduced bandwidth consumption • Similarity Preservation: allows us to later derive the initial sim(ui, uj) during vector comparisons • Locality Sensitive Hashing (LSH) • Pr_{h∈F}[h(ui) = h(uj)] = sim(ui, uj) • Practically, any similarity measure satisfying a set of criteria [Charikar, STOC '02] may be incorporated in TACO's framework

  12. LSH Example: Random Hyperplane Projection [(Goemans & Williamson, J.ACM '95), (Charikar, STOC '02)] • Family of n d-dimensional random vectors (rvi) • Generates for each data vector a bitmap of size n as follows: • Sets bit i = 1 if the dot product of the data vector with the i-th random vector is positive • Sets bit i = 0 otherwise (figure: 2-dimensional sensor data projected against random vectors rv1 to rv4; resulting TACO encoding: 1 0 0 1)
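The random hyperplane projection described above can be sketched directly (a minimal illustration; the Gaussian sampling of random vectors is a standard choice, not necessarily the authors' exact generator):

```python
import random

def make_random_vectors(n, dim, seed=0):
    """Draw n random dim-dimensional vectors with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

def rhp_encode(u, random_vectors):
    """Random Hyperplane Projection: one bit per random vector,
    set to 1 iff the dot product with the data vector is positive."""
    bits = []
    for rv in random_vectors:
        dot = sum(a * b for a, b in zip(rv, u))
        bits.append(1 if dot > 0 else 0)
    return bits
```

Note that scaling a data vector by a positive constant leaves its encoding unchanged, since only the sign of each dot product matters.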

  13. Computing Similarity • Cosine Similarity: cos(θ(ui, uj)) • RHP maps each vector ui to an n-bit encoding RHP(ui) • The hamming distance between encodings estimates the angle: in the figure's 6-bit example, θ(RHP(ui), RHP(uj)) = 2/6 · π = π/3 • Angle similarity is thus derived from the hamming distance
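Since each bit of an RHP encoding differs with probability θ/π, the angle (and hence the cosine similarity) can be estimated from the hamming distance alone, as a short sketch shows:

```python
import math

def hamming(x, y):
    """Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y))

def estimated_angle(x, y):
    """Estimate θ(ui, uj) from n-bit RHP encodings:
    each bit differs with probability θ/π."""
    n = len(x)
    return hamming(x, y) / n * math.pi

def estimated_cosine(x, y):
    """Estimated cosine similarity of the original vectors."""
    return math.cos(estimated_angle(x, y))
```

With the slide's 6-bit example (hamming distance 2), this yields θ ≈ π/3 and an estimated cosine similarity of 0.5.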

  14. Supported Similarity Measures

  15. TACO Framework • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W • Step 2: Intra-cluster Processing • Encodings are transmitted to clusterheads • Clusterheads perform similarity tests based on a given similarity measure and a similarity threshold Φ • … and calculate support values: if Sim(ui, uj) > Φ { supportSi++; supportSj++; } (figure: clusterhead and regular sensors)

  16. Intra-cluster Processing • Goal: find potential outliers within each cluster's realm • Back to our running example, sensor vectors are considered similar when θ(ui, uj) < Φθ • Translate the user-defined similarity threshold: Φh = Φθ · d / π • For any received pair of bitmaps Xi, Xj, clusterheads can obtain an estimation of the initial similarity from their hamming distance Dh(Xi, Xj), testing: Dh(Xi, Xj) < Φh • At the end of the process, <Si, Xi, support> lists are extracted for motes that do not satisfy the minSup parameter
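The clusterhead-side test above (threshold translation plus support counting over bitmaps) can be sketched as follows (a simplified single-cluster illustration, not the authors' implementation):

```python
import math

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def intra_cluster_outliers(encodings, phi_theta, min_sup):
    """Translate the angular threshold Φθ into a hamming threshold
    Φh = Φθ·d/π, count support per mote, and return the
    <Si, Xi, support> list for motes below minSup."""
    ids = list(encodings)
    d = len(encodings[ids[0]])
    phi_h = phi_theta * d / math.pi
    support = {i: 0 for i in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            i, j = ids[a], ids[b]
            if hamming(encodings[i], encodings[j]) < phi_h:
                support[i] += 1
                support[j] += 1
    return [(i, encodings[i], support[i]) for i in ids
            if support[i] < min_sup]
```

For example, with Φθ = π/2 and d = 4, Φh = 2, so only bitmap pairs at hamming distance 0 or 1 count as similar.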

  17. Intra-cluster Processing • Probability of correctly classifying similar motes as such (plot; W = 16, θ = 5, Φθ = 10)

  18. TACO Framework • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W • Step 2: Intra-cluster Processing • Encodings are transmitted to clusterheads • Clusterheads perform similarity tests based on a given similarity measure and a similarity threshold Φ • … and calculate support values • Step 3: Inter-cluster Processing • An approximate TSP problem is solved and lists of potential outliers are exchanged (figure: clusterhead and regular sensors)

  19. Boosting TACO Encodings • Use d = n·μ bits: partition each encoding into μ sub-bitmaps of n bits and obtain the answer provided by the majority of the μ similarity tests (figure: Xi and Xj split into sub-bitmaps, each compared separately; SimBoosting(Xi, Xj) = 1) • Check the quality of the boosting estimation (θ(ui, uj) ≤ Φθ): • Unpartitioned bitmaps: Pwrong(d) = 1 - Psimilar(d) • Boosting: Pwrong(d, μ) ≤ … • Decide an appropriate μ: • Restriction on μ: Psimilar(d/μ) > 0.5 • Comparison of (Pwrong(d, μ), Pwrong(d))
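The majority-vote mechanism on this slide can be sketched as follows (a hypothetical illustration; `phi_h_sub` is the per-sub-bitmap hamming threshold, i.e. Φθ·n/π for sub-bitmaps of n bits):

```python
def boosted_similar(xi, xj, mu, phi_h_sub):
    """Split each d-bit encoding into mu sub-bitmaps of n = d/mu
    bits, run the hamming similarity test on each sub-bitmap, and
    return the answer given by the majority of the mu tests."""
    d = len(xi)
    n = d // mu
    votes = 0
    for k in range(mu):
        a = xi[k * n:(k + 1) * n]
        b = xj[k * n:(k + 1) * n]
        dh = sum(p != q for p, q in zip(a, b))  # sub-bitmap hamming distance
        if dh < phi_h_sub:
            votes += 1
    return votes > mu / 2
```

A single noisy sub-bitmap can no longer flip the decision, which is what boosts accuracy over one monolithic test on the full d bits.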

  20. Comparison Pruning • Modified cluster election process returns B bucket nodes • Introducing a 2nd level of hashing based on the hamming weight of the bitmaps • Comparison pruning is achieved by hashing highly dissimilar bitmaps to different buckets (figure: hamming-weight ranges [0, d/4], (d/4, d/2], (d/2, 3d/4], (3d/4, d] mapped to clusterhead/bucket nodes and regular sensors)
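The second-level hash can be sketched as a simple range partition of the hamming-weight space (an illustrative simplification with equal-width ranges; the exact bucket boundaries are a detail of the deck's figure):

```python
def bucket_of(x, num_buckets):
    """Route a d-bit encoding to a bucket node by its hamming
    weight, splitting the weight space [0, d] into equal ranges."""
    d = len(x)
    wh = sum(x)  # hamming weight
    width = (d + 1) / num_buckets
    idx = min(int(wh / width), num_buckets - 1)
    return idx
```

The pruning argument: two bitmaps differ in at least |Wh(Xi) - Wh(Xj)| positions, so bitmaps with very different weights cannot pass the similarity test and need never be compared once they land in different buckets.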

  21. Load Balancing Among Buckets • Histogram Calculation Phase: • Buckets construct equi-width histograms based on the hamming weight frequencies of the received Xi's • Histogram Communication Phase: • Each bucket communicates to the clusterhead its estimated frequency counts and its width parameter ci • Hash Key Space Reassignment: • The clusterhead determines a new space partitioning and broadcasts the corresponding information (figure: buckets SB1 [0, 3d/8], SB2 (3d/8, 9d/16], SB3 (9d/16, 11d/16], SB4 (11d/16, d] with widths c1 = d/12, c2 = d/16, c3 = d/16, c4 = d/12 and their frequency counts)
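The reassignment step can be sketched in simplified form: instead of the histogram estimates exchanged in the protocol, this hypothetical sketch recomputes boundaries directly from the observed hamming weights so each bucket receives roughly the same number of encodings (an equi-depth stand-in for the clusterhead's histogram-based repartitioning):

```python
def rebalance(weights, num_buckets):
    """Given observed hamming weights, compute new range boundaries
    so each bucket covers roughly the same number of encodings.
    bounds[i] is the upper weight boundary of bucket i."""
    ws = sorted(weights)
    per = len(ws) / num_buckets  # target encodings per bucket
    bounds = []
    for k in range(1, num_buckets):
        bounds.append(ws[min(int(k * per), len(ws) - 1)])
    return bounds
```

If the observed weights were uniform over [0, 7] and B = 4, the boundaries land at 2, 4 and 6; a skewed weight distribution would instead shrink the crowded ranges, mirroring the figure's unequal bucket widths.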

  22. Outline • Introduction • Why is Outlier Detection Important and Difficult • Our Contributions • Outlier detection with limited bandwidth • Compute measurement similarity over compressed representations of measurements (LSH) • The TACO Framework • Compression of measurements at the sensor level • Outlier detection within and amongst clusters • Optimizations: Load Balancing & Comparison Pruning • Experimental Evaluation • Related Work • Conclusions

  23. Sensitivity Analysis • Intel Lab Data - Temperature (plots: Avg. Precision, Avg. Recall)

  24. Sensitivity Analysis • Boosting • Intel Lab Data - Humidity (plots: Avg. Precision, Avg. Recall)

  25. Performance Evaluation in TOSSIM • For a 1/8 reduction ratio, TACO consumes on average 1/12 of the bandwidth, with the savings reaching a ratio of 1/15

  26. Performance Evaluation in TOSSIM • Network Lifetime: the epoch at which the first mote in the network dies. • Average lifetime for motes initialized with 5000 mJ residual energy • Reduction in power consumption reaches a ratio of 1/2.7

  27. TACO vs Hierarchical Outlier Detection Techniques • Robust [Deligiannakis et al, ICDE '09] falls short by up to 10% in terms of the F-Measure metric • TACO ensures lower bandwidth consumption, with a ratio varying from 1/2.6 to 1/7.8

  28. Outline • Introduction • Why is Outlier Detection Important and Difficult • Our Contributions • Outlier detection with limited bandwidth • Compute measurement similarity over compressed representations of measurements (LSH) • The TACO Framework • Compression of measurements at the sensor level • Outlier detection within and amongst clusters • Optimizations: Load Balancing & Comparison Pruning • Experimental Evaluation • Related Work • Conclusions

  29. Related Work - Ours • Outlier reports on par with aggregate query answers [Kotidis et al, MobiDE '07] • hierarchical organization of motes • takes into account temporal & spatial correlations as well • reports aggregates, witnesses & outliers • Outlier-aware routing [Deligiannakis et al, ICDE '09] • routes outliers towards motes that can potentially witness them • validates the detection scheme for different similarity metrics (correlation coefficient and Jaccard index, also supported in TACO) • Snapshot Queries [Kotidis, ICDE '05] • motes maintain local regression models for their neighbors • models can be used for outlier detection • Random Hyperplane Projection using Derived Dimensions [Georgoulas et al, MobiDE '10] • extends the LSH scheme for skewed datasets • up to 70% improvements in accuracy

  30. Related Work • Kernel based approach [Subramaniam et al, VLDB ‘06] • Centralized Approaches [Jeffrey et al, Pervasive ‘06] • Localized Voting Protocols [(Chen et al, DIWANS ’06),(Xiao et al, MobiDE ‘07) ] • Report of top-K values with the highest deviation [Branch et al, ICDCS ‘06] • Weighted Moving Average techniques [Zhuang et al, ICDCS ’07]

  31. Conclusions (Συμπεράσματα) • Our Contributions • outlier detection with limited bandwidth • The TACO/ΠΑΟ Framework • LSH compression of measurements at the sensor level • outlier detection within and amongst clusters • optimizations: Boosting Accuracy & Load Balancing • Experimental Evaluation • accuracy exceeding 80% in most of the experiments • bandwidth consumption reduced by up to a factor of 1/12 for 1/8-reduced bitmaps • network lifetime prolonged by up to a factor of 3 for a 1/4 reduction ratio

  32. TACO: Tunable Approximate Computation of Outliers in Wireless Sensor Networks Thank you!

  33. Backup Slides

  34. TACO Framework • Step 1: Data Encoding and Reduction • Motes obtain samples and keep the latest W measurements in a tumbling window • Encode the W measurements in a bitmap of size d << W • Step 2: Intra-cluster Processing • Encodings are transmitted to clusterheads • Clusterheads perform similarity tests based on a given similarity measure and a similarity threshold Φ • … and calculate support values: if Sim(ui, uj) > Φ { supportSi++; supportSj++; } • Step 3: Inter-cluster Processing • An approximate TSP problem is solved and lists of potential outliers are exchanged (figure: clusterhead and regular sensors)

  35. Leveraging Additional Motes for Outlier Detection • Introducing a 2nd level of hashing: • Besides cluster election, the process continues in each cluster so as to select B bucket nodes • For 0 ≤ Wh(Xi) ≤ d, equally distribute the hash key space amongst them • Hash each bitmap to the bucket whose range covers its hamming weight Wh(Xi) • For bitmaps with Wh(Xi) at the edge of a bucket, transmit Xi to the surrounding range, which is guaranteed to contain at most 2 buckets • Comparison pruning is ensured by the fact that highly dissimilar bitmaps are hashed to different buckets, thus never being tested for similarity (figure: hamming-weight ranges [0, d/4], (d/4, d/2], (d/2, 3d/4], (3d/4, d])

  36. Leveraging Additional Motes for Outlier Detection • Intra-cluster Processing: • Buckets perform bitmap comparisons as in common intra-cluster processing • Constraints: • If both encodings were hashed to a single common bucket, the similarity test is performed only in that bucket • For encodings that were hashed to the same 2 buckets, similarity is tested only in the bucket with the lowest SBi • PotOut formation: • Si is removed from PotOut if it is not reported by all buckets it was hashed to • Received support values are added and Si ∈ PotOut iff supportSi < minSup (figure: hamming-weight ranges [0, d/4], (d/4, d/2], (d/2, 3d/4], (3d/4, d])

  37. Experimental Setup • Datasets: • Intel Lab Data: • Temperature and Humidity measurements • Network consisting of 48 motes organized into 4 clusters • Measurements for a period of 633 and 487 epochs respectively • minSup = 4 • Weather Dataset: • Temperature, Humidity and Solar Irradiance measurements • Network consisting of 100 motes organized into 10 clusters • Measurements for a period of 2000 epochs • minSup = 6

  38. Experimental Setup • Outlier Injection • Intel Lab Data & Weather Temperature, Humidity data: • 0.4% probability that a mote obtains a spurious measurement at some epoch • 6% probability that a mote fails dirty at some epoch • Every mote that fails dirty increases its measurements by 1 degree per epoch until it reaches a MAX_VAL parameter, imposing 15% noise on the values • Intel Lab Data: MAX_VAL = 100 • Weather Data: MAX_VAL = 200 • Weather Solar Irradiance data: • Random injection of values obtained at various time periods into the sequence of epoch readings • Simulators • TOSSIM network simulator • Custom, lightweight Java simulator
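The spurious/dirty fault-injection scheme above can be sketched for a single mote's reading stream (a hypothetical illustration of the stated probabilities and drift rule, not the authors' generator; parameter names are assumptions):

```python
import random

def inject_failures(readings, p_spurious=0.004, p_dirty=0.06,
                    max_val=100, seed=1):
    """Per-epoch fault injection for one mote: with probability
    p_spurious the mote reports a random spurious value; with
    probability p_dirty it 'fails dirty', after which its readings
    drift upward by 1 degree per epoch, capped at max_val."""
    rng = random.Random(seed)
    out = []
    drift = 0.0
    dirty = False
    for r in readings:
        if not dirty and rng.random() < p_dirty:
            dirty = True
        if dirty:
            drift += 1.0
        v = min(r + drift, max_val)
        if rng.random() < p_spurious:
            v = rng.uniform(0, max_val)  # spurious reading
        out.append(v)
    return out
```

Setting `p_dirty=1.0` and `p_spurious=0.0` makes the drift behavior easy to see: readings climb by one degree per epoch until MAX_VAL.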

  39. Sensitivity Analysis • Intel Lab Data - Humidity (plots: Avg. Precision, Avg. Recall) • Weather Data - Humidity (plots: Avg. Precision, Avg. Recall)

  40. Sensitivity Analysis • Weather Data - Solar Irradiance (plots: Avg. Precision, Avg. Recall) • Boosting • Intel Lab Data - Humidity (plots: Avg. Precision, Avg. Recall)

  41. Performance Evaluation in TOSSIM • Transmitted bits categorization per approach

  42. Bucket Node Introduction
