
How Dirty is your Data: The Duality between Detecting Events and Faults



Presentation Transcript


  1. How Dirty is your Data: The Duality between Detecting Events and Faults. J. Gupchup, A. Terzis, R. Burns, A. Szalay. Department of Computer Science, Johns Hopkins University

  2. Outline • Background • Problem Statement • Experiments • Results • Discussion

  3. Application • Monitoring nesting conditions of the Maryland box turtles • Science question: Do nesting conditions determine the sex of hatchlings? • Important to correlate observations with environmental events (rain, snow, etc.)

  4. Duality of Faults & Events • Data gathered from sensor networks contain faults • Delivering faulty data consumes resources and pollutes statistics • Hence the need for fault detection techniques • Fault detection methods flag readings that deviate from "normal" or "expected" values • Environmental events: • Scientifically interesting • Also deviate from the norm, which is the source of the duality

  5. Research Question(s) • Are "events" misclassified as "faults"? • What metrics could be used to quantify the misclassification? • How does the misclassification vary with: • Type of fault • Type of fault detection method • Type of modality (moisture, temperature) • Is it possible to design a fault detection mechanism that minimizes the misclassification?

  6. Know Thy Faults • Short faults: a sudden change in measurement • Noise faults: larger variations in amplitude than expected, or little to no variation in amplitude (unresponsive sensor)

  7. Fault Detection Methods (see the sketch below) • SHORT rule: if |x_i − x_(i−1)| > δ_SHORT, mark the current measurement as a fault (point method); δ_SHORT is established from domain knowledge • NOISE rule: take W successive samples; if σ_W ≤ σ_train − σ_allow or σ_W ≥ σ_train + σ_allow, mark all W readings as faulty (block method); σ_train and σ_allow are established from training data • Linear least-squares estimation (LLSE): estimate the expected value of a sensor's reading from other sensors using LLSE; if |x_model − x_actual| > δ_LLSE for k of the node's neighbors, mark the reading as faulty (point method) • A. Sharma, L. Golubchik, and R. Govindan, "On the Prevalence of Sensor Faults in Real World Deployments," IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2007
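
A minimal sketch (not from the talk) of how the SHORT and NOISE rules could be implemented; `x`, `w`, `delta_short`, `sigma_train`, and `sigma_allow` are hypothetical names standing in for the slide's X, W, δ_SHORT, σ_train, and σ_allow.

```python
import numpy as np

def short_rule(x, delta_short):
    """Flag reading i when |x[i] - x[i-1]| exceeds delta_short (point method)."""
    faulty = np.zeros(len(x), dtype=bool)
    faulty[1:] = np.abs(np.diff(x)) > delta_short
    return faulty

def noise_rule(x, w, sigma_train, sigma_allow):
    """Flag whole windows of w samples whose standard deviation differs
    from the training standard deviation by more than sigma_allow
    (block method)."""
    faulty = np.zeros(len(x), dtype=bool)
    for start in range(0, len(x) - w + 1, w):
        sigma_w = np.std(x[start:start + w])
        if sigma_w <= sigma_train - sigma_allow or sigma_w >= sigma_train + sigma_allow:
            faulty[start:start + w] = True
    return faulty
```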

  8. Evaluation Metrics • Misclassification error (μ) for point faults: μ = (event readings tagged as faults) / (total event readings) • Misclassification error (μ) for block faults: μ = Σ_i D_i / Σ_i E_i, where E_i is the duration of event period i and D_i is the misclassified portion of that period • Fault detection evaluation metric: false negative ratio = fraction of faults the method fails to detect [Figure: a timeline marking event periods E_i and the misclassified intervals D_i within them]
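
As a rough illustration, both metrics could be computed as follows, assuming boolean arrays aligned to the measurement timeline (`event`, `flagged`, and `injected` are hypothetical names): `event` marks readings inside an event period, `flagged` marks readings the detector tagged as faulty, and `injected` marks the ground-truth injected faults.

```python
import numpy as np

def misclassification(event, flagged):
    """mu = event readings tagged as faults / total event readings.
    On a uniform sampling grid, counting samples this way also yields
    sum_i D_i / sum_i E_i for block faults."""
    return np.count_nonzero(event & flagged) / np.count_nonzero(event)

def false_negative_ratio(injected, flagged):
    """Fraction of injected faults the detector failed to flag."""
    return np.count_nonzero(injected & ~flagged) / np.count_nonzero(injected)
```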

  9. Jug Bay Deployment Map [Map: locations of motes 2, 5, and 6, the weather station, and the turtle nests; imagery courtesy of Google Maps, centered at 38.784607, -76.700460]

  10. Dataset Sensor data: • Box temperature and soil moisture • 3 motes from Jug Bay (previous slide) • 5 months of data (sampled every 10 min.) • Training set (1 month), test set (4 months) Event ground truth (weather data): • Precipitation data collected from a weather station ~700 m away (sampled every 15 min.) • 21 major events (i.e., rainfall) occurred • Total rainfall: 158 hours

  11. Inject Faults to Establish Ground Truth • Start with a clean data set • Inject faults into it; the injected faults serve as the fault ground truth (a sketch follows below) [Figure: clean trace annotated with injected faults and the resulting ground truth]
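
A sketch of the injection step under simple assumptions; the fault magnitudes, counts, and window placement below are illustrative placeholders, not the values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_short(x, n_faults, magnitude):
    """Add isolated spikes at random positions; return trace + ground truth."""
    x = x.copy()
    truth = np.zeros(len(x), dtype=bool)
    idx = rng.choice(len(x), size=n_faults, replace=False)
    x[idx] += magnitude
    truth[idx] = True
    return x, truth

def inject_noise(x, start, w, extra_std):
    """Inflate the variance of one w-sample window; return trace + ground truth."""
    x = x.copy()
    truth = np.zeros(len(x), dtype=bool)
    x[start:start + w] += rng.normal(0.0, extra_std, size=w)
    truth[start:start + w] = True
    return x, truth
```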

  12. Methodology For each fault detection method and each modality: • Use the 1st month's data to train and obtain model parameters • Evaluate the method on the fault-injected test data (see the sketch below)
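
Putting the hypothetical helpers from the earlier sketches together, one pass of this methodology for the NOISE rule on a single modality might look like this; the injection parameters are again arbitrary.

```python
import numpy as np

SAMPLES_PER_MONTH = 31 * 24 * 6  # 10-minute sampling rate

def evaluate_noise_rule(readings, event, w, sigma_allow):
    """Train on month 1, then evaluate on fault-injected months 2-5."""
    train = readings[:SAMPLES_PER_MONTH]   # month 1: fit model parameters
    test = readings[SAMPLES_PER_MONTH:]    # months 2-5: evaluation
    sigma_train = np.std(train)            # parameter from training data
    # Inject faults into the clean test data to obtain fault ground truth.
    dirty, injected = inject_noise(test, start=1000, w=w,
                                   extra_std=3 * sigma_train)
    flagged = noise_rule(dirty, w, sigma_train, sigma_allow)
    event_test = event[SAMPLES_PER_MONTH:]
    return (misclassification(event_test, flagged),
            false_negative_ratio(injected, flagged))
```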

  13. Soil Moisture, SHORT Rule • Reducing misclassification errors increases false negatives: raising δ_SHORT spares more event readings but misses more injected faults (sketched below)
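
The tension can be made concrete by sweeping the SHORT threshold and recording both metrics, reusing the hypothetical helpers defined earlier: larger values of δ_SHORT flag fewer event readings (lower μ) but also miss more injected faults (higher false negative ratio).

```python
import numpy as np

def sweep_short_threshold(dirty, event, injected, thresholds):
    """Return (threshold, mu, false_negative_ratio) tuples for each delta."""
    rows = []
    for delta in thresholds:
        flagged = short_rule(dirty, delta)
        rows.append((delta,
                     misclassification(event, flagged),
                     false_negative_ratio(injected, flagged)))
    return rows
```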

  14. Misclassification, LLSE Method • Higher misclassification can occur due to spatial and temporal heterogeneity of the soil, which weakens the linear relationship between neighboring sensors (see the sketch below)
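
For reference, a pairwise variant of the LLSE check might look like the sketch below, where `target` holds one node's readings and `neighbors` holds co-sampled readings from its neighbors (all names are hypothetical); when soil heterogeneity weakens the linear fit between nodes, the per-neighbor estimates drift from the measured values and misclassification rises.

```python
import numpy as np

def llse_rule(target, neighbors, delta_llse, k):
    """Flag reading i when at least k per-neighbor LLSE estimates
    disagree with the measured value by more than delta_llse."""
    votes = np.zeros(len(target), dtype=int)
    mu_t = np.mean(target)
    for y in neighbors:
        # Linear least-squares estimate of `target` from neighbor `y`
        # (in practice the slope would be fit on training data only).
        slope = np.cov(target, y)[0, 1] / np.var(y, ddof=1)
        estimate = mu_t + slope * (y - np.mean(y))
        votes += np.abs(estimate - target) > delta_llse
    return votes >= k
```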

  15. Lessons Learned • There exists a tension between detecting Events and Faults • Fault Detection Algorithms need to take this into consideration • Events can be misclassified as faults • Need for novel Fault Detection methods that are robust in the presence of Events

  16. Need for Pattern Recognition techniques

  17. Acknowledgements • Abhishek Sharma, Dept. of Computer Science, University of Southern California • Chris Swarth, Jug Bay Wetlands Sanctuary • Life Under Your Feet team • Marcus Chang, University of Copenhagen (Courtesy : Andreas Terzis)

  18. Questions?
