
Dynamic, Rule-based Quality Control Framework for Real-time Sensor Data

Wade Sheldon

Georgia Coastal Ecosystems LTER

University of Georgia

Introduction
  • Quality Control of high volume, real-time data from automated sensors is an emerging challenge
    • Traditional techniques (plotting, stats) often don’t scale well
    • Data validation and Q/C can be limiting factor in getting data “online”
    • Difficulties lead to release delays or posting provisional data
  • Software developed at Georgia Coastal Ecosystems LTER has proven useful for Q/C of real-time data
  • Designed to automate GCE data processing and metadata generation, but very generalized and supports any tabular data
  • Provides dynamic, rule-based Q/C framework for data processing, analysis and synthesis
Framework Components
  • Comprehensive data model
    • Implemented as hierarchical MATLAB ‘structure’ arrays
    • Package dataset & attribute metadata, data, Q/C rules, qualifier flags
  • Metadata-based MATLAB software (GCE Data Toolbox)
    • Automatic (rule-based) and manual assignment of Q/C qualifier flags
    • Transparent management of flags throughout all data manipulation
    • Q/C-aware data management and analysis tools
    • Q/C-aware data integration and synthesis tools
  • Modular implementation supports many scenarios
    • Interactive (command-line API and GUI forms)
    • Automated workflows (timed or triggered)
    • End-to-end (logger-to-scientist) or part of larger workflow
    • Runs natively on multiple platforms (PC, *nix, MacOS)
Quality Control Rules
  • Basic syntax: [logical expression]='[flag code]'
  • Logical Expressions:
    • Any conditional statement or call to MATLAB function that returns logical array (0 = false, 1 = true)
    • Dataset columns referenced in statements as:
      • “x” – alias for current column (e.g. x<0)
      • “col_[name]” – any dataset column by name (e.g. “col_Depth<0”)
  • Flag Codes:
    • Alphanumeric character to assign when expression true (I, q, 9, *)
    • Codes defined in the dataset metadata (I = invalid value, …)
  • Unlimited rules per attribute, multiple flags per value
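The rule syntax above can be sketched as a small parser. This is a minimal illustration in Python, not the toolbox's MATLAB implementation: the function name `parse_rules` is hypothetical, and it assumes rules are separated by semicolons, that each flag code is a single character, and that no expression contains a semicolon of its own.

```python
import re

def parse_rules(rule_string):
    """Split a criteria string into (expression, flag_code) pairs."""
    rules = []
    for part in rule_string.split(";"):
        part = part.strip()
        if not part:
            continue
        # Anchor on the trailing ='X' so operators such as <= inside the
        # expression are not mistaken for the flag assignment.
        match = re.fullmatch(r"(.+)='(.)'", part)
        if match is None:
            raise ValueError(f"not a valid rule: {part!r}")
        rules.append((match.group(1).strip(), match.group(2)))
    return rules

rules = parse_rules("x<0='I';x>100='I';x<20='Q';x>80='Q'")
compound = parse_rules("col_RH_Percent>100&col_Precip<=0.1='Q'")
```

Matching the flag code at the end of each rule, rather than splitting on the first `=`, is what lets comparison operators like `<=` appear inside the logical expression.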
Quality Control Rule Examples
  • Numeric Comparisons:
    • Simple:
      • x<0='I' (flags negative values)
      • x<0='I';x>100='I';x<20='Q';x>80='Q' (overlapping bounds checks)
    • Statistical:
      • x>(mean(x)+3*std(x))='Q';x<(mean(x)-3*std(x))='Q' (flags values more than 3 standard deviations from column mean)
    • Multi-column:
      • col_DOC>col_TOC='I' (in column DOC; flags DOC exceeding TOC)
      • col_Dry_Weight<(col_Wet_Weight-col_Ash_Weight)*0.90='I' (flags dry weights below 90% of wet weight minus ash weight)
      • col_Depth<0='I' (in column Salinity; flags Salinity when Depth < 0)
    • Compound (Boolean operators):
      • col_RH_Percent>100&col_Precip<=0.1='Q' (flags humidity > 100% except during significant precipitation events)
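To make the numeric examples concrete, the sketch below evaluates the overlapping bounds checks and the 3-standard-deviation check on made-up sample values. It is written in Python as a stand-in for the MATLAB expressions; `apply_rules` and the sample data are hypothetical, and each rule is represented as a function returning one true/false result per value rather than as parsed rule text.

```python
from statistics import mean, stdev

def apply_rules(values, rules):
    """Return one flag string per value; several codes may accumulate."""
    flags = ["" for _ in values]
    for test, code in rules:
        for i, hit in enumerate(test(values)):
            if hit and code not in flags[i]:
                flags[i] += code
    return flags

# Overlapping bounds checks: x<0='I';x>100='I';x<20='Q';x>80='Q'
bounds = [
    (lambda v: [a < 0 for a in v], "I"),
    (lambda v: [a > 100 for a in v], "I"),
    (lambda v: [a < 20 for a in v], "Q"),
    (lambda v: [a > 80 for a in v], "Q"),
]
x = [-5.0, 10.0, 50.0, 90.0, 120.0]
flags = apply_rules(x, bounds)

# Statistical check: more than 3 sample standard deviations from the mean
def above_3_sigma(v):
    limit = mean(v) + 3 * stdev(v)
    return [a > limit for a in v]

def below_3_sigma(v):
    limit = mean(v) - 3 * stdev(v)
    return [a < limit for a in v]

readings = [10.0] * 19 + [100.0]
stat_flags = apply_rules(readings, [(above_3_sigma, "Q"), (below_3_sigma, "Q")])
```

With these values, -5 and 120 each receive both I and Q, which illustrates how overlapping rules yield multiple flags per value. The statistical check uses the sample (N-1) standard deviation; note that an outlier inflates the standard deviation it is tested against, so this kind of rule only triggers on reasonably long columns.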
Quality Control Rule Examples (cont.)
  • Text Comparisons:
    • “IS”, “NOT” for strings, “IN”, “NOT IN” for lists
    • flag_notinlist(x,'Spartina,Juncus,Zizaniopsis')='Q'
  • Algorithmic Criteria (custom functions):
    • fn(parameters)=‘Q’
    • Various included Q/C functions
      • pattern checks, geographic checks, specialized algorithms (O2 saturation, etc)
    • User-defined functions:
      • Any MATLAB code or “wrapped” calls to FORTRAN, Java, Python, etc
      • Unlimited scope
  • Full suite of MATLAB numeric analysis capabilities supported, and extensible to use other technology
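A custom Q/C function only needs to return one true/false result per value. The Python sketch below mimics the idea behind the list-membership rule shown above; `not_in_list` is a hypothetical stand-in, not the toolbox's `flag_notinlist`, and exact, case-sensitive matching is an assumption. The species values are made-up sample data.

```python
def not_in_list(values, allowed_csv):
    """Return True for each value missing from a comma-separated list."""
    allowed = {item.strip() for item in allowed_csv.split(",")}
    return [value.strip() not in allowed for value in values]

species = ["Spartina", "Juncus", "Borrichia", "Zizaniopsis"]
hits = not_in_list(species, "Spartina,Juncus,Zizaniopsis")

# Assign the flag code wherever the check returned True
flags = ["Q" if hit else "" for hit in hits]
```

Because the function returns a plain logical array, the same flag-assignment step works for any algorithmic criterion, whatever the function computes internally.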
Q/C Rule Management
  • Rule definitions can be defined in metadata “templates”, automatically applied to attributes when raw data imported
  • Rules can also be created, managed using a GUI form
Q/C Flag Assignment
  • Q/C criteria evaluated to assign/clear flags when:
    • Metadata template applied or Q/C criteria edited
    • New data records, columns added
    • Values edited (GUI) or columns updated (CLI)
    • Evaluation function (dataflag) invoked directly
  • Flags can also be assigned/cleared manually by:
    • Clicking/dragging on plots with the mouse
    • Using a spreadsheet-like grid
    • Importing from text attributes (e.g. 3rd party codes)
    • Propagating flags from source column(s) to dependent column(s)
  • Manual assignment locks flags by inserting a “manual” token in the criteria; removing “manual” restores automatic evaluation
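The locking behavior can be sketched as follows. This Python illustration rests on assumptions: that the “manual” token sits in the criteria string as its own semicolon-separated entry, and that re-evaluation replaces existing flags. The `evaluate` function is hypothetical and models only the single rule x<0='I'.

```python
def evaluate(values, criteria, current_flags):
    """Re-evaluate flags unless the criteria carry a 'manual' token."""
    tokens = [tok.strip() for tok in criteria.split(";") if tok.strip()]
    if "manual" in tokens:
        # Locked: keep whatever flags are already assigned
        return list(current_flags)
    # Only the rule x<0='I' is modeled in this sketch
    return ["I" if value < 0 else "" for value in values]

values = [-1.0, 5.0, 7.0]
auto = evaluate(values, "x<0='I'", ["", "", ""])

# A reviewer flags the last value by hand, which locks the column
edited = auto[:2] + ["Q"]
locked = evaluate(values, "x<0='I';manual", edited)

# Removing the token returns the column to rule-based evaluation
unlocked = evaluate(values, "x<0='I'", edited)
```

In this sketch the manual Q survives every re-evaluation while the token is present and is recomputed away once it is removed; whether the toolbox discards manual flags in exactly this way is not stated in the slides.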
Q/C-Aware Data Management & Analysis
  • Q/C flags can be visualized in data editor grid and plots
  • Flagged values can be selectively removed from data sets
  • Statistics can be generated with/without flagged values
  • Flags can be instantiated as coded text columns for export
  • Flagged, missing values can be summarized by parameter and date for metadata
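As a rough illustration of Q/C-aware statistics, the Python sketch below computes a column mean with and without flagged values and counts flagged and missing entries. The data, the `column_mean` helper and the use of `None` for missing values are all assumptions made for the example, not toolbox behavior.

```python
from statistics import mean

values = [12.0, 15.0, None, 400.0, 14.0]   # None marks a missing value
flags = ["", "", "", "I", "Q"]             # one flag string per value

def column_mean(values, flags, exclude=""):
    """Mean of non-missing values, skipping any carrying an excluded code."""
    kept = [v for v, f in zip(values, flags)
            if v is not None and not (set(f) & set(exclude))]
    return mean(kept)

mean_all = column_mean(values, flags)                   # flags ignored
mean_valid = column_mean(values, flags, exclude="I")    # drop invalid only
mean_clean = column_mean(values, flags, exclude="IQ")   # drop all flagged

summary = {
    "missing": sum(v is None for v in values),
    "flagged": sum(bool(f) for f in flags),
}
```

Choosing which codes to exclude is what allows one data set to support different strategies, for example keeping questionable values in a provisional release but dropping them from a final one.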
Q/C-Aware Data Synthesis
  • Flagged, missing values summarized in re-sampled data (aggregated, binned, date-time resampled), with automatic Q/C rule creation
  • Flags automatically “locked” when merging multiple data sets (i.e. unions)
  • All Q/C operations logged to processing history, reported in metadata to document lineage
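The summarizing of flagged and missing values during re-sampling might look like the Python sketch below. Everything here is hypothetical: the sample records, the `daily_summary` function, the choice to leave flagged values out of the daily mean, and the derived rule that marks a bin Q when any input was flagged or missing.

```python
from collections import defaultdict

records = [  # (day, value, flags); made-up sample data
    ("day1", 20.0, ""),
    ("day1", None, ""),
    ("day1", 22.0, "Q"),
    ("day2", 21.0, ""),
    ("day2", 23.0, ""),
]

def daily_summary(records):
    """Aggregate by day, counting missing and flagged inputs per bin."""
    bins = defaultdict(lambda: {"values": [], "missing": 0, "flagged": 0})
    for day, value, flags in records:
        entry = bins[day]
        if value is None:
            entry["missing"] += 1
        elif flags:
            entry["flagged"] += 1
        else:
            entry["values"].append(value)
    summary = {}
    for day, entry in sorted(bins.items()):
        vals = entry["values"]
        summary[day] = {
            "mean": sum(vals) / len(vals) if vals else None,
            "missing": entry["missing"],
            "flagged": entry["flagged"],
            # Derived rule: any missing or flagged input marks the bin
            "flag": "Q" if entry["missing"] or entry["flagged"] else "",
        }
    return summary

daily = daily_summary(records)
```

Carrying the counts alongside each aggregated value keeps the quality information visible after re-sampling instead of silently averaging it away.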
Implementation Scenarios
  • End-to-End (logger-to-scientist)
    • Acquire raw data from logger or file system (standard or custom import filters)
    • Assign metadata from template or using forms to validate and flag data
    • Review data and fine-tune flag assignments
    • Generate distribution files & plots, archive data, index for searching
    • Desktop data management solution
  • Data Pre-processing
    • Acquire, validate and flag raw data (on demand or timed/triggered)
    • Upload processed data files (e.g. csv) or value & flag arrays to RDBMS
  • Workflow Step
    • Call toolbox functions as part of another workflow process, custom program
    • Kepler MATLAB actor?
Suitability for Real-Time Sensor Data
  • Good Scalability
    • Data volumes only limited by computer memory (tested >2 GB data sets)
    • Multiple instances can be run on high-end, 64-bit, clustered workstations
    • Good flag evaluation performance in use, testing with diverse rule sets
  • Good scope for automation
    • Timed and triggered workflow implementations easy to deploy
  • Support for multiple I/O formats, transport protocols
    • Formats: ASCII, MATLAB, SQL, XML (partially implemented)
    • Transport: local file system, UNC paths, HTTP, FTP, SOAP
  • Already used for real-time GCE data, USGS data harvesting service (LTER HydroDB, CWT)
Concluding Remarks
  • Benefits
    • Flexible, modular design
    • No qualifier vocabulary or semantics are assumed, so flags can serve many purposes and standards
    • Many operations on flagged values – supports different strategies for archiving and distributing data at different processing levels
  • Limitations
    • Requires MATLAB
    • Rule syntax environment-specific – a more open standard would be ideal
    • Support for XML metadata immature (but more development planned)
  • More information and downloads at: http://gce-lter.marsci.uga.edu/public/im/tools/data_toolbox.htm

This work was supported by the National Science Foundation under grant numbers OCE-9982133 and OCE-0620959