GCE Data Toolbox -- metadata-based tools for automated data processing and analysis

Wade Sheldon

University of Georgia

GCE-LTER

Rationale
  • Data processing, quality control, data analysis, and metadata generation are traditionally carried out as separate activities, often in different time frames and with different technologies
  • Problems:
    • Metadata may not reflect all processing steps
    • Much routine data analysis is done without Q/C or metadata
    • No economy of scale – leads to “one-off” solutions
  • Metadata generation should ideally occur throughout the data cycle and “inform” data analysis
Design Goals
  • Develop Integrated Storage Standard
    • Tabular Data
    • QA/QC Information
    • Metadata (overall data set & columns/attributes)
  • Develop Software to Support Standard
    • Code Library/API
    • User Interfaces
  • Apply Technology to Acquire, Manage, Distribute GCE-LTER Data
  • Explore Use as Prototype Technology for Metadata-based Data Processing, Synthesis
Storage Standard
  • Developed Using MATLAB®
    • Local expertise, large scientific user base
    • Cross-platform (Win32, Solaris, *nix, Mac OS X)
    • Rapid development environment
    • Supports multiple interfaces (interactive command line, batch-mode scripts, GUI, WWW)
    • Good interoperability with other technologies (Java, Perl, SQL)
  • Defined “GCE Data Structure” Spec. (based on MATLAB/C structures)
    • Structure with 17 named fields
    • Specific content rules for each field (software validation)
    • Combines data, metadata, QA/QC, processing history
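The idea of one validated structure that carries data, metadata, QA/QC flags, and processing history together can be sketched as follows. This is illustrative Python, not the toolbox's MATLAB code, and the field names (`title`, `columns`, `units`, `values`, `flags`, `history`) are hypothetical; the actual specification defines 17 named fields that are not listed in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical stand-in for a combined data/metadata structure."""
    title: str
    columns: list   # column names
    units: list     # one unit string per column
    values: list    # one list of values per column
    flags: list     # one list of QA/QC flag strings per column
    history: list = field(default_factory=list)  # processing log entries


def validate(ds):
    """Return a list of rule violations; an empty list means the structure is valid."""
    problems = []
    n_cols = len(ds.columns)
    # Per-column fields must have exactly one entry per column.
    for name in ("units", "values", "flags"):
        if len(getattr(ds, name)) != n_cols:
            problems.append(f"{name}: expected {n_cols} entries")
    if len(ds.values) == n_cols and len(ds.flags) == n_cols:
        # All columns must have the same number of rows.
        if len({len(col) for col in ds.values}) > 1:
            problems.append("values: columns differ in length")
        # Each value needs exactly one flag slot.
        for name, col, flg in zip(ds.columns, ds.values, ds.flags):
            if len(col) != len(flg):
                problems.append(f"flags: {name} has {len(flg)} flags for {len(col)} values")
    return problems
```

Returning a list of violations instead of raising on the first one lets a caller report every content-rule failure at once.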
Storage Standard

GCE Data Structure Specification (v1.1)

Software – GCE Data Toolbox
  • Core Function Library
    • Create, Validate Structures
    • Import Data, Metadata (ASCII, MATLAB, SQL)
    • Manipulate Data, Metadata (unit conversions, add/delete/update)
    • Export Data, Metadata (various formats)
    • Dynamic, Rule-based QA/QC Flagging
  • Self-documenting Processing
    • Operation Logging (Processing History)
    • Transparent Metadata Creation/Updating
    • Dynamic (JIT) Metadata Generation for Columns
  • Support for Metadata “Templating”
    • Application of Boilerplate Metadata based on Parameter Matching
    • Supports Rapid Documentation of Routine Data Sources
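Rule-based flagging combined with operation logging might look like the sketch below. It is written in Python for illustration and is not the toolbox's implementation; the flag codes (`I`, `Q`, `M`), the thresholds, and the dictionary layout are all assumptions.

```python
from datetime import datetime, timezone

# Hypothetical rules: (flag code, description, predicate that is True when the flag applies).
RULES = [
    ("I", "invalid: below physical minimum", lambda v: v < 0),
    ("Q", "questionable: above expected range", lambda v: v > 40),
]


def flag_column(values, rules):
    """Return one flag string per value, concatenating the codes of all matching rules."""
    flags = []
    for v in values:
        if v is None:
            flags.append("M")  # assumed code for a missing value
        else:
            flags.append("".join(code for code, _, test in rules if test(v)))
    return flags


def apply_flags(dataset, column, rules):
    """Flag one column in place and append an entry to the processing history."""
    flags = flag_column(dataset["values"][column], rules)
    dataset["flags"][column] = flags
    flagged = sum(1 for f in flags if f)
    dataset["history"].append(
        f"{datetime.now(timezone.utc).isoformat()}: flagged {flagged} of "
        f"{len(flags)} values in '{column}' using {len(rules)} rules"
    )
    return dataset
```

Because the rules are data and not hard-coded checks, flags can be recomputed whenever the rules or the values change, and each run leaves a history entry behind.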
Software – GCE Data Toolbox
  • Support for Analysis
    • Descriptive Statistics, Reports
    • Visualization, Mapping
  • Support for Synthesis
    • Composite Data Set Creation
      • Multiple Data Set Merge/Concatenation
      • Relational Join
      • Metadata Content Meshing
    • Data Set Summarization
      • Statistical Data Reduction/Re-sampling
    • Data Set Standardization
      • Unit Conversions (automatic, interactive)
      • Template-based Semantic Mapping
      • Automatic Semantic Mediation (prototype stage)
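Standardization through unit conversion depends on each column carrying its units as metadata. A minimal Python sketch of that idea follows; the conversion table, the unit strings, and the column dictionary layout are hypothetical and do not come from the toolbox.

```python
# Hypothetical conversion table: (from_unit, to_unit) -> (multiplier, offset).
CONVERSIONS = {
    ("ft", "m"): (0.3048, 0.0),
    ("ft^3/s", "m^3/s"): (0.028316846592, 0.0),
    ("degF", "degC"): (5.0 / 9.0, -160.0 / 9.0),
}


def convert_column(column, to_unit):
    """Return a converted copy of a column dict, with its units metadata updated."""
    key = (column["units"], to_unit)
    if key not in CONVERSIONS:
        raise ValueError(f"no conversion from {column['units']} to {to_unit}")
    mult, offset = CONVERSIONS[key]
    return {
        "name": column["name"],
        "units": to_unit,
        # Missing values (None) pass through unchanged.
        "values": [None if v is None else v * mult + offset for v in column["values"]],
    }
```

The function reads the source unit from the column's own metadata, so the caller only names the target unit; the values and the units field are updated together and cannot drift apart.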
Software – User Interfaces
  • Unattended Batch Mode Processing
  • Interactive Command Line Processing (conventional MATLAB UI)
    • Full help text for each function
    • Well-defined input/output arguments
  • GUI Applications
    • Standard Forms, Dialogs, Controls
    • No MATLAB Experience Required
  • WWW – MATLAB Web Server
    • HTML Forms, Querystring Input
    • HTML Pages and/or Static File Output
Current Applications
  • Automated Data Processing
    • Direct data import from data logger files, WWW data sources (USGS), SQL queries
    • Automatic metadata creation (templates, data mining)
    • Rule-based QA/QC flagging
  • Data Set Packaging
    • Batch processing to create/update data, metadata products
    • On-demand generation of data, metadata, stat reports in custom formats (end-user scripts, GUI applications, WWW forms)
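Template-driven metadata creation can be pictured as matching incoming column names against stored boilerplate. The Python sketch below is one possible reading of that idea, not the toolbox's method; the template contents, the metadata keys, and the placeholder values are assumptions.

```python
# Hypothetical template: boilerplate metadata keyed by column name.
TEMPLATE = {
    "Salinity": {"units": "PSU", "description": "Practical salinity", "data_type": "float"},
    "Temp": {"units": "degC", "description": "Water temperature", "data_type": "float"},
}


def apply_template(column_names, template):
    """Return per-column metadata, filling in boilerplate for names found in the template."""
    metadata = {}
    for name in column_names:
        match = template.get(name)
        if match is None:
            # Unmatched columns get placeholders so they can be documented by hand later.
            metadata[name] = {"units": "unspecified", "description": "", "data_type": "unknown"}
        else:
            metadata[name] = dict(match)  # copy so later edits do not alter the template
    return metadata
```

For a routine data source such as a logger file with a fixed column layout, a template of this kind only needs to be written once.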
Current Applications
  • Data Exploration/Analysis by PIs
    • Descriptive Statistics based on attribute metadata
    • Visualization with Interactive Filtering (Frequency Histograms, 2D Plots, Map Plots)
  • Data Reduction/Re-sampling to Provide Customized Data at Various “Scales”
    • Aggregated Statistics
    • Binned Statistics
    • Query/Filtering (sub-selection)
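Binned statistics for re-sampling data to a coarser scale can be sketched as below. This is illustrative Python using only the standard library; the function name, the fixed-width time bins, and the choice of summary statistics are assumptions and not the toolbox's API.

```python
from statistics import mean


def binned_stats(times, values, bin_width):
    """Group (time, value) pairs into fixed-width time bins and summarize each bin."""
    bins = {}
    for t, v in zip(times, values):
        if v is None:
            continue  # skip missing observations
        start = (t // bin_width) * bin_width  # left edge of the bin containing t
        bins.setdefault(start, []).append(v)
    return [
        {"bin_start": start, "n": len(vs), "mean": mean(vs), "min": min(vs), "max": max(vs)}
        for start, vs in sorted(bins.items())
    ]
```

With times in minutes and `bin_width=60`, this reduces sub-hourly observations to one summary row per hour. Bins that contain no valid values are omitted from the result.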
Current Applications
  • Data Harvesting (GCE)
    • USGS Data (WWW real-time, daily, finalized data)
    • Campbell Scientific Data Arrays (post-processing triggered after LoggerNet Retrieval)
    • Sea-Bird Hydrographic Data
  • USGS Data Harvesting Service for HydroDB
    • Weekly harvest for 31 stations/7 LTER Sites
    • Automatic Resampling, Unit Conversions, Q/C
  • Description, Screen-shots, Fully-functional Toolbox Available on WWW:


  • Requires MATLAB 5.3, 6.0, 6.5 (any platform)
  • “Public” Version Compiled
  • Source Code Requests Considered on Case-by-Case Basis
Future Development Plans
  • EML 2.0 Support
  • Metadata-mediated Data Set Integration
    • Unit conversions
    • Re-sampling
  • More WWW Interface Development