
GCE Data Toolbox -- metadata-based tools for automated data processing and analysis

Wade Sheldon, University of Georgia, GCE-LTER


Presentation Transcript


  1. GCE Data Toolbox -- metadata-based tools for automated data processing and analysis. Wade Sheldon, University of Georgia, GCE-LTER

  2. Rationale
     • Data processing, quality control, data analysis and metadata generation are traditionally carried out as separate activities, often in different time frames using different technologies
     • Problems:
         • Metadata may not reflect all processing steps
         • Much routine data analysis is done without Q/C or metadata
         • No economy of scale, which leads to “one-off” solutions
     • Metadata generation should ideally occur throughout the data cycle and “inform” data analysis

  3. Design Goals
     • Develop Integrated Storage Standard
         • Tabular Data
         • QA/QC Information
         • Metadata (overall data set & columns/attributes)
     • Develop Software to Support Standard
         • Code Library/API
         • User Interfaces
     • Apply Technology to Acquire, Manage, Distribute GCE-LTER Data
     • Explore Use as Prototype Technology for Metadata-based Data Processing, Synthesis

  4. Storage Standard
     • Developed Using MATLAB®
         • Local expertise, large scientific user base
         • Cross-platform (Win32, Solaris, *nix, Mac OS X)
         • Rapid development environment
         • Supports multiple interfaces (interactive command line, batch-mode scripts, GUI, WWW)
         • Good interoperability with other technologies (Java, Perl, SQL)
     • Defined “GCE Data Structure” Spec. (based on MATLAB/C structures)
         • Structure with 17 named fields
         • Specific content rules for each field (software validation)
         • Combines data, metadata, QA/QC, processing history
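
The idea of one structure that carries data, metadata, QA/QC flags and processing history, with content rules checked by software, can be sketched as follows. This is a rough illustration in Python, not the GCE Data Structure specification: the class name `DataSet`, its six fields and the `validate` function are all hypothetical, and the real specification defines 17 named fields that are not listed in this transcript.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical stand-in for a combined data/metadata structure."""
    title: str    # data set level metadata
    names: list   # column names
    units: list   # one unit string per column
    values: list  # one list of values per column
    flags: list   # one list of QA/QC flag strings per column
    history: list = field(default_factory=list)  # processing log entries


def validate(ds):
    """Return a list of rule violations; an empty list means the structure is consistent."""
    errors = []
    ncols = len(ds.names)
    # Rule 1: every per-column field must have one entry per column.
    for label, part in (("units", ds.units), ("values", ds.values), ("flags", ds.flags)):
        if len(part) != ncols:
            errors.append(f"{label}: expected {ncols} entries, found {len(part)}")
    # Rule 2: each column needs exactly one flag slot per value.
    if len(ds.values) == ncols and len(ds.flags) == ncols:
        for name, col, flg in zip(ds.names, ds.values, ds.flags):
            if len(col) != len(flg):
                errors.append(f"{name}: {len(col)} values but {len(flg)} flags")
    return errors


# Made-up example record used only to exercise the validator.
example = DataSet(
    title="Hypothetical mooring record",
    names=["depth", "salinity"],
    units=["m", "PSU"],
    values=[[1.0, 1.5], [28.1, 28.4]],
    flags=[["", ""], ["", "Q"]],
)
print(validate(example))
```

Keeping flags and history inside the same object as the values means any function that changes the data can update the documentation in the same step.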

  5. Storage Standard GCE Data Structure Specification (v1.1)

  6. Software – GCE Data Toolbox
     • Core Function Library
         • Create, Validate Structures
         • Import Data, Metadata (ASCII, MATLAB, SQL)
         • Manipulate Data, Metadata (unit conversions, add/delete/update)
         • Export Data, Metadata (various formats)
         • Dynamic, Rule-based QA/QC Flagging
     • Self-documenting Processing
         • Operation Logging (Processing History)
         • Transparent Metadata Creation/Updating
         • Dynamic (JIT) Metadata Generation for Columns
     • Support for Metadata “Templating”
         • Application of Boilerplate Metadata based on Parameter Matching
         • Supports Rapid Documentation of Routine Data Sources
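
Rule-based flagging combined with operation logging might look roughly like the sketch below. It is written in Python for illustration and does not reproduce the toolbox's own functions or rule syntax: the names `apply_flag_rules` and `log_step`, the flag codes `I` and `Q`, and the salinity limits are all assumptions made for the example.

```python
from datetime import datetime, timezone


def apply_flag_rules(values, rules):
    """Return one flag string per value; each rule is a (flag_code, predicate) pair."""
    flags = []
    for v in values:
        # Concatenate the codes of every rule whose predicate matches this value.
        codes = [code for code, test in rules if test(v)]
        flags.append("".join(codes))
    return flags


def log_step(history, message):
    """Append a timestamped entry so the processing history documents itself."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    history.append((stamp, message))


# Hypothetical rules for a salinity column (limits are illustrative only).
rules = [
    ("I", lambda v: v < 0 or v > 40),  # invalid: outside the assumed physical range
    ("Q", lambda v: 35 < v <= 40),     # questionable: unusually high but possible
]

values = [12.5, 36.0, -1.0, 41.2]
history = []
flags = apply_flag_rules(values, rules)
flagged = sum(1 for f in flags if f)
log_step(history, f"flagged {flagged} of {len(values)} values using {len(rules)} rules")
```

Because the rules are data rather than hard-coded checks, they can be re-evaluated whenever values change, which is one way to read "dynamic" flagging.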

  7. Software – GCE Data Toolbox
     • Support for Analysis
         • Descriptive Statistics, Reports
         • Visualization, Mapping
     • Support for Synthesis
         • Composite Data Set Creation
             • Multiple Data Set Merge/Concatenation
             • Relational Join
             • Metadata Content Meshing
         • Data Set Summarization
             • Statistical Data Reduction/Re-sampling
         • Data Set Standardization
             • Unit Conversions (automatic, interactive)
             • Template-based Semantic Mapping
             • Automatic Semantic Mediation (prototype stage)
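
A unit conversion that updates the column metadata and records itself in the processing history could be sketched as follows. This is an illustrative Python example under stated assumptions, not the toolbox's conversion routine: the `CONVERSIONS` table, the `convert_column` function and the dictionary layout of a column are hypothetical.

```python
# Hypothetical lookup table: (from_units, to_units) -> conversion function.
CONVERSIONS = {
    ("ft", "m"): lambda x: x * 0.3048,
    ("degF", "degC"): lambda x: (x - 32) * 5 / 9,
}


def convert_column(column, new_units):
    """Return a converted copy of the column with units and history updated together."""
    key = (column["units"], new_units)
    if key not in CONVERSIONS:
        raise ValueError(f"no conversion defined from {key[0]} to {key[1]}")
    fn = CONVERSIONS[key]
    return {
        **column,
        "units": new_units,
        "values": [fn(v) for v in column["values"]],
        # The metadata change is logged in the same step as the data change.
        "history": column["history"] + [f"converted {column['units']} to {new_units}"],
    }


depth_ft = {"name": "depth", "units": "ft", "values": [10.0, 0.0], "history": []}
depth_m = convert_column(depth_ft, "m")
```

Returning a new column rather than editing in place keeps the original available for comparison, and the units label can never drift out of step with the values.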

  8. Software – User Interfaces
     • Unattended Batch Mode Processing
     • Interactive Command Line Processing (conventional MATLAB UI)
         • Full help text for each function
         • Well-defined input/output arguments
     • GUI Applications
         • Standard Forms, Dialogs, Controls
         • No MATLAB Experience Required
     • WWW – MATLAB Web Server
         • HTML Forms, Query String Input
         • HTML Pages and/or Static File Output

  9. Command-Line Interface

  10. GUI Applications

  11. WWW Interface

  12. Current Applications
     • Automated Data Processing
         • Direct data import from data logger files, WWW data sources (USGS), SQL queries
         • Automatic metadata creation (templates, data mining)
         • Rule-based QA/QC flagging
     • Data Set Packaging
         • Batch processing to create/update data, metadata products
         • On-demand generation of data, metadata, stat reports in custom formats (end-user scripts, GUI applications, WWW forms)
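
Template-based metadata creation, where boilerplate is applied to columns whose names match known parameters, can be illustrated with a small Python sketch. The template contents, the case-insensitive matching rule and the `apply_template` function are assumptions made for the example and are not taken from the toolbox.

```python
# Hypothetical metadata template keyed by lower-case parameter name.
TEMPLATE = {
    "salinity": {"units": "PSU", "description": "Practical salinity"},
    "temp_water": {"units": "degC", "description": "Water temperature"},
}


def apply_template(column_names, template):
    """Match imported column names against the template.

    Returns (documented, undocumented): a dict of metadata for matched
    columns and a list of names that still need manual documentation.
    """
    documented, undocumented = {}, []
    for name in column_names:
        entry = template.get(name.strip().lower())
        if entry is None:
            undocumented.append(name)
        else:
            documented[name] = dict(entry)  # copy so the template stays unchanged
    return documented, undocumented


documented, undocumented = apply_template(["Salinity", "Temp_Water", "Voltage"], TEMPLATE)
```

Routine data sources tend to repeat the same parameters, so a template written once can document most columns of each new file and leave only the exceptions for manual entry.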

  13. Current Applications
     • Data Exploration/Analysis by PIs
         • Descriptive Statistics based on attribute metadata
         • Visualization with Interactive Filtering (Frequency Histograms, 2D Plots, Map Plots)
         • Data Reduction/Re-sampling to Provide Customized Data at Various “Scales”
             • Aggregated Statistics
             • Binned Statistics
         • Query/Filtering (sub-selection)
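
Binned statistics for data reduction can be sketched as below. This Python example is an assumption-laden illustration rather than the toolbox's resampling code: the function name `binned_stats`, the choice to skip any flagged value, and the sample readings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def binned_stats(times, values, flags, bin_width):
    """Group unflagged values into fixed-width time bins and summarize each bin."""
    bins = defaultdict(list)
    for t, v, f in zip(times, values, flags):
        if f:  # skip any value carrying a QA/QC flag (an assumption of this sketch)
            continue
        bin_start = (t // bin_width) * bin_width
        bins[bin_start].append(v)
    return {
        start: {"n": len(vs), "mean": mean(vs), "min": min(vs), "max": max(vs)}
        for start, vs in sorted(bins.items())
    }


# Made-up readings: time in minutes, one flagged value at t = 30.
times = [0, 15, 30, 60, 75]
values = [1.0, 2.0, 3.0, 10.0, 20.0]
flags = ["", "", "I", "", ""]
hourly = binned_stats(times, values, flags, bin_width=60)
```

Changing `bin_width` yields the same data at a different "scale", and because the flags travel with the values, the reduction can exclude suspect readings without a separate cleaning pass.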

  14. Current Applications
     • Data Harvesting (GCE)
         • USGS Data (WWW real-time, daily, finalized data)
         • Campbell Scientific Data Arrays (post-processing triggered after LoggerNet Retrieval)
         • Sea-Bird Hydrographic Data
     • USGS Data Harvesting Service for HydroDB
         • Weekly harvest for 31 stations/7 LTER Sites
         • Automatic Resampling, Unit Conversions, Q/C

  15. Availability
     • Description, Screenshots, Fully Functional Toolbox Available on WWW: http://gce-lter.marsci.uga.edu/lter/research/tools/data_toolbox.htm
     • Requires MATLAB 5.3, 6.0, or 6.5 (any platform)
     • “Public” Version Compiled
     • Source Code Requests Considered on Case-by-Case Basis

  16. Future Development Plans
     • EML 2.0 Support
     • Metadata-mediated Data Set Integration
         • Unit conversions
         • Re-sampling
     • More WWW Interface Development
