The Process of Data Ingestion in ÆKOS

Presentation Transcript


  1. The Process of Data Ingestion in ÆKOS. Andrew Graham and Matt Schneider, TERN Ecoinformatics Data Analysts. Logos used with consent. Content of this presentation, except logos, is released under the TERN Attribution Data Licence v1.0

  2. Introduction The Data Analyst Role with TERN Ecoinformatics • Analysis of source data and methods • ÆKOS system development and domain modelling • Contextual description of the data • Publication of data into ÆKOS

  3. The AEKOS Framework • Upper Context: Party, Project, Scope, etc. • Domain Model (Ontology): Observed entities, their features and relationships • Description Model: Methods and definitions • Indexing Model: Search and federation
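
By way of orientation, the four layers could be pictured as simple data structures. The sketch below is illustrative only; the class and field names are assumptions, not the actual ÆKOS schema.

```python
# Illustrative sketch of the four framework layers; names are hypothetical,
# not the actual AEKOS schema.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UpperContext:
    """Party, project, scope and licensing details for a dataset."""
    project: str
    parties: List[str]
    scope_statement: str
    licence: str

@dataclass
class DomainEntity:
    """An observed entity in the domain model (ontology)."""
    entity_type: str                                   # e.g. "StudyLocationVisit"
    features: Dict[str, str] = field(default_factory=dict)
    related: List["DomainEntity"] = field(default_factory=list)

@dataclass
class MethodDescription:
    """Description model: a method and its definitions."""
    name: str
    protocol_text: str
    external_links: List[str] = field(default_factory=list)

@dataclass
class IndexRecord:
    """Indexing model: traits that drive search and federation."""
    dataset_id: str
    traits: Dict[str, str] = field(default_factory=dict)
```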

  4. Upper Context • Provides context for Datasets: • Contact details • High level objectives of program • Licensing details and conditions of use • Statement of scope • Alignment with national metadata standards (ANDS) • Statement of curation processes applied to data
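
As a concrete illustration, an upper-context record might carry the fields listed above. The keys and values below are hypothetical.

```python
# Hypothetical upper-context record for one dataset; the keys mirror the
# fields listed on this slide, the values are invented for illustration.
upper_context = {
    "contact": {"name": "Data Custodian", "email": "custodian@example.org"},
    "objectives": "High-level objectives of the monitoring program.",
    "licence": "TERN Attribution Data Licence v1.0",
    "conditions_of_use": "Cite the data provider in any derived product.",
    "scope_statement": "Vegetation and landscape observations, 2005-2012.",
    "metadata_standard": "ANDS-aligned collection-level metadata",
    "curation_statement": "Curation processes applied to the source data.",
}
```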

  5. Understanding Field Sampling • Schematic view of sampling configuration (diagram)

  6. Methodological work-flow • Flow diagram of the methodological stages: Study Location Selection, Landscape Assessment, Soil Assessment, Fire Evidence, Study Location Visit, Surface Cover, Physical Assessment, Disturbance Evidence, Vertebrate Evidence, Climate Evidence, Study Location Establishment, Voucher Collection, Sampling Unit Selection, Species Assessment, Species Life Stage, Vegetation Assessment, Vegetation Assemblage, Canopy Age-class, Canopy Assessment, Structural Formation, Overstorey Measurement

  7. Authored Method Descriptions • Start with published method manuals • Enrich existing method descriptions (protocols) with external web links and other resources • Clarify questions about methods • Divide the protocol into smaller method descriptions

  8. Authored Method Descriptions • Use a consistent format across datasets to allow comparison • Direct linkage between the data value and the specific method of measurement • Allows rapid assessment of suitability of data for re-use • Eventually a method catalogue for researchers
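
The direct linkage between a data value and its method of measurement can be pictured as a method identifier stored alongside each value. A minimal sketch, assuming hypothetical method IDs and catalogue entries:

```python
# Hypothetical linkage of a measured value to the authored method description
# used to collect it; method IDs, names and text are illustrative only.
from dataclasses import dataclass

method_catalogue = {
    "veg-cover-estimate": {
        "name": "Vegetation cover estimate",
        "protocol": "Smaller method description split out of the published manual.",
        "links": ["https://example.org/method/veg-cover-estimate"],
    },
}

@dataclass
class Measurement:
    value: float
    unit: str
    method_id: str          # direct link to the specific method of measurement

m = Measurement(value=35.0, unit="% cover", method_id="veg-cover-estimate")

# A researcher assessing suitability for re-use can resolve the method directly.
print(method_catalogue[m.method_id]["name"])
```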

  9. Definition of source datasets Analysis and definition of source data types: • Observation data • Taxonomic concepts (a specific type of reference data) • Reference data (i.e. lookup tables) • Images and other artefacts.
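
A simple way to picture this classification is a tag on each source record. The type names below follow the slide; the example rows are invented.

```python
# Hypothetical classification of source records into the data types above.
from enum import Enum

class SourceDataType(Enum):
    OBSERVATION = "observation data"
    TAXONOMIC_CONCEPT = "taxonomic concept (a specific type of reference data)"
    REFERENCE = "reference data (lookup table)"
    ARTEFACT = "image or other artefact"

# Example: tagging rows pulled from a source database (values invented).
rows = [
    ({"species": "Eucalyptus camaldulensis", "cover": 35}, SourceDataType.OBSERVATION),
    ({"code": "EUCCAM", "accepted_name": "Eucalyptus camaldulensis"}, SourceDataType.TAXONOMIC_CONCEPT),
    ({"code": "SL", "meaning": "sandy loam"}, SourceDataType.REFERENCE),
    ({"file": "site_photo_001.jpg"}, SourceDataType.ARTEFACT),
]
```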

  10. Mapping to the ÆKOS Domain Model • Schematic of source fields mapped onto related domain entities: Study Location (mudmap, comment), Study Location Visit (visit date, datum, observers, disturbance), Spatial Point (x coord, y coord, identifier), Sampling Unit (marker type, field identity), Organism Group (species, life form, cover/abundance, life stage, phenology, dominance), Landscape (slope, aspect, landform pattern) and Voucher Specimen (accession no., determiner, determined identity), linked by relationships such as selects, represents, contains and represented by
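
In code terms, the mapping step takes a flat source row and restructures it into the related domain entities named above. A much-simplified, hypothetical sketch (field names are illustrative):

```python
# Hypothetical, much-simplified mapping of one flat source row onto the
# domain entities named on this slide; field names are illustrative.
def map_row(row: dict) -> dict:
    visit = {
        "type": "StudyLocationVisit",
        "visit_date": row["visit_date"],
        "observers": row["observers"],
        "location": {
            "type": "SpatialPoint",
            "datum": row["datum"],
            "x": row["x_coord"],
            "y": row["y_coord"],
        },
    }
    organism_group = {
        "type": "OrganismGroup",
        "field_identity": row["species"],
        "life_form": row.get("life_form"),
        "cover_abundance": row.get("cover"),
    }
    visit["contains"] = [{"type": "SamplingUnit", "contains": [organism_group]}]
    return visit

# Example source row (values invented):
example = {
    "visit_date": "2011-04-02", "observers": "AG, MS", "datum": "GDA94",
    "x_coord": 138.6, "y_coord": -34.9, "species": "Eucalyptus camaldulensis",
    "life_form": "tree", "cover": 35,
}
print(map_row(example)["contains"][0]["contains"][0]["field_identity"])
```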

  11. Indexing • Enrichment of data with common indexes: • Project level traits • Data management traits • Ecological process traits (disturbance and land-use) • Measurement details • Species taxonomy • Vegetation Assemblage (e.g. NVIS Major Veg. Groups) • Jurisdictional and Bio-geographic boundaries • Spatially derived features (e.g. distance from road, slope, aspect, etc.)
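
Enrichment can be pictured as a pass that derives the common index traits for each mapped record. A hypothetical sketch; the trait names and spatial helpers are assumptions, with the real derivations done against taxonomy services and boundary layers:

```python
# Hypothetical enrichment pass that attaches common index traits to a record;
# the trait names and spatial helper functions are illustrative only.
def lookup_bioregion(x: float, y: float) -> str:
    # Placeholder for a spatial intersection against jurisdictional /
    # biogeographic boundary layers.
    return "Flinders Lofty Block"

def distance_from_road(x: float, y: float) -> float:
    # Placeholder for a spatial query against a road network layer.
    return 1240.0

def enrich(record: dict) -> dict:
    traits = {
        "species_taxonomy": record.get("accepted_name"),
        "vegetation_assemblage": record.get("nvis_major_veg_group"),
        "bioregion": lookup_bioregion(record["x"], record["y"]),
        "distance_from_road_m": distance_from_road(record["x"], record["y"]),
    }
    return {**record, "index_traits": traits}

example = {"accepted_name": "Eucalyptus camaldulensis",
           "nvis_major_veg_group": "Eucalypt Woodlands",
           "x": 138.6, "y": -34.9}
print(enrich(example)["index_traits"]["bioregion"])
```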

  12. Federated Taxonomy

  13. The AEKOS Ingestion “DSL” • Screen capture of Eclipse... • Source data query • Vocabulary management • Method description • Mapping to the common model • Populate indexes • Upper context authoring • Sandbox testing
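
Taken together, these capabilities form an ingestion pipeline. The outline below is plain Python, not the actual Eclipse-based DSL, and every function name is a hypothetical stub:

```python
# Plain-Python outline of the ingestion steps listed above. This is NOT the
# actual Eclipse-based AEKOS DSL; every function here is a hypothetical stub.
def query_source_data(dataset_id):          # source data query
    return [{"species": "stub", "cover": 35}]

def manage_vocabularies(rows):              # vocabulary management
    return {"cover": "percent cover"}

def author_method_descriptions(dataset_id): # method description
    return ["cover estimate method"]

def map_to_common_model(row, vocab):        # mapping to the common model
    return {"entity_type": "OrganismGroup", "features": row}

def populate_indexes(entity):               # populate indexes
    return {**entity, "index_traits": {}}

def author_upper_context(dataset_id):       # upper context authoring
    return {"project": "stub project"}

def publish_to_sandbox(context, methods, entities):   # sandbox testing
    print(f"sandbox: {len(entities)} entities, {len(methods)} methods")

def ingest(dataset_id):
    rows = query_source_data(dataset_id)
    vocab = manage_vocabularies(rows)
    methods = author_method_descriptions(dataset_id)
    entities = [map_to_common_model(r, vocab) for r in rows]
    indexed = [populate_indexes(e) for e in entities]
    context = author_upper_context(dataset_id)
    publish_to_sandbox(context, methods, indexed)

ingest("example-dataset")
```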

  14. Data Work-flow • Point of truth is always the source database • Data values are not changed • Data issues fed back to Data Providers • Automatic data refresh mechanism developed • Corrections made in source database and fed back to AEKOS on next “push” • Just new records and edits after the first load • Update frequency defined for each dataset
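
The refresh mechanism comes down to pushing only records created or edited since the previous load. A hypothetical sketch, assuming the source database records a last-modified timestamp per record:

```python
# Hypothetical incremental "push": only records new or edited since the last
# load are sent; assumes the source database stores a last-modified timestamp.
from datetime import datetime

def records_to_push(source_records, last_push_time: datetime):
    return [r for r in source_records if r["last_modified"] > last_push_time]

source_records = [
    {"id": 1, "last_modified": datetime(2012, 1, 10)},   # unchanged since last push
    {"id": 2, "last_modified": datetime(2012, 6, 3)},    # edited in the source DB
    {"id": 3, "last_modified": datetime(2012, 7, 1)},    # new record
]

last_push = datetime(2012, 3, 1)
print([r["id"] for r in records_to_push(source_records, last_push)])  # -> [2, 3]
```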

  15. Quality Assurance • ÆKOS QA and review: • Team review of the domain modelling of every dataset ingested • “Sandbox” test ingestion before publishing to ÆKOS • Review of method descriptions by other team members • Internal code validation and error checking

  16. Quality Assurance • Data Provider QA: • Review method descriptions • Review upper context • Portal feedback: • Review data content in the portal • Use the portal and suggest enhancements and changes • Look and feel • Index traits • Data accuracy and representation • Feedback survey and email facility on the portal

  17. Thank you • Contact Details • Data Analyst – Matt Schneider matt.schneider@adelaide.edu.au • Data Analyst – Andrew Graham andrew.graham@adelaide.edu.au • Website www.aekos.org.au
