OOINet Metadata Management

Presentation Transcript


  1. OOINet Metadata Management 4/28/2014

  2. Outline • Metadata Introduction & Model Description • Example • Metadata Creation • Metadata Management • Metadata Usage • Externalization

  3. Metadata Introduction & Model Description

  4. Metadata in the Context of OOI • Metadata in OOI is used in two primary ways, “User Relevant” and “Internal” • “User Relevant” Metadata for Science/Engineering Data and Marine Assets: • Geospatial extent, temporal extent, coordinate system • Names of parameters, units • License, contact, provenance information • Type definition, name, OOINet unique id • “Internal” Metadata for OOI functionality • Lifecycle state • Type specific attributes • Cross-references (aka associations) • Presented to end users via a Web User Interface • Available for export and download via integrated tools • This presentation will focus on “User Relevant” Metadata

  5. OOI Metadata Model • OOINet manages 22 user relevant types • Examples: instrument (device), instrument site, deployment, data product • Metadata Attributes • Attributes are defined in human readable configuration files (using YAML files) • Common “base types” exist to share common attributes • Utilizing the “Object Oriented” design of inheritance • Storage/Search • Metadata is persisted in the Postgres database and can be queried flexibly • Maintainability • Metadata model can evolve without the need for a database schema change • A system point release is required for a change

  6. Abstract “Base” Types • Abstract Base Type • “Top of the tree” • Broad definitions that are extended into a Metadata “Instance” • This “Object Oriented” methodology allows different Metadata types to share attributes and functionality
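
The inheritance idea on this slide can be illustrated with a minimal Python sketch. The class and attribute names (`Resource`, `InstrumentDevice`, `serial_number`) are hypothetical and chosen only for illustration; the actual OOINet types are defined in YAML configuration files that are not shown in this presentation.

```python
from dataclasses import dataclass, fields


@dataclass
class Resource:
    """Hypothetical abstract base type: attributes shared by all metadata types."""
    name: str = ""
    description: str = ""


@dataclass
class InstrumentDevice(Resource):
    """Hypothetical extended type: inherits the base attributes and adds its own."""
    serial_number: str = ""


def attribute_names(cls):
    """List the attributes of a type, inherited ones first."""
    return [f.name for f in fields(cls)]


device = InstrumentDevice(name="CTD 01", serial_number="SN-123")
print(attribute_names(InstrumentDevice))
```

The extended type never repeats `name` or `description`; adding an attribute to the base type makes it available on every derived type.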

  7. User Relevant Metadata

  8. Attribute Example Common “Base” Attributes Example “Extended” Attributes

  9. Object Types • User relevant Object Types and example metadata attributes • Objects are reusable sets of related attributes that can be shared across different types

  10. Metadata Example: Facility -> Deployment

  11. Metadata Instances: Facilities and Sites • A Facility is an organizational unit • CGSN • RSN • EA • Sites are named deployment locations • Geographical • Platform • Instrument

  12. Metadata Instances : Devices and Models • A Model is a type/class/series of instrument or platform • A Device is a physical instrument or platform asset with serial and property number

  13. Metadata Instances : Data Products • A Data Product describes data from a device • Science, engineering, raw, derived data • Contains instructions for computing calibrations and automated QC • Points to actual data content (does not duplicate data) and DPA (algorithms)

  14. Metadata Instances : Deployments • A Deployment describes one assignment of a device (assembly) to a site for a period of time • A Site Data Product(R3) references data from all devices at that site over time
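
A deployment as described here is essentially a (device, site, time period) record. The sketch below assumes hypothetical field names and identifiers (`device_id`, `site_id`, `dev-1`, `site-A`); it shows how a site-level view can be derived by collecting all deployments at that site over time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """One assignment of a device to a site for a period of time (hypothetical fields)."""
    device_id: str
    site_id: str
    start: datetime
    end: Optional[datetime] = None  # None means the deployment is still active


def devices_at_site(deployments, site_id):
    """All device ids ever deployed at a site, in chronological order."""
    at_site = [d for d in deployments if d.site_id == site_id]
    return [d.device_id for d in sorted(at_site, key=lambda d: d.start)]


def device_at(deployments, site_id, when):
    """The device deployed at a site at a given time, or None."""
    for d in deployments:
        if d.site_id == site_id and d.start <= when and (d.end is None or when < d.end):
            return d.device_id
    return None


DEPLOYMENTS = [
    Deployment("dev-2", "site-A", datetime(2014, 6, 1)),
    Deployment("dev-1", "site-A", datetime(2014, 1, 1), datetime(2014, 6, 1)),
    Deployment("dev-3", "site-B", datetime(2014, 2, 1)),
]
```

A site data product would draw on the result of `devices_at_site`, while a query for a specific timestamp resolves to a single device through `device_at`.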

  15. Metadata Instances: Other usages • Many other metadata types exist, such as: • User, identity, role, authorization (commitment) • Parameter definition, parameter set, parameter function • Data process definition, data process (instance) • Other marine assets (anchor, cable, etc) • Other associations exist • User membership in facility • User role authorization in a facility

  16. Metadata Attributes • An analysis of relevant metadata attributes was performed for user relevant types • Analysis of OGC, ISO standards and community standards • Analysis of similar observatory projects • Collaboration with MBARI designers • The OOINet Metadata Model is compliant with • OGC standards: CSW, SOS and others • CF Conventions • Attribute Conventions for Dataset Discovery (ACDD) • ISO standards: ISO 19115 • Metadata Attribute Definition • The original attribute definition followed a process that included stakeholders from the Program office, Marine IOs, User Experience (UI), CI Data Curator, System Architecture • Presented and reviewed at lifecycle reviews (LCO, LCA)

  17. Metadata Creation

  18. Creation of OOINet Metadata Instances • For an OOINet Release - via operator program (“preload”) • As a convenience to provide a “turnkey solution” • Intended to provide comprehensive structure of consistently named assets with metadata filled in as available on initial load • Fine-tuning required by end users via manual edit • Metadata will evolve over time • During operations, via operator programs • Incremental preload for new instruments and subsequent deployments • Removes the burden of manual entry • Manually via Web User Interface

  19. Dynamic Metadata Generation • Some information is computed on demand • E.g., for the Web UI, respecting caller’s authorization level • Computed attributes (“additional dynamic info for a data request”) • Current geographical bounding box • Current agent state • Last ingested instrument measurement and timestamp • Number of active instruments • Most recent events for a device • Extended information (“additional info about the surroundings of a data request”) • Model metadata for a device (name, link) • Instance configuration for a device • Owner, user, and facility for an asset • Standing subscriptions for an asset
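
As one example of a computed attribute, a current geographical bounding box can be derived on demand from the positions seen so far. This is a minimal sketch under stated assumptions: the input format and the key names (`south`, `north`, `west`, `east`) are hypothetical, and the simple min/max approach does not handle tracks that cross the antimeridian.

```python
def bounding_box(points):
    """Compute a bounding box from an iterable of (lat, lon) pairs.

    Returns None when there are no points yet.
    """
    points = list(points)
    if not points:
        return None
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "south": min(lats),
        "north": max(lats),
        "west": min(lons),
        "east": max(lons),
    }


track = [(44.6, -124.3), (44.9, -125.1), (44.4, -124.8)]
print(bounding_box(track))
```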

  20. Metadata Management

  21. OOINet Metadata Management • Initial asset creation via preload • User can edit metadata and associations via the UI • User can create object instances (if authorized) • Operators can execute scripts to perform incremental loads or changes • Incremental preload for new sites • Incremental preload for new instrument models and drivers • Set calibration coefficients for devices • Set dataset agent configuration

  22. OOINet Metadata Model Evolution • The basic framework enables flexible extension • Add new asset types • Add new association types • Add new asset attributes and make changes • Add content validation rules to asset attributes • Metadata Changes • Changes in metadata can be made with no need to change the underlying database schema • Release 3 Functionality • Observatory asset tracking will enable users to add asset specific metadata and asset specific metadata attributes directly through the UI
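
Content validation rules attached to attributes could look roughly like the following sketch. The rule table, attribute names, and the `validate` helper are hypothetical; the presentation does not show how OOINet expresses its rules.

```python
# Hypothetical rule table: attribute name -> predicate that accepts a value.
ATTRIBUTE_RULES = {
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "latitude": lambda v: isinstance(v, (int, float)) and -90 <= v <= 90,
    "longitude": lambda v: isinstance(v, (int, float)) and -180 <= v <= 180,
}


def validate(attributes):
    """Return the sorted names of attributes that violate their rule.

    Attributes without a rule are accepted, so new attributes can be
    introduced before any rule exists for them.
    """
    return sorted(
        key
        for key, rule in ATTRIBUTE_RULES.items()
        if key in attributes and not rule(attributes[key])
    )
```

Because the rules live in a table rather than in a storage schema, adding a rule or an attribute in this sketch is a configuration change only.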

  23. OOINet Metadata Navigation • Starting point: dashboard and map view • Navigate via OOI geographical site (array) • Data products • Assets (instruments, platforms, sites) • Navigate via OOI facility (organization) • “Facepage” for Metadata Instance • Shows: Name, description, owner, lifecycle state • Approximate geospatial/temporal range • Recent events for the asset • Multiple information levels to limit overload • Navigate to related facepages, control page, etc.

  24. Metadata Storage • OOINet stores metadata in a relational database: PostgreSQL 9.3 • State of the art open source database • Transactional with integrity constraints • High performance • Can be clustered (to address processor overloading) • With geospatial extensions (PostGIS) • Advanced query capabilities • OOINet manages the logical information model, rather than the database • One table for all assets • Also stored: event metadata
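
The "one table for all assets" approach can be sketched as a single table holding an id, a type, and a serialized attribute document. The sketch uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server; the table and column names are hypothetical and do not reflect the actual OOINet layout.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id TEXT PRIMARY KEY, type TEXT NOT NULL, doc TEXT NOT NULL)"
)


def save(res_id, res_type, attributes):
    """Store any asset type in the same table; attributes go into one document."""
    conn.execute(
        "INSERT OR REPLACE INTO resources (id, type, doc) VALUES (?, ?, ?)",
        (res_id, res_type, json.dumps(attributes)),
    )


def find_by_type(res_type):
    """Return the attribute documents of all assets of one type, ordered by id."""
    rows = conn.execute(
        "SELECT doc FROM resources WHERE type = ? ORDER BY id", (res_type,)
    )
    return [json.loads(doc) for (doc,) in rows]


save("i1", "instrument", {"name": "CTD 01", "serial_number": "SN-123"})
save("s1", "site", {"name": "Example Site", "latitude": 44.6})
save("i2", "instrument", {"name": "CTD 02", "firmware": "1.4"})
```

Note that `i2` carries an attribute (`firmware`) that `i1` lacks, and no schema change was needed to store it. This is the property the slide describes as managing the logical information model rather than the database.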

  25. Metadata Usage

  26. Metadata Usage: OOINet Data Search • The OOINet services provide mechanisms to search observatory and science data using Metadata • By metadata name • By asset type • By lifecycle state • By context (within a site, part of a facility) • By overlap with geospatial or temporal range • By metadata create/last change date • Any combination of the above (AND, OR) • Or: Simple search via search text field • The Web UI implements a choice of search options for the user
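
Combining search criteria with AND and OR can be modeled as composable predicates. The following is a minimal in-memory sketch; the record fields and helper names are hypothetical and are not the OOINet search API.

```python
from datetime import date


def by_type(asset_type):
    return lambda r: r["type"] == asset_type


def by_lifecycle_state(state):
    return lambda r: r["lifecycle_state"] == state


def overlaps_time(start, end):
    # Two closed ranges overlap when each one starts before the other ends.
    return lambda r: r["start"] <= end and start <= r["end"]


def all_of(*predicates):
    """AND combination."""
    return lambda r: all(p(r) for p in predicates)


def any_of(*predicates):
    """OR combination."""
    return lambda r: any(p(r) for p in predicates)


def search(resources, predicate):
    return [r["name"] for r in resources if predicate(r)]


RESOURCES = [
    {"name": "CTD A", "type": "instrument", "lifecycle_state": "deployed",
     "start": date(2014, 1, 1), "end": date(2014, 6, 30)},
    {"name": "CTD B", "type": "instrument", "lifecycle_state": "retired",
     "start": date(2013, 1, 1), "end": date(2013, 12, 31)},
    {"name": "Mooring 1", "type": "platform", "lifecycle_state": "deployed",
     "start": date(2014, 3, 1), "end": date(2014, 9, 30)},
]

# Instruments that are deployed OR that have data overlapping June 2013.
query = all_of(
    by_type("instrument"),
    any_of(
        by_lifecycle_state("deployed"),
        overlaps_time(date(2013, 6, 1), date(2013, 7, 1)),
    ),
)
print(search(RESOURCES, query))
```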

  27. Metadata Usage: Asset Tracking • Release 2 implemented asset tracking for operational assets (instruments, devices, deployments to sites) with basic user interfaces • Life-cycle management: planned, ready, deployed, retired • Comprehensive asset metadata model • Release 3 enables tracking of ancillary assets (cables, anchors) • More sophisticated import/export/sync mechanisms with Marine IO processes and tools • User-defined asset attributes and asset types • Enhanced User Interfaces
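
The four lifecycle states named on this slide can be treated as a small state machine. The slide lists only the states; the allowed transitions below (for example, a deployed asset returning to "ready" after recovery) are an assumption made for illustration.

```python
# Assumed transition table; only the state names come from the slide.
ALLOWED_TRANSITIONS = {
    "planned": {"ready"},
    "ready": {"deployed", "retired"},
    "deployed": {"ready", "retired"},
    "retired": set(),
}


def transition(current, target):
    """Return the new state, or raise ValueError if the change is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move asset from {current!r} to {target!r}")
    return target


state = "planned"
for step in ("ready", "deployed", "retired"):
    state = transition(state, step)
print(state)
```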

  28. Metadata Access: OOINet Facepages • “Facepage”, e.g. Data Product Facepage with metadata attributes • Table of parameters with type and units • Computed geospatial bounding box, temporal range • Links to related Data Products • (In R3) Provenance information computed based on modifications to data product and source • Some metadata editable if logged in with proper credentials • Also provides access to science data

  29. Science Data and Metadata Access: OOINet Facepages

  30. Metadata Catalog Externalization: GeoServer • Provides access to OGC compliant metadata • Web catalog • Permanent URLs accessible to external tools • CF Conventions • Supports OGC interoperability standards: • WFS (1.0.0, 1.1.0, 2.0.0) - Web Feature Service • WMS (1.1.0, 1.3.0) - Web Map Service • SOS (1.0.0) - Sensor Observation Service • CSW (2.0.2) - Catalogue Service • and others
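
An external tool typically starts by asking an OGC service for its capabilities document. The sketch below builds such a request URL from the standard `service`, `version`, and `request` parameters; the base URL is a placeholder, not the actual OOINet GeoServer address.

```python
from urllib.parse import urlencode


def get_capabilities_url(base_url, service, version):
    """Build an OGC GetCapabilities request URL for the given service and version."""
    query = urlencode(
        {"service": service, "version": version, "request": "GetCapabilities"}
    )
    return f"{base_url}?{query}"


# Placeholder endpoint for illustration only.
BASE_URL = "https://geoserver.example.org/geoserver/ows"

print(get_capabilities_url(BASE_URL, "WMS", "1.3.0"))
print(get_capabilities_url(BASE_URL, "WFS", "2.0.0"))
```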

  31. GeoServer (Release 3 Preview)

  32. Back Up Information • The following slides contain further details on the metadata architecture and implementation

  33. Information Model: Agents • An Agent Definition is a type of agent/driver • Platform/instrument/dataset variants • Every model has its own agent definition • An Agent Instance has configuration for the agent of one device

  34. Information Model: Platform Assemblies • Platforms assemble multiple instruments • One Deployment per assembly • Many Data Products, many agents

  35. Science Data Management: Structure • Science and engineering datasets are determined by • Model of platform/instrument device • List of parameters • Level of derivation (Raw, combined, L0, L1, L2) • Applicable persisted input per dataset • Calibration coefficients (by model) • QC lookup values (e.g. for range test) • Derived parameters • Calibrated values (as spec’ed in L1 DPS) • Automated QC flags • Referenced values (e.g. GPS location from platform) • Derived values (as spec’ed in L2 DPS) • Site data products reference input from multiple devices for a site across many deployments
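
The relationship between persisted inputs (calibration coefficients, QC lookup values) and derived parameters (calibrated values, QC flags) can be sketched as follows. The linear calibration form, the coefficient names, and the boolean flag convention are assumptions for illustration; the real algorithms are specified in the L1/L2 Data Product Specifications, which are not part of this presentation.

```python
def calibrate(raw, coeffs):
    """Apply an assumed linear calibration: offset + scale * raw."""
    return coeffs["offset"] + coeffs["scale"] * raw


def range_test(value, lower, upper):
    """Automated QC range test: True when the value lies within the limits."""
    return lower <= value <= upper


# Hypothetical persisted inputs for one instrument model.
COEFFS = {"offset": -5.0, "scale": 0.5}
QC_LIMITS = {"lower": -2.0, "upper": 40.0}

raw_counts = [20, 4, 100]
calibrated = [calibrate(r, COEFFS) for r in raw_counts]
flags = [range_test(v, QC_LIMITS["lower"], QC_LIMITS["upper"]) for v in calibrated]
print(list(zip(calibrated, flags)))
```

The raw values stay untouched; calibrated values and flags are derived from them together with the persisted inputs, which matches the slide's split between stored input and derived parameters.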

  36. Science Data Management: Basics • Science Data is stored as files in internal format (HDF5) in a geographically redundant Storage Area Network (SAN) • OOINet manages the ingestion of science data granules into science and engineering data sets • OOINet manages indexes of current geospatial and temporal extents of available data and updates metadata • OOINet computes calibrated and derived data products and performs automated QC • OOINet provides an internal API to efficiently query and retrieve data of interest (“coverage model”) • OOINet integrates tools (ERDDAP, DAP) for users to download data in various download formats

  37. Details: OOINet Preload Program • Information sources • OOI SAF Engineering Tool (via CSV export) • Can be applied incrementally for the next set of deployments • Includes: site, model, geospatial, data product spec, QC • Maintained by OOI system engineering • Configuration controlled online spreadsheets • For facilities, known users and user role assignments • For science data definitions: parameters, units • For instrument/dataset agent definitions • Behavior • Generates systematic names of assets • Sets metadata known from SAF (e.g. geospatial, make/model, reference designator) • Cross-references assets, e.g. instruments, data products, models, agent definitions etc.

  38. ERDDAP in OOINet Release 2

  39. Science Data Download: ERDDAP • Access to science and engineering data • Map plotting • Selection of geospatial and temporal range of interest • Selection of parameters of interest • Download formats include NetCDF, CSV, Matlab, HTML, PNG (map plot), PDF • Download URL permanently accessible from external tools, e.g. Matlab using DAP protocol • Download timestamps, measurements, calibration coefficients, QC flags, calibrated and derived parameters
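
A download URL of the kind described here encodes the dataset, the output format, the parameters of interest, and the temporal range. The sketch below assumes the dataset is served through ERDDAP's tabledap protocol; the server address, dataset id, and variable names are placeholders.

```python
from urllib.parse import quote


def tabledap_url(base_url, dataset_id, file_type, variables, start, end):
    """Build an ERDDAP tabledap request for selected variables in a time range.

    The constraint operators and timestamps are percent-encoded so the URL
    can be passed safely to external tools.
    """
    constraints = [f"time>={start}", f"time<={end}"]
    query = ",".join(variables) + "".join(
        "&" + quote(c, safe="=") for c in constraints
    )
    return f"{base_url}/tabledap/{dataset_id}.{file_type}?{query}"


url = tabledap_url(
    "https://erddap.example.org/erddap",  # placeholder server
    "ctd_example",                        # placeholder dataset id
    "csv",
    ["time", "temperature"],
    "2014-01-01T00:00:00Z",
    "2014-01-02T00:00:00Z",
)
print(url)
```

Changing `file_type` from `csv` to `nc` would request NetCDF for the same selection, so one URL pattern covers the different download formats.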

  40. OOINet Data Ingestion • [Diagram. Labels: glider platform (temperature, pressure, lat/lon, etc.) delivering separate data files from Iridium downlink or post recovery; OMC; MIO File Server with local storage; rsync; CI File Server with local storage on the Cyber Pop; Dataset Agent and Driver; OOINet Core with ingestion, algorithms, pyDAP, ERDDAP server, UI web server; Files: Disk Storage (data); Postgres (metadata)] • Instrument data is now stored in separate files on a hard disk, and metadata is stored in the Postgres database

  41. Data Acquisition Details • L0 Data at OMC via Iridium Uplink or Disk Transfer • Resides on Marine IO (MIO) Hardware (file server) • Glider data files contain readings for various sensors in one file (special case due to vendor provided driver software) • Other Instruments provide readings in a series of files • File server initiates “rsync” utility, which mirrors the data files up to the CyberPops • L0 Data on a CyberPop • OOI is monitoring file server, and activates a Dataset Agent with a Dataset Driver to harvest and parse the data • L0 Data on a CyberPop, within OOINet • Data is parsed by the driver into rows of a “table”, after which the Dataset Agent packetizes additions to the table into data “granules” • Granules are streamed from the agent to the OOINet Ingestion using RabbitMQ (an implementation of the AMQP protocol) • [Diagram labels: MIO File Server, CI File Server, OOINet Core, Files: Disk Storage]
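
The step from parsed rows to granules amounts to batching new rows into fixed-size packets. This is a minimal sketch: the function name and the granule size are hypothetical, and the RabbitMQ publishing step is left out so the example stays self-contained.

```python
def packetize(rows, granule_size):
    """Group an iterable of parsed rows into granules of at most granule_size rows."""
    granule = []
    for row in rows:
        granule.append(row)
        if len(granule) == granule_size:
            yield granule
            granule = []
    if granule:
        # Emit the final, partially filled granule.
        yield granule


parsed_rows = [
    {"time": 0, "temp": 10.1},
    {"time": 1, "temp": 10.2},
    {"time": 2, "temp": 10.4},
    {"time": 3, "temp": 10.3},
    {"time": 4, "temp": 10.5},
]
granules = list(packetize(parsed_rows, granule_size=2))
print([len(g) for g in granules])
```

In a real pipeline each yielded granule would be handed to the messaging layer instead of being collected in a list.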

  42. Data Ingestion: OOINet • OOINet (Core services) • Consists of services and Data Processes running on Virtual Machines (VMs) • OOINet Ingestion is configured to listen for incoming data streams from Dataset agents and route them to disk storage • This is where the metadata describing a data stream is routed into the Postgres Database • OOINet can also route data streams to users, issue read requests on behalf of users, invoke algorithms, listen for ERDDAP->pydap requests, etc. • File Storage • Data from instruments are saved into a set of files consisting of a “Master” file and multiple “child” files. The master file is used to access the desired data records • Metadata about datasets is stored in the Postgres Database • A request for data from a user or program will look up data records in the Postgres DB, and then pull records from the disk files • [Diagram labels: UI Server, OOINet Core, Files: Disk Storage, Postgres]

  43. OOINet Data Service to Users • [Diagram. Labels: user space with ERDDAP website (browser) and OOINet website (browser) sending user requests and receiving user data; Cyber Pop hosting OOINet with pyDAP, ERDDAP server, OOINet services, UI web server, algorithms, drivers, agents; Postgres (metadata); Files: Disk Storage; CI File Server with local storage]
