Introduction to SeaDataNet Metadata


  1. SeaDataNet Training Course Introduction to SeaDataNet Metadata Roy Lowry British Oceanographic Data Centre

  2. Overview • An introduction to the SeaDataNet metadata formats covering • Purpose • Entity definition • History • Population • Strengths • Weaknesses

  3. Overview • SeaDataNet metadata formats • European Directory of Marine Organisations (EDMO) • Cruise Summary Report (formerly ROSCOP) • European Directory of Marine Environmental Datasets (EDMED) • European Directory of the Ocean Observing System (EDIOS) • SeaDataNet Common Data Index (CDI) • European Directory of Marine Environmental Research Projects (EDMERP)

  4. EDMO • Purpose • Provides SeaDataNet with an address book of organisations associated with marine data • Provides descriptions of these organisations • Entity definition • Any group of people sharing a common postal address engaged in activities associated with marine data acquisition and use • History • Developed by Maris during SEA-SEARCH in response to a need to improve address metadata management across the project

  5. EDMO • Population • On-line Content Management System fronted by a web form (http://www.sea-search.net/organisations/) • Partners are responsible for maintenance of their national record set • Management supported by a reasonably sophisticated access control system that authenticates users and grants access to the appropriate database subset

  6. EDMO • Strengths • The maintenance tool. Please use it to look after the entries for your country • Provides a single point of entry for SeaDataNet metadata documents associated with a given organisation • Centralisation of metadata common to other catalogues, replacing four independently maintained address metadata repositories • Rich information content, including descriptions, logos and spatial location information

  7. EDMO • Weaknesses • Simple data model is poorly equipped for the management of organisational evolution • Organisations merge, fragment, rename and move • All we can do in EDMO is document this using plain language fields • Text fields contain embedded markup • These look very nice when displayed through the search interface • However, the markup causes problems generating XML documents for record transport between systems • Examples including graphics and relative URLs break when transported by copy/paste
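The XML problem described on this slide can be illustrated with a minimal Python sketch. The field content and the `description` element name are invented for illustration and are not taken from the EDMO schema or database.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text field with embedded HTML-style markup (assumption).
field = 'Visit <a href="../about.html">our pages<br> & see the logo'


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Pasting the field straight into a transport document breaks it:
# the <a> and <br> tags are never closed and the ampersand is bare.
raw_doc = "<description>" + field + "</description>"

# Escaping keeps the document well formed, but the markup then travels
# as literal text and the relative URL still has no base to resolve against.
safe_doc = "<description>" + escape(field) + "</description>"

raw_ok = is_well_formed(raw_doc)
safe_ok = is_well_formed(safe_doc)
round_trip = ET.fromstring(safe_doc).text
```

Escaping only solves the transport problem. The receiving system still has to decide whether to render or strip the markup, and a relative URL remains meaningless once the text has left its original site.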

  8. CSR • Purpose • To document the operational and data generation activities of an oceanographic research cruise • Entity definition • A subject of some controversy • I am a metadata purist and support the definition of a ‘cruise’ as the interval of time between leaving port and returning to port • Thus for a 3-leg cruise I would generate 3 CSR records whilst others would generate just one. I do this because: • Combining records is easier than splitting them • Cruise ‘legs’ for some ships can be VERY different (e.g. 3 legs of a Meteor cruise: one JGOFS, one OMEX, one WOCE) • Merging ‘legs’ is a slippery slope – I’ve even encountered a single record covering the activities of two ships three months apart
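The claim that combining records is easier than splitting them can be sketched in Python. The `CruiseRecord` class, the ship name, the dates and the activities below are all hypothetical; this is not the CSR schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CruiseRecord:
    """Hypothetical, heavily simplified stand-in for one CSR record."""
    ship: str
    start: date
    end: date
    activities: frozenset


def combine(legs):
    """Merge per-leg records for a single ship into one summary record."""
    ships = {leg.ship for leg in legs}
    if len(ships) != 1:
        raise ValueError("expected one or more legs from a single ship")
    return CruiseRecord(
        ship=ships.pop(),
        start=min(leg.start for leg in legs),
        end=max(leg.end for leg in legs),
        activities=frozenset().union(*(leg.activities for leg in legs)),
    )


# Three port-to-port legs with invented dates and activities.
legs = [
    CruiseRecord("RV Example", date(2001, 3, 1), date(2001, 3, 20),
                 frozenset({"CTD casts"})),
    CruiseRecord("RV Example", date(2001, 3, 22), date(2001, 4, 10),
                 frozenset({"moorings"})),
    CruiseRecord("RV Example", date(2001, 4, 12), date(2001, 5, 1),
                 frozenset({"water sampling"})),
]
merged = combine(legs)
```

The merge is mechanical. The reverse is not: `merged` no longer records which activity happened on which leg, or where the port calls fell, so the three original records cannot be recovered from it.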

  9. CSR • Entity definition (continued) • Problem with my definition is that the real world creates grey areas. For example, does a personnel change by pilot boat in an estuary count as ‘docking’? • Others extend the definition to cover any activity collecting oceanographic data (shoehorning) • I believe this is a very bad thing to do • The activity super-class and other activity sub-classes are much better described by other metadata standards (e.g. in OGC Observations and Measurements) • Later on in SeaDataNet we could consider incorporating some of these to further enrich our metadata portfolio • In the meantime remember that it is NOT necessary to have every measurement covered by a CSR. If it isn’t appropriate, don’t create one.

  10. CSR • History • Originally a paper form developed by IOC called a ROSCOP • Replaced in 1990 by the Cruise Summary Report with richer content (but the name ROSCOP stuck) • Numerous on-line databases developed during the 1990s • Primary repositories now DOD for SeaDataNet partners and ICES for non-SeaDataNet

  11. CSR • Population • On-line web-form (http://www.sea-search.net/roscop/welcome.html) • XML schema available for bulk transfers • Strengths • Flexible population mechanisms • Long history with a massive legacy population • Cruise is (or should be) a well defined concept to oceanographers

  12. CSR • Weaknesses • “Parameter” vocabulary • Really a vocabulary describing shipborne activities • No clear equivalent elsewhere for interoperability, but ontological mapping to multiple vocabularies might provide a solution • On-line systems developed using plaintext fields when controlled vocabularies would have made interoperability between repositories more straightforward • Spatial coverage limitations • Coarse-grained • Described using Marsden Squares but BODC has deployed a Web Service to convert these to ISO19115/DIF standard bounding boxes

  13. EDMED • Purpose • To describe marine environmental datasets to promote their discovery • Entity definition • A dataset, but what is a dataset? • ISO19101 defines a dataset as ‘an identifiable collection of data’ which covers everything from the parameters measured on a single water sample to the 7,500,000 CTDs in the USNODC World Ocean Database • Sound judgement is needed to decide upon appropriate granularity • Best approach is to establish objective criteria • Worth remembering that a measurement may be included in more than one dataset • Posing this question to metadata specialists can provide good sport!

  14. EDMED • History • Developed by BODC in late 80s • Adopted by EU MAST Data Committee, then SEA-SEARCH and now SeaDataNet • Population • Form interface to stand-alone Access database that is submitted to BODC for ingestion • XML schema available for bulk transfers • Strengths • Content quality controlled on ingestion, therefore standards are high • Rich content developed during SEA-SEARCH

  15. EDMED • Weaknesses • Developed in splendid isolation, including vocabularies, therefore interoperability with other systems is difficult • Heavy dependence on plaintext fields: a problem that should be addressed during SeaDataNet

  16. EDIOS • Purpose • To describe marine environmental datasets comprising data that are collected repeatedly, regularly and routinely in order to promote their discovery (initially for operational planning purposes) • Entity definition • A dataset comprised of data that are collected repeatedly, regularly and routinely, but what is a dataset (cf. EDMED)? • History • Developed as an EU project led by EuroGOOS • Inherited by SeaDataNet

  17. EDIOS • Population • Currently an issue • There is a Word-based form (the MIF) • Developed in parallel to the data model and database with no evidence of communication • Completed MIFs entered into the database at BODC, requiring significant interpretation and information rehashing (long and painful process) • SeaDataNet work in progress • IFREMER/BODC working to produce an XML schema to facilitate large-scale transfer • Maris/BODC developing a web-form based content management system along the lines of EDMO

  18. EDIOS • Strengths • Rich data model based on structured fields with minimal plaintext • Data model includes hierarchical relationships between entities (project one-to-many observing programmes one-to-many measurement series) • Data model includes support for complex spatial objects (polygons not boxes) • Data model is particularly well suited to the description of operational oceanographic systems
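The hierarchy and spatial support listed on this slide can be sketched as Python dataclasses. All class and field names are assumptions made for illustration; the real EDIOS relational schema is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A polygon as a list of (longitude, latitude) vertices in degrees.
Polygon = List[Tuple[float, float]]


@dataclass
class MeasurementSeries:
    name: str
    footprint: Polygon


@dataclass
class ObservingProgramme:
    name: str
    series: List[MeasurementSeries] = field(default_factory=list)


@dataclass
class Project:
    name: str
    programmes: List[ObservingProgramme] = field(default_factory=list)


def bounding_box(polygon):
    """Reduce a polygon to (west, south, east, north).

    Naive min/max only: it does not handle polygons that cross the
    antimeridian, and the shape of the polygon is lost.
    """
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


# Invented example: one project, one programme, one triangular footprint.
triangle = [(-5.0, 50.0), (-3.0, 52.0), (-1.0, 50.0)]
project = Project(
    "Example project",
    [ObservingProgramme("Example programme",
                        [MeasurementSeries("Example series", triangle)])],
)
box = bounding_box(project.programmes[0].series[0].footprint)
```

A box can always be derived from a polygon, as `bounding_box` shows, but the triangle cannot be recovered from `box`, which is one reason a model that stores polygons is the richer of the two.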

  19. EDIOS • Weaknesses • At the start of SeaDataNet EDIOS had 17 local vocabularies • Extremely poor content governance • Undergoing replacement with managed SeaDataNet standard vocabularies (6 down, 11 to go) • Legacy content has not been systematically quality controlled

  20. EDIOS • How is EDIOS different from EDMED? • Both are content standards designed to describe datasets • Any dataset described by an EDMED document could be described by an EDIOS document and vice versa • Once vocabularies have been harmonised and some mappings set up it should be possible to generate an EDMED document from an EDIOS document • Generation of an EDIOS document from an EDMED document will never be possible
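One plausible reading of why the conversion only works in one direction is that structured fields can always be rendered as free text, whereas free text cannot be reliably parsed back into structured fields. A minimal Python sketch, with invented field names that belong to neither standard:

```python
# Hypothetical structured record; the field names are illustrative only.
structured = {
    "parameter": "sea temperature",
    "platform": "moored buoy",
    "interval_hours": 1,
}


def to_free_text(record):
    """Render structured fields as a single plaintext description."""
    return (
        f"{record['parameter'].capitalize()} from a {record['platform']}, "
        f"sampled every {record['interval_hours']} h."
    )


summary = to_free_text(structured)
```

Going the other way would mean extracting the parameter, platform and sampling interval from descriptions written by many authors in many styles, which no fixed rule can be expected to do reliably.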

  21. EDIOS • How is EDIOS different from EDMED? • SeaDataNet convention is to use EDIOS for ‘qualifying’ datasets and EDMED for everything else • EDMED currently has a working population mechanism, but EDIOS does not • Advice to partners • Identify datasets to be described by EDIOS documents, map them to the EDIOS data model (relational schema and Access prototype on BSCW) and gather together the necessary information • Prepare EDMED documents for all other data sets and get them into BODC • Submit EDIOS entries to BODC once the necessary systems are operational

  22. CDI • Purpose • To provide an ultra-light discovery metadata description of accessible SeaDataNet data objects • Used to build a manageable fine-grained index of discrete data objects (millions of entries) • Entity definition • The fundamental SeaDataNet data delivery unit such as a current meter record or a CTD profile • History • Developed by SEA-SEARCH as a pilot for SeaDataNet

  23. CDI • Population • XML schema describing files that should be generated automatically from existing digital indexes • Strengths • Light content makes efficient handling of large numbers of records possible • Weaknesses • Light content restricts available information
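Automatic generation from an existing digital index might look like the following Python sketch. The index rows and the element names (`records`, `record`, `type`, `lat`, `lon`) are placeholders chosen for illustration; they are not the CDI XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows from a data centre's existing digital index.
index_rows = [
    {"id": "ctd-0001", "type": "CTD profile", "lat": "50.25", "lon": "-4.22"},
    {"id": "cm-0002", "type": "current meter record", "lat": "51.00", "lon": "-6.50"},
]


def row_to_xml(row):
    """Build one light-weight XML element from one index row."""
    elem = ET.Element("record", attrib={"id": row["id"]})
    for key in ("type", "lat", "lon"):
        ET.SubElement(elem, key).text = row[key]
    return elem


root = ET.Element("records")
for row in index_rows:
    root.append(row_to_xml(row))

# Serialise to a string; ElementTree escapes any special characters.
xml_text = ET.tostring(root, encoding="unicode")
```

Because each record carries only a handful of fields, a loop like this scales to very large indexes, which matches the strength and the weakness listed on the slide.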

  24. EDMERP • Purpose • Description of European marine research projects and programmes • Entity definition • A co-ordinated collection of marine data acquisition activities in Europe • History • Developed by Maris during SEA-SEARCH

  25. EDMERP • Population • Access form: resulting mdb file submitted to Maris • On-line content management system planned • Strengths • Provides centralised project metadata • Weaknesses • Local vocabularies and plaintext

  26. That’s All Folks! Questions or Geoff?