EMODNET Chemistry 2: Semantic Suggestions
Roy Lowry and Adam Leadbetter, British Oceanographic Data Centre
Semantic Issues
• Parameter semantic issues encountered during the pilot:
  • Naming of the aggregated products
  • Inability to aggregate across multiple P01 codes
  • Difficulty mapping local parameter vocabularies to P01
  • P01 scalability issues
  • Inability to discover a specified contaminant
Aggregation Naming
• Problem
  • During the pilot a lot of (circular) e-mail traffic concerned the labelling of aggregated parameters
• Solution
  • Naming needs to be governed
  • Governance decisions need to be implemented as a controlled vocabulary
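A minimal sketch of what "governance implemented as a controlled vocabulary" could mean for a tool: aggregated products are labelled only by looking up a governed entry, and an unknown identifier is rejected instead of being labelled ad hoc. The identifiers (`AGG0001`, ...), labels, and the `label_for` helper are hypothetical and not real NVS content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One governed entry: a stable identifier plus its approved label."""
    identifier: str
    pref_label: str


# Hypothetical governed vocabulary of aggregated-parameter names.
# Identifiers and labels are illustrative only, not real NVS entries.
AGGREGATED_NAMES = {
    c.identifier: c
    for c in [
        Concept("AGG0001", "Cadmium in biota"),
        Concept("AGG0002", "Lead in sediment"),
    ]
}


def label_for(identifier: str) -> str:
    """Return the approved label; unknown identifiers are rejected, not improvised."""
    try:
        return AGGREGATED_NAMES[identifier].pref_label
    except KeyError:
        raise KeyError(f"{identifier!r} is not in the governed vocabulary") from None
```

Any new name then goes through the governance process (adding an entry) rather than through e-mail discussion of individual labels.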
P01 Aggregation Issues
• Problem
  • Aggregation tools create an aggregated parameter for every P01 code in the source dataset
  • Different P01 codes are used for parameters that are not significantly different (or even not different at all)
  • Fixes for this (retagging source data or merging channels in the aggregation tool) are both labour intensive and error prone
P01 Aggregation Issues
• Solution
  • Define each aggregation as a set of P01 codes
  • Store and serve the resultant mapping in the NERC Vocabulary Server
  • Update aggregation tools to access the mapping and use it to dynamically merge channels with different P01 codes
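The proposed solution can be sketched as follows, assuming the aggregation mapping has already been fetched (retrieval from the NERC Vocabulary Server is not shown and the mapping is hard-coded). The placeholder codes `P01_A` etc. are not real P01 entries, and `merge_channels` is a hypothetical helper, not part of any existing aggregation tool.

```python
from collections import defaultdict

# Hypothetical mapping of the kind the slide proposes serving from NVS:
# each aggregated parameter is defined as a set of P01 codes.
# The codes below are placeholders, not real P01 entries.
AGGREGATIONS = {
    "AGG_CD_BIOTA": {"P01_A", "P01_B"},  # e.g. two near-identical cadmium codes
    "AGG_PB_BIOTA": {"P01_C"},
}


def invert(aggregations):
    """Build a P01 code -> aggregation lookup, refusing codes claimed twice."""
    lookup = {}
    for agg, codes in aggregations.items():
        for code in codes:
            if code in lookup:
                raise ValueError(f"{code} belongs to both {lookup[code]} and {agg}")
            lookup[code] = agg
    return lookup


def merge_channels(observations, aggregations):
    """Group (p01_code, value) pairs into aggregated channels.

    Codes with no aggregation stay under their own P01 code so no data is lost.
    """
    lookup = invert(aggregations)
    merged = defaultdict(list)
    for code, value in observations:
        merged[lookup.get(code, code)].append(value)
    return dict(merged)


obs = [("P01_A", 0.12), ("P01_B", 0.15), ("P01_C", 1.4), ("P01_Z", 3.0)]
merged = merge_channels(obs, AGGREGATIONS)
```

Here `merged` holds one `AGG_CD_BIOTA` channel with both cadmium values, one `AGG_PB_BIOTA` channel, and the unmapped `P01_Z` channel unchanged. Because the merge is driven by the served mapping, neither the source data nor the tool needs editing when an aggregation definition changes.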
P01 Mapping Difficulties
• Problem
  • There are a lot of codes (>28,000) in P01
  • Finding the code needed for a given local parameter vocabulary term seems to cause a lot of difficulty
  • Text generated from a semantic model isn’t always intuitive (e.g. [dissolved plus reactive particulate phase] = ‘unfiltered’)
P01 Mapping Difficulties
• Solutions
  • Mapping based on the semantic model (matrix, substance, taxon, gender, organ) rather than on the preferred label text
  • Improvements to the search algorithm in the client (e.g. addition of an ‘excluding’ clause)
  • Exposure of P01 subsets through NVS2 concept schemes (thesauri)
  • Training in how to map
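A rough sketch of the first two solutions together: concepts are matched on the semantic elements listed above rather than on label text, and an ‘excluding’ clause then removes concepts whose label contains unwanted words. The codes, labels, and the `search` function are invented for illustration; the real P01 semantic model has more elements than the five shown (for example, the wet/dry weight basis here exists only in the labels).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class P01Concept:
    code: str
    label: str
    # Semantic elements named on the slide; None means "not applicable".
    matrix: Optional[str] = None
    substance: Optional[str] = None
    taxon: Optional[str] = None
    gender: Optional[str] = None
    organ: Optional[str] = None


def search(concepts, excluding=(), **elements):
    """Match on semantic elements, then drop labels containing an excluded phrase."""
    hits = []
    for c in concepts:
        if all(getattr(c, k) == v for k, v in elements.items()):
            label = c.label.lower()
            if not any(phrase.lower() in label for phrase in excluding):
                hits.append(c)
    return hits


# Hypothetical concepts; codes and labels are placeholders, not real P01 entries.
MUSSEL = dict(matrix="biota", taxon="Mytilus edulis", organ="soft tissue")
CONCEPTS = [
    P01Concept("P01_A", "Cadmium per unit wet weight of Mytilus edulis soft tissue",
               substance="cadmium", **MUSSEL),
    P01Concept("P01_B", "Cadmium per unit dry weight of Mytilus edulis soft tissue",
               substance="cadmium", **MUSSEL),
    P01Concept("P01_C", "Lead per unit wet weight of Mytilus edulis soft tissue",
               substance="lead", **MUSSEL),
]

hits = search(CONCEPTS, excluding=("dry weight",),
              substance="cadmium", taxon="Mytilus edulis")
```

The element filters narrow the candidates to the two cadmium concepts without relying on how the generated labels happen to be worded, and the excluding clause then leaves only `P01_A`.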
P01 Scalability Issues
• Problem
  • Many contaminants in many different biological entities = a number of P01 codes that is predicted to be unmanageable
• Solution (not favoured)
  • Redesign formats to use the discrete semantic model, not the P01 code
  • Different formats for different data types
  • Moves complexity from the semantic domain into the data files
P01 Scalability Issues
• Solution (preferred)
  • Retain P01 as a register of semantic element combinations
  • Automate concept registration (perhaps as part of a semantic model-based mapping tool)
  • Use NVS2 concept schemes to expose P01 subsets to make navigation easier
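One way to picture the preferred solution is a minimal sketch, assuming a register keyed by the combination of semantic elements: registering an existing combination returns its code, a new combination gets a new code automatically, and a filtered subset plays the role of a concept scheme. The `P01Register` class and its `TMP` numbering are hypothetical; real P01 codes are issued through the vocabulary governance process.

```python
from itertools import count


class P01Register:
    """Hypothetical register mapping semantic-element combinations to codes.

    The 'TMP' prefix and counter are purely illustrative, not how real
    P01 codes are minted.
    """

    ELEMENTS = ("matrix", "substance", "taxon", "gender", "organ")

    def __init__(self):
        self._by_combo = {}
        self._by_code = {}
        self._next = count(1)

    def register(self, **elements):
        """Return the code for this combination, minting one if it is new."""
        unknown = set(elements) - set(self.ELEMENTS)
        if unknown:
            raise ValueError(f"unknown elements: {sorted(unknown)}")
        combo = tuple(elements.get(e) for e in self.ELEMENTS)
        if combo not in self._by_combo:
            code = f"TMP{next(self._next):05d}"
            self._by_combo[combo] = code
            self._by_code[code] = combo
        return self._by_combo[combo]

    def scheme(self, **filters):
        """Codes whose elements match the filters, analogous to a concept scheme."""
        idx = {e: i for i, e in enumerate(self.ELEMENTS)}
        return sorted(
            code for code, combo in self._by_code.items()
            if all(combo[idx[k]] == v for k, v in filters.items())
        )


reg = P01Register()
mussel = dict(matrix="biota", taxon="Mytilus edulis", organ="soft tissue")
cd = reg.register(substance="cadmium", **mussel)
cd_again = reg.register(substance="cadmium", **mussel)
pb = reg.register(substance="lead", **mussel)
```

Repeated registration of the same combination is idempotent (`cd == cd_again`), so a mapping tool could call `register` freely, while `reg.scheme(substance="cadmium")` exposes just the cadmium subset for navigation.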
Contaminant Discovery Issues
• Problem
  • Parameter discovery (CDI interface) is based on P02
  • P02 groups contaminants with variable granularity
    • Good for PCBs
    • Not so good for ‘other organic contaminants’
  • A search for datasets with cadmium in Mytilus edulis flesh isn’t possible
  • The nearest is metals in biota, which will give many unwanted hits
Contaminant Discovery Issues
• Possible Solution
  • Mine the P01 codes in the SeaDataNet file stock into the CDI metadatabase
  • Use these for drill-down parameter discovery in the CDI search engine
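The drill-down idea can be sketched as below, assuming the P01 codes have already been mined from each dataset's files and that each P01 code is mapped to one P02 group. All dataset IDs, codes, group names, and helper functions are hypothetical placeholders, not CDI or SeaDataNet identifiers.

```python
from collections import defaultdict

# Hypothetical inputs: P01 codes found in each dataset's files, and the P02
# group each P01 code falls under. All identifiers are placeholders.
DATASET_P01 = {
    "cdi-001": {"P01_CD_MUSSEL", "P01_PB_MUSSEL"},
    "cdi-002": {"P01_CD_FISH"},
    "cdi-003": {"P01_PB_MUSSEL"},
}
P01_TO_P02 = {
    "P01_CD_MUSSEL": "METALS_IN_BIOTA",
    "P01_PB_MUSSEL": "METALS_IN_BIOTA",
    "P01_CD_FISH": "METALS_IN_BIOTA",
}


def build_index(dataset_p01):
    """Invert dataset -> codes into code -> datasets (the 'mined' metadata)."""
    index = defaultdict(set)
    for dataset, codes in dataset_p01.items():
        for code in codes:
            index[code].add(dataset)
    return index


def drill_down(index, p01_to_p02, p02_group, p01_code=None):
    """Coarse search on a P02 group, optionally narrowed to one P01 code."""
    codes = [c for c, g in p01_to_p02.items() if g == p02_group]
    if p01_code is not None:
        codes = [c for c in codes if c == p01_code]
    return sorted(set().union(*(index.get(c, set()) for c in codes)))


index = build_index(DATASET_P01)
coarse = drill_down(index, P01_TO_P02, "METALS_IN_BIOTA")
fine = drill_down(index, P01_TO_P02, "METALS_IN_BIOTA", p01_code="P01_CD_MUSSEL")
```

The coarse P02 query returns all three datasets (the "many unwanted hits" case), while drilling down to the cadmium-in-mussel code returns only `cdi-001`.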
Taking This Forward
• Some of the solutions presented are ODIP pilot candidates
• Specifications of these are currently vague
• Not absolutely clear who should be doing what and when
• Meeting (Liverpool or London if easier) to develop the specifications and an implementation roadmap