
Experience with the WMO core metadata in the SIMDAT/VGISC project



Presentation Transcript


  1. Experience with the WMO core metadata in the SIMDAT/VGISC project Baudouin Raoult ECMWF

  2. The SIMDAT/VGISC project • SIMDAT • EU-funded GRID project • 7 Technologies: Grid infrastructure, Virtual Organisation, Ontologies, Analysis Services, Workflows, Distributed data access, Knowledge Services • 4 Activities: Automotive, Aerospace, Pharmacy and Meteorology • Meteorology activity: build a Virtual GISC (V-GISC) • DWD • UKMO • MétéoFrance • EUMETSAT • ECMWF

  3. V-GISC infrastructure

  4. V-GISC Conceptual view • Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization • The Virtual Database Service provides a single view of the partners' databases

  5. VGISC Distributed Architecture

  6. Why do we need metadata (in this project)? • Create a catalogue (discovery metadata) • Searchable (Keyword, Geographical location, Time range) • Browsable (Directory hierarchy) • Implement the V-GISC (service metadata) • Describe where the data resides (physical location) • Describe how to request the data • Describe the data format (useful for offering list of transformations, e.g. sub-sampling of gridded data, plots or format conversions) • Describe associated data policies

  7. Study of the WMO core • Starting point • XML files available on the WMO web site • XML files from DWD's earlier prototype • Trying to describe the ECMWF archive (1.3 × 10^10 GRIB fields)

  8. XML Root element <p:piTimeseries xmlns:p="http://www.wmo.ch/web/www/metadata/piTimeseries" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wmo.ch/web/www/metadata" xsi:schemaLocation="http://www.wmo.ch/web/www/metadata http://www.dwd.de/UNIDART/metadata/WMO19115_metadata_v0_2.xsd http://www.wmo.ch/web/www/metadata/piTimeseries http://www.dwd.de/UNIDART/metadata/WMO19115_piTimeseries_schema.xsd"> or <metaData xmlns="http://www.wmo.ch/web/www/metadata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fc="http://www.wmo.ch/web/www/featurecatalogue" xsi:schemaLocation="http://www.wmo.ch/web/www/metadata/../WMO19115_metadata_v0_2.xsd http://www.wmo.ch/web/www/featurecatalogue/./featurecat/iso19110.xsd"> • Namespaces are a nightmare to use (especially using XPath when there is a default namespace)
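
The default-namespace problem mentioned on this slide can be reproduced in a few lines. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not the project's own tooling; the sample document is cut down for illustration and the prefix "wmo" is an arbitrary choice made here.

```python
import xml.etree.ElementTree as ET

WMO_NS = "http://www.wmo.ch/web/www/metadata"

# Cut-down document that declares the WMO namespace as the default one.
doc = f"""<metaData xmlns="{WMO_NS}">
  <identificationInfo>
    <descriptiveKeywords>Temperature</descriptiveKeywords>
    <descriptiveKeywords>Wind</descriptiveKeywords>
  </identificationInfo>
</metaData>"""

root = ET.fromstring(doc)

# An unprefixed path matches nothing, because every element silently
# belongs to the default namespace.
unqualified = root.findall(".//descriptiveKeywords")

# The query only works once a prefix (here "wmo", chosen for this sketch)
# is bound to the namespace URI and used in every path step.
ns = {"wmo": WMO_NS}
qualified = [e.text for e in root.findall(".//wmo:descriptiveKeywords", ns)]
```

The document itself uses no prefix at all, yet every query has to carry one, which is why a default namespace is easy to get wrong.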

  9. XML Keywords <descriptiveKeywords>Russian Federation</descriptiveKeywords> <descriptiveKeywords>Moscow region</descriptiveKeywords> <descriptiveKeywords>Temperature</descriptiveKeywords> <descriptiveKeywords>Clouds</descriptiveKeywords> <descriptiveKeywords>Meteorology</descriptiveKeywords> <descriptiveKeywords>Observation</descriptiveKeywords> <descriptiveKeywords>Pressure</descriptiveKeywords> <descriptiveKeywords>Rainfall</descriptiveKeywords> <descriptiveKeywords>Snow</descriptiveKeywords> <descriptiveKeywords>Snowfall</descriptiveKeywords> <descriptiveKeywords>Weather</descriptiveKeywords> <descriptiveKeywords>Wind</descriptiveKeywords> <descriptiveKeywords>Phenomenon</descriptiveKeywords> Or… <descriptiveKeywords>EARTH SCIENCE > Cryosphere > Sea Ice</descriptiveKeywords> <descriptiveKeywords>EARTH SCIENCE > Atmosphere</descriptiveKeywords> <descriptiveKeywords>EARTH SCIENCE > Oceans</descriptiveKeywords> <descriptiveKeywords>EARTH SCIENCE > Solid Earth</descriptiveKeywords> <descriptiveKeywords>ocean, atmosphere, ice, land</descriptiveKeywords> Or… <descriptiveKeywords>METAR aviation hourly weather observation temperature dew point precipitation amount visibility cloud amount type height weather runway colour state</descriptiveKeywords>

  10. XML Geographical extent <geographicElement> <polygon> <point> <latitude>50.78</latitude> <longitude>6.1</longitude> </point> </polygon> </geographicElement> Or… <geographicElement> <geographicIdentifier gazetteer="http://www.wmo.ch/web/www/ois/volume-a/vola-home.htm"> CCCC2 </geographicIdentifier> </geographicElement> Or… <geographicElement> <boundingBox> <westBoundLongitude>-126.3</westBoundLongitude> <eastBoundLongitude>-126.3</eastBoundLongitude> <southBoundLatitude>39.9</southBoundLatitude> <northBoundLatitude>39.9</northBoundLatitude> </boundingBox> </geographicElement>
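
To index these three encodings uniformly, each one has to be reduced to a common shape. The sketch below is one possible normalisation, assuming un-namespaced elements as shown on the slide; the function name to_bounding_box and the (west, south, east, north) tuple order are choices made here, not part of the WMO core.

```python
import xml.etree.ElementTree as ET

def to_bounding_box(geographic_element):
    """Return (west, south, east, north), or None when only a gazetteer
    identifier is given and a lookup would be needed."""
    box = geographic_element.find("boundingBox")
    if box is not None:
        return (
            float(box.findtext("westBoundLongitude")),
            float(box.findtext("southBoundLatitude")),
            float(box.findtext("eastBoundLongitude")),
            float(box.findtext("northBoundLatitude")),
        )
    points = geographic_element.findall("polygon/point")
    if points:
        lats = [float(p.findtext("latitude")) for p in points]
        lons = [float(p.findtext("longitude")) for p in points]
        # Naive envelope: does not handle polygons crossing the antimeridian.
        return (min(lons), min(lats), max(lons), max(lats))
    return None

polygon = ET.fromstring(
    "<geographicElement><polygon><point><latitude>50.78</latitude>"
    "<longitude>6.1</longitude></point></polygon></geographicElement>"
)
identifier = ET.fromstring(
    "<geographicElement><geographicIdentifier>CCCC2</geographicIdentifier>"
    "</geographicElement>"
)
polygon_box = to_bounding_box(polygon)
identifier_box = to_bounding_box(identifier)
```

The identifier case returns None on purpose: resolving "CCCC2" requires the gazetteer, which this sketch does not attempt.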

  11. XML Temporal extent <temporalElement> <beginDateTime>0100-01-01</beginDateTime> <endDateTime>0299-12-31</endDateTime> <dataFrequency>monthly</dataFrequency> <dataFrequency>daily</dataFrequency> </temporalElement> Or… <temporalElement> <referenceDateTime>2004-02-05T00:00:00</referenceDateTime> <beginDateTime>2004-02-05T06:00:00</beginDateTime> <endDateTime>2004-02-05T06:00:00</endDateTime> </temporalElement> Or… <referenceDate> <date>2004-01-28</date> <dateType>creationDate</dateType> </referenceDate>

  12. Repetition of XML elements (means extension) <dataExtent> <verticalElement> <minimumValue>3.5</minimumValue> <maximumValue>992.5</maximumValue> <unitOfMeasure>mb</unitOfMeasure> </verticalElement> </dataExtent> <dataExtent> <geographicElement> <boundingBox> <westBoundLongitude>-180</westBoundLongitude> <eastBoundLongitude>+180</eastBoundLongitude> <southBoundLatitude>-90</southBoundLatitude> <northBoundLatitude>+90</northBoundLatitude> </boundingBox> <geographicIdentifier gazetteer="http://gcmd.gsfc.nasa.gov/Resources/valids/location.html">Global </geographicIdentifier> </geographicElement> </dataExtent> <dataExtent> <temporalElement> <beginDateTime>1900-01-01</beginDateTime> <endDateTime>1999-12-31</endDateTime> <dataFrequency>monthly</dataFrequency> <dataFrequency>daily</dataFrequency> </temporalElement> </dataExtent>

  13. Repetition of XML elements (means redefinition) <dataExtent> <description>Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector S</description> <geographicElement> <boundingBox> <westBoundLongitude>-180</westBoundLongitude> <eastBoundLongitude>-60</eastBoundLongitude> <southBoundLatitude>0</southBoundLatitude> <northBoundLatitude>90</northBoundLatitude> </boundingBox> </geographicElement> </dataExtent> <dataExtent> <description>Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector T</description> <geographicElement> <boundingBox> <westBoundLongitude>-60</westBoundLongitude> <eastBoundLongitude>60</eastBoundLongitude> <southBoundLatitude>0</southBoundLatitude> <northBoundLatitude>90</northBoundLatitude> </boundingBox> </geographicElement> </dataExtent>

  14. Findings • A flexible format that leads to a lack of consistency • Different ways to encode geographical extents, keywords and temporal extents • Missing information (for the V-GISC) • To create a directory • To locate the data • To create retrieval requests • To describe available transformations • To implement data policies

  15. Findings (cont.) • Seems to be designed for human consumption • Free text in XML elements • <distributionInfo> • <dataQualityInfo> • Not scalable • Some documents may change frequently (hourly?) • Some documents are orders of magnitude larger than the data itself • Cannot represent very large archives with small granularity

  16. SIMDAT/VGISC problem • Each site has its own practices • We have to be ready for variability in the XML • We will have to handle XML from other WMO programmes • We need to handle tens of thousands of documents • A lot of repeated information • We need fast search • We need to automatically: • Index the keywords, the geographical extent and the temporal extent • Create a browsable directory (similar to NCAR’s Community Data Portal) • Locate and retrieve the data • Implement the data policy

  17. Solution: split XML documents into fragments [Diagram labels: Core: WMO; Owner: UKMO; Data type: Synop; Station (geographical extent): Heathrow; Date (temporal extent): 2005-10-12] • WMO core metadata is structured • Some parts are shared amongst many documents • All metadata share the Core part • All UKMO metadata share the Owner part • All synops (should) share the same description • All observations at Heathrow have the same location • The date part is variable but is very small

  18. XML fragments are hierarchically linked [Diagram labels: WMO; UKMO; Synop; Heathrow; Heathrow Synop; Heathrow Synop 2005-10-12]

  19. Fragments: advantages • Factorizing commonalities into static fragments • Reduces size of XML documents • Indexing done once • Avoids redundancy of information • Faster searches • Frequently updated documents are small • Manageable • Scalable • Complete XML document can be rebuilt • For exchange outside the V-GISC

  20. Indexing of XML fragments [Diagram labels: Keywords; Geographical Extent; Temporal Extent; WMO; UKMO; Synop; Heathrow; Heathrow Synop; Heathrow Synop 2005-10-12]

  21. Prototype implementation • XML fragments are stored as “text” • Fragment table • Hierarchy table • Indexed at insertion time • Keywords table • Locations table • Periods table • Directory table • Implemented with MySQL • With OpenGIS extension • With text search extension • Indexes are “inherited” • OO approach
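
What “inherited” indexes could look like in a relational store is sketched below. This is not the prototype's schema: it uses Python's built-in sqlite3 in place of MySQL, omits the OpenGIS and text search extensions, and every table name, column name, fragment id and parent/child link is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fragment  (id TEXT PRIMARY KEY, xml TEXT NOT NULL);
CREATE TABLE hierarchy (child TEXT NOT NULL, parent TEXT NOT NULL);
CREATE TABLE keywords  (fragment_id TEXT NOT NULL, keyword TEXT NOT NULL);
""")

# Hypothetical fragments; the XML text is a placeholder.
conn.executemany("INSERT INTO fragment VALUES (?, ?)", [
    ("wmo", "<metaData/>"),
    ("synop", "<metaData/>"),
    ("heathrow", "<metaData/>"),
    ("heathrow.synop", "<metaData/>"),
    ("heathrow.synop.2005-10-12", "<metaData/>"),
])
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)", [
    ("synop", "wmo"),
    ("heathrow", "wmo"),
    ("heathrow.synop", "synop"),
    ("heathrow.synop", "heathrow"),
    ("heathrow.synop.2005-10-12", "heathrow.synop"),
])
# The keyword is indexed once, on the shared "synop" fragment only.
conn.execute("INSERT INTO keywords VALUES ('synop', 'observation')")

def search(keyword):
    """Return ids of fragments that carry the keyword or inherit it."""
    rows = conn.execute("""
        WITH RECURSIVE hit(id) AS (
            SELECT fragment_id FROM keywords WHERE keyword = ?
            UNION
            SELECT h.child FROM hierarchy h JOIN hit ON h.parent = hit.id
        )
        SELECT id FROM hit ORDER BY id
    """, (keyword,)).fetchall()
    return [row[0] for row in rows]

matches = search("observation")
```

Because the keyword row exists only for the shared fragment, a daily fragment adds one hierarchy row and no index rows, which is the point of indexing once.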

  22. Object Oriented Approach - Behaviours [Diagram labels: WMO; UKMO; Synop; Heathrow; Heathrow Synop; Heathrow Synop 2005-10-12] • Index <descriptiveKeywords> as keyword • Index <geographicElement> <boundingBox> as geography • Index <featureAttribute> <membrName> as keyword • Index <referenceDate> <date> as period

  23. Fragment properties - Behaviours • Only the owner of the data knows how to: • Describe the data (Indexation information) • Request the data (Create internal request) • Extract a subset of the data (Define an interface to extract a subset) • Ancillary metadata can be associated with each fragment to describe how to index, request and sub-select the data • Behaviours are inherited • Object oriented approach

  24. Behaviours example: indexing <indexing class="XPathKeywordIndexer" separator=" "> <xpath>//identificationInfo/descriptiveKeywords</xpath> </indexing> <indexing class="XPathBoundingBoxIndexer"> <xpath>//identificationInfo/dataExtent/geographicElement/boundingBox</xpath> </indexing> <indexing class="XPathPolygonIndexer"> <xpath>//identificationInfo/dataExtent/geographicElement/polygon</xpath> </indexing> <indexing class="XPathDateIndexer"> <xpath>//identificationInfo/referenceDate/date</xpath> </indexing> <indexing class="XPathPeriodIndexer"> <xpath>//identificationInfo/dataExtent/temporalElement</xpath> <xpath>//identificationInfo/referenceDate/period</xpath> </indexing> <indexing class="XPathDirectoryIndexer"> <xpath>//identificationInfo/topicCategory</xpath> </indexing>
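
A declaration like the first one above could be turned into a working indexer roughly as follows. This is a sketch under assumptions: only the class name XPathKeywordIndexer and its separator attribute come from the slide; the Python class body, the load_indexers helper, the <behaviours> wrapper element and the lower-casing of keywords are invented here. ElementTree does not accept paths starting with "//", so the sketch rewrites them to ".//".

```python
import xml.etree.ElementTree as ET

class XPathKeywordIndexer:
    """Collect keywords from the elements matched by a list of XPaths."""

    def __init__(self, xpaths, separator=None):
        self.xpaths = xpaths
        self.separator = separator

    def index(self, root):
        keywords = []
        for xpath in self.xpaths:
            # ElementTree only supports relative paths on an element.
            path = "." + xpath if xpath.startswith("/") else xpath
            for element in root.findall(path):
                text = (element.text or "").strip()
                parts = text.split(self.separator) if self.separator else [text]
                keywords.extend(p.strip().lower() for p in parts if p.strip())
        return keywords

def load_indexers(config_xml, registry):
    """Instantiate one indexer per <indexing> element (hypothetical loader)."""
    indexers = []
    for node in ET.fromstring(config_xml).findall("indexing"):
        cls = registry[node.get("class")]
        xpaths = [x.text for x in node.findall("xpath")]
        indexers.append(cls(xpaths, node.get("separator")))
    return indexers

# <behaviours> is a wrapper assumed for this sketch.
config = """<behaviours>
  <indexing class="XPathKeywordIndexer" separator=" ">
    <xpath>//identificationInfo/descriptiveKeywords</xpath>
  </indexing>
</behaviours>"""

fragment = ET.fromstring(
    "<metaData><identificationInfo><descriptiveKeywords>"
    "METAR aviation hourly weather</descriptiveKeywords>"
    "</identificationInfo></metaData>"
)

indexers = load_indexers(config, {"XPathKeywordIndexer": XPathKeywordIndexer})
keywords = indexers[0].index(fragment)
```

With the separator left out, the same class would index a phrase such as "Moscow region" as a single keyword, which is how one behaviour declaration can absorb two of the keyword styles shown on slide 9.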

  25. <vgisc> extension • A <vgisc> element from the “http://www.vgisc.org/” namespace is embedded in all the fragments • It contains all information needed to implement the V-GISC that is not defined by the WMO core because it is not relevant outside the scope of the V-GISC • Internal unique ID • Hierarchy relationship • Physical location (which V-GISC node holds the data) • Information used to create data requests • Information used to create web pages • It is removed when the full XML document is recomposed for use outside the V-GISC

  26. Fragment example <metaData xmlns:v='http://www.vgisc.org/'> <v:vgisc> <id>urn:akrotiri.synop.land.second.record.20050629</id> <inherit>urn:akrotiri</inherit> <inherit>urn:int.wmo.synop.land.second.record</inherit> <location>ecmwf.obs</location> </v:vgisc> <identificationInfo> <referenceDate> <date>2005-06-29</date> </referenceDate> </identificationInfo> </metaData>
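
How such a fragment might be recomposed into a full document is sketched below. The leaf fragment is the one on this slide; the contents of the two parent fragments and the merge policy (append the children of same-named top-level sections, ancestors first) are assumptions made for illustration, and the presentation does not state the actual rules.

```python
import xml.etree.ElementTree as ET

VGISC = "{http://www.vgisc.org/}vgisc"

def wrap(vgisc_body, info_body):
    """Build a fragment string (helper for this sketch only)."""
    return (
        "<metaData xmlns:v='http://www.vgisc.org/'>"
        f"<v:vgisc>{vgisc_body}</v:vgisc>"
        f"<identificationInfo>{info_body}</identificationInfo>"
        "</metaData>"
    )

LEAF = "urn:akrotiri.synop.land.second.record.20050629"
store = {
    # Parent contents are invented placeholders.
    "urn:akrotiri": wrap(
        "<id>urn:akrotiri</id>",
        "<descriptiveKeywords>Akrotiri</descriptiveKeywords>"),
    "urn:int.wmo.synop.land.second.record": wrap(
        "<id>urn:int.wmo.synop.land.second.record</id>",
        "<descriptiveKeywords>Synop</descriptiveKeywords>"),
    LEAF: wrap(
        f"<id>{LEAF}</id><inherit>urn:akrotiri</inherit>"
        "<inherit>urn:int.wmo.synop.land.second.record</inherit>"
        "<location>ecmwf.obs</location>",
        "<referenceDate><date>2005-06-29</date></referenceDate>"),
}

def lineage(fragment_id, store, seen=None):
    """Yield fragment ids, ancestors first, each one at most once."""
    seen = set() if seen is None else seen
    if fragment_id in seen:
        return
    seen.add(fragment_id)
    root = ET.fromstring(store[fragment_id])
    for parent in root.findall(f"{VGISC}/inherit"):
        yield from lineage(parent.text.strip(), store, seen)
    yield fragment_id

def rebuild(fragment_id, store):
    """Merge a fragment with its ancestors and drop the <vgisc> extension."""
    full = ET.Element("metaData")
    for fid in lineage(fragment_id, store):
        for section in ET.fromstring(store[fid]):
            if section.tag == VGISC:
                continue  # internal to the V-GISC, never exported
            target = full.find(section.tag)
            if target is None:
                full.append(section)
            else:
                target.extend(list(section))
    return full

full = rebuild(LEAF, store)
order = list(lineage(LEAF, store))
```

The daily fragment stays a few lines long, and the exported document carries no trace of the V-GISC extension.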

  27. Variables and Requests • Some datasets have too many items • Impossible to describe every one of them • But describing the whole dataset is simple • Some datasets are very homogeneous • E.g. same parameters for a long period of time • This can be described in a compact form (<beginDateTime> and <endDateTime>) • But we still need to specify that individual dates can be requested by the user

  28. Variables and requests (cont.) • Associate two elements with an XML fragment: • <request> • Holds information on how to generate a valid request to the data repository • <variable> • Holds information on how to create a web interface to let the user select items from the dataset • Web portal • We use WMO core for discovery • We use the <variable> element to present selection dialogues to the user

  29. Fragment example: ECMWF Reanalysis <metadata xmlns:v='http://www.vgisc.org/'> <v:vgisc> <id>urn:int.ecmwf.era40.sfc</id> <inherit>urn:int.wmo.core</inherit> <location>ecmwf.mars</location> <request> <class>e4</class> <levtype>sfc</levtype> <database>marser</database> </request> <variables> <date type='date'> <startDate>1980-01-01</startDate> <endDate>1990-12-31</endDate> </date> <param title='Parameter' multiple='1' type='enum'> <value>2t</value> <value>msl</value> </param> <time title='Base time' multiple='1' type='enum'> <value>0000</value> <value>0600</value> <value>1200</value> <value>1800</value> </time> </variables> </v:vgisc> <identificationInfo> <descriptiveKeywords>ECMWF 40 Years reanalysis ERA40 ERA-40 in GRIB</descriptiveKeywords> <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory> <dataExtent> <temporalElement> <beginDateTime>1980-01-01</beginDateTime> <endDateTime>1990-12-31</endDateTime> </temporalElement> …
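
The interplay of <request> and <variables> can be illustrated with a small request builder. This is a sketch, not the V-GISC code: the function build_request, the dictionary output and the "/"-joined value lists are assumptions made here, and only the <v:vgisc> part of the fragment above is reproduced.

```python
import xml.etree.ElementTree as ET
from datetime import date

FRAGMENT = """<metadata xmlns:v='http://www.vgisc.org/'>
  <v:vgisc>
    <request><class>e4</class><levtype>sfc</levtype><database>marser</database></request>
    <variables>
      <date type='date'><startDate>1980-01-01</startDate><endDate>1990-12-31</endDate></date>
      <param title='Parameter' multiple='1' type='enum'><value>2t</value><value>msl</value></param>
      <time title='Base time' multiple='1' type='enum'><value>0000</value><value>0600</value><value>1200</value><value>1800</value></time>
    </variables>
  </v:vgisc>
</metadata>"""

def build_request(fragment_xml, selection):
    """Combine the fixed <request> fields with validated user choices."""
    vgisc = ET.fromstring(fragment_xml).find("{http://www.vgisc.org/}vgisc")
    request = {field.tag: field.text for field in vgisc.find("request")}
    for variable in vgisc.find("variables"):
        chosen = selection[variable.tag]
        if variable.get("type") == "enum":
            allowed = {v.text for v in variable.findall("value")}
            unknown = [c for c in chosen if c not in allowed]
            if unknown:
                raise ValueError(f"{variable.tag}: {unknown} not offered")
            # Joining with "/" is an assumption about the target syntax.
            request[variable.tag] = "/".join(chosen)
        elif variable.get("type") == "date":
            start = date.fromisoformat(variable.findtext("startDate"))
            end = date.fromisoformat(variable.findtext("endDate"))
            if not start <= date.fromisoformat(chosen) <= end:
                raise ValueError(f"{variable.tag}: {chosen} outside {start}..{end}")
            request[variable.tag] = chosen
    return request

request = build_request(
    FRAGMENT,
    {"date": "1985-07-01", "param": ["2t"], "time": ["0000", "1200"]},
)
```

The same <variables> block that validates the selection could also drive the selection dialogue, since it lists the offered values and the permitted date range.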

  30. Directory structure • Problem: create a browsable hierarchy of topics, like the “Google directory” (see NCAR’s community data portal) • Not to be confused with the internal “fragment hierarchy”, which is not exposed to the end user • Currently using the element <topicCategory> <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory> • The same product can appear in several locations of the directory <topicCategory>Observations > By Type > Profile > Temp Land</topicCategory> <topicCategory>Observations > By Region > Asia > China</topicCategory> • Usage should be recommended by WMO
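
Turning <topicCategory> strings into a browsable tree takes little code. The sketch below assumes " > " separates directory levels, as in the examples above; the function build_directory, the "_datasets" key and the identifier urn:example.temp.china are hypothetical.

```python
def build_directory(entries):
    """Build a nested dict of topics from "A > B > C" category strings."""
    tree = {}
    for fragment_id, topic_categories in entries.items():
        for category in topic_categories:
            node = tree
            for part in category.split(">"):
                node = node.setdefault(part.strip(), {})
            # Datasets filed at this level are kept under a reserved key.
            node.setdefault("_datasets", []).append(fragment_id)
    return tree

entries = {
    "urn:int.ecmwf.era40.sfc": [
        "NWP Outputs > ECMWF > 40 years reanalysis",
    ],
    # Hypothetical product filed in two places of the directory.
    "urn:example.temp.china": [
        "Observations > By Type > Profile > Temp Land",
        "Observations > By Region > Asia > China",
    ],
}

tree = build_directory(entries)
```

A product with several <topicCategory> elements simply shows up under each path, which matches the requirement that the same product can appear in several locations.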

  31. Conclusion • The approach taken in the V-GISC should help us support the large variety of XML documents • Nevertheless, the standard is too flexible • A lot of programming is required to support all possible variations • The WMO must provide “best practices” guidelines • How to encode a point in time, how to encode ranges, … • A topic hierarchy must be defined, to create the directory • WMO core metadata need only contain sufficient information for discovery • The rest can be implemented as a series of local extensions, as long as they are not exported or exchanged
