SIMDAT : Elements for building the WIS

SIMDAT : Elements for building the WIS. TECO-WIS, Seoul, 6-8 November 2006 Matteo Dell’Acqua, Météo-France. Agenda. SIMDAT Introduction Architecture WIS functional requirements Metadata Results achieved Plans. Phase 1 : Connectivity. Phase 2 : Interoperability. Phase 3 : Knowledge.

Presentation Transcript


  1. SIMDAT : Elements for building the WIS TECO-WIS, Seoul, 6-8 November 2006 Matteo Dell’Acqua, Météo-France

  2. Agenda • SIMDAT Introduction • Architecture • WIS functional requirements • Metadata • Results achieved • Plans

  3. SIMDAT • Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery • 4-year project funded by the EU • Contract with the EU signed on 1 September 2004 • SIMDAT focuses on 4 application areas: • product design in automotive and aerospace • process design in pharmacology • service provision in meteorology • Budget of 11 M € • Project phases: Phase 1: Connectivity, Phase 2: Interoperability, Phase 3: Knowledge • Items shown on the phase diagram: Virtual Data Repository; Introduction of Grid technologies research; Introduction of VO; Deployment of Infrastructure with particular attention to data transport and management; Distributed Data access; Integration of analysis services, workflows, discovery and data mining

  4. SIMDAT Meteorology Partners • 22 members in the consortium • Deutscher Wetterdienst (DWD) • ECMWF • EUMETSAT • Météo France • UK Met Office • Intel • Ontoprise • IBM • IT Innovation • NEC

  5. Meteorology Application : Project Aims • To build an integrated and scalable framework for the collection and sharing of distributed data (WIS building blocks) allowing each member to act either as a GISC or as a DCPC with the ability to logically group several nodes into a “virtual” GISC (V-GISC) • Instead of each National Met Service having a GISC, a “virtual” GISC • 2 DCPCs : ECMWF, EUMETSAT • Service oriented framework targeting meteorology, hydrology, climate and environment and offering transparent access to distributed resources • Discovery service, cataloguing service, subscription service, … • Some key elements of the project are: • A single view of meteorological information which is distributed amongst the meteorological partners • Improve visibility and access to meteorological data through a comprehensive discovery service • Offer a variety of reliable services for collection and sharing of data and for routine dissemination • Provide a global access control policy managed by the partners and integrated into their existing security infrastructure

  6. Architectural Choices • Use Data Grid / SOA concepts that offer ways to • Build decentralized infrastructure • Access heterogeneous databases and datasets • Catalogue duplicated and synchronized at each site • To have a fast discovery (browse & search phase) and a reliable system (client redirection to another node) • Build an open and flexible framework integrating technologies from different areas • Allows picking the best components of Grid Middleware (Globus, OGSA-DAI) • Associate J2EE and Grid/Web Services technologies to build solid components • QoS and Robustness are amongst the top priorities of the project • Use pipelining, priority and queuing mechanisms to process users’ requests
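The slide mentions priority and queuing mechanisms without describing them. The following is only a minimal sketch of one way to order user requests by priority while keeping first-in, first-out order within a priority level. The class name `RequestQueue`, the request names and the numeric priorities are all assumptions made for illustration; they do not come from the SIMDAT software.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical priority queue of user requests (lower number = served first)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that requests with the
        # same priority are served in submission order.
        self._counter = itertools.count()

    def submit(self, request_id, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def next_request(self):
        """Return the next request to process, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, request_id = heapq.heappop(self._heap)
        return request_id


queue = RequestQueue()
queue.submit("archive-retrieval", priority=5)
queue.submit("realtime-synop", priority=1)
queue.submit("archive-reanalysis", priority=5)
order = [queue.next_request() for _ in range(3)]
```

In this sketch the real-time request overtakes the two archive requests, which are then served in the order they were submitted.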

  7. Architecture • 3 main components to build the virtual database: Data Repository, Catalogue Node and Portal • Installed on each partner site and interconnected through a dedicated secure connection channel • Data Repository • Interface to the partners’ databases • Offers metadata information to describe, search, locate data • Offers interface to retrieve data from the associated local databases • Catalogue Node • Maintains the registry and ensures synchronisation • Harvests metadata and requests data from the Data Repository • Ingests data and maintains the cache of the real-time data • Serves clients: Portal or other Nodes • Monitors the execution of the requests • Distributed Portal • Offers interface to search/browse the catalogue

  8. Architecture (cont.)

  9. WIS Functional Requirements • Support a variety of data types (common to all WMO Programmes) • Support Archive and Real-time datasets • Provide a Catalogue of all the meteorological data for exchange to support WMO programmes • Support ad-hoc requests for data and products: Pull model • Support routine dissemination of all observed data and products, both real-time and non-real-time: Push model • Support network security • Support different user profiles and data policies • Use different types of communication links (GTS, satellite, dedicated links)

  10. Support variety of data types • Interface to the existing Meteorological Databases • It provides access to any kind of database (RDBMS, bespoke, flat files) • Metadata provider • Provide Metadata information to discover, locate and describe data, in accordance with a defined XML metadata format • Answer Catalogue Node metadata harvesting messages • Data provider • Provide an interface to asynchronously request data from the associated existing database • Transform the XML data request to the real database request • Offer a data channel (HTTP, FTP, …) to send the retrieved data to the Catalogue Node
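The step "transform the XML data request to the real database request" can be illustrated with a small sketch. The slides do not show the request schema, so the `<dataRequest>` format, the table mapping `TABLES`, the column whitelist and the function `to_sql` below are hypothetical, and a relational back end is only one of the database kinds the slide lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data request; the real SIMDAT request format is not shown in the slides.
REQUEST_XML = """
<dataRequest dataset="synop">
  <param name="station">03772</param>
  <param name="date">2005-10-12</param>
</dataRequest>
"""

# Assumed mapping from dataset names to local tables, and the columns a request may filter on.
TABLES = {"synop": "synop_obs"}
ALLOWED_COLUMNS = {"station", "date"}


def to_sql(request_xml):
    """Translate the XML request into a parameterised SQL query and its values."""
    root = ET.fromstring(request_xml)
    table = TABLES[root.get("dataset")]
    params = {p.get("name"): p.text for p in root.findall("param")}
    unknown = set(params) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    columns = sorted(params)  # fixed order keeps the generated query stable
    where = " AND ".join(f"{column} = ?" for column in columns)
    return f"SELECT * FROM {table} WHERE {where}", [params[c] for c in columns]


sql, values = to_sql(REQUEST_XML)
```

Only whitelisted column names are placed in the query text; the values travel separately as bound parameters.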

  11. Support variety of data types (cont.) • [Diagram: Community Portal and Catalogue connected to the partners' data sources: satellite data, model output, climate time series, oceanographic data (BATHY, SHIP), ERA40 data, TIGGE data, observations, real-time GTS data, wave, aviation data (TAF, METAR), lightning data] • More than 27,000 datasets discoverable

  12. Catalogue of all available products • The Catalogue is built using the metadata harvested from the Data Repositories • The Catalogue is synchronized and replicated on each Catalogue Node • The Catalogue Node offers discovery services accessible to the user through the distributed portal • The Catalogue contains the necessary information to retrieve and sub select the data

  13. Metadata Support • Flexible architecture that can support any type of geo-referenced metadata • WMO Core 0.2 • WMO Core 1.0 • E2EDM • THREDDS • Dublin Core • For each type of metadata, a configuration file describes how to extract the relevant information necessary for indexing and displaying the data in the portal
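As a rough illustration of a per-format extraction configuration, the sketch below maps each metadata type to the element paths that hold the fields to index. The configuration layout, the `simple` format and the function `extract_index_fields` are assumptions; only the Dublin Core element namespace is taken from the published standard.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

# Hypothetical configuration: for each metadata type, where to find each indexed field.
EXTRACTION_CONFIG = {
    "dublin_core": {"title": f"{DC}title", "owner": f"{DC}publisher"},
    "simple": {"title": "name", "owner": "centre"},
}


def extract_index_fields(metadata_type, xml_text):
    """Return the fields to index for one record, driven only by the configuration."""
    root = ET.fromstring(xml_text)
    paths = EXTRACTION_CONFIG[metadata_type]
    return {field: root.findtext(path) for field, path in paths.items()}


record = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Synop Heathrow</dc:title>"
    "<dc:publisher>UKMO</dc:publisher>"
    "</record>"
)
fields = extract_index_fields("dublin_core", record)
```

Supporting a further metadata format then means adding one configuration entry rather than new parsing code.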

  14. WMO Core metadata standard - Challenges • WMO Core Profile, profile of ISO 19115 on geo-referenced data • Scalability • Records are large and contain redundant information, slowing down the database hosting the catalogue • Same information repeated in all metadata records → Unnecessary information is circulating over the network • Some documents are orders of magnitude larger than data itself • Cannot represent very large archives with small granularity • Cannot fulfil all requirements to build the V-GISC • Information on how to retrieve data from local databases • Information to create a directory (Taxonomy of documents) • Information to sub-select data from a dataset

  15. WMO Core metadata standard - Solutions • [Diagram: one record split into fragments: Core (WMO), Owner (UKMO), Data type (Synop), Location (Heathrow), Date (2005-10-12)] • Split XML documents into fragments to solve the scalability issue • WMO core metadata is structured • Some parts are shared amongst many documents • Add specific extension to define all relevant information needed to implement the system and not defined by the WMO core • Internal unique ID • Hierarchy relationship • Physical location (which node holds the data) • Information used to generate a valid request to retrieve data from the end system • Information used to create web interface for the end user • Work with WMO ET to integrate extensions in future releases of standards
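The idea of splitting records into shared fragments can be sketched as a store that keeps each distinct fragment once and lets records reference fragments by key. The slides do not describe the actual storage scheme, so `FragmentStore`, `store_record` and the use of a SHA-256 content key are assumptions for illustration.

```python
import hashlib


class FragmentStore:
    """Hypothetical store that keeps each distinct metadata fragment only once."""

    def __init__(self):
        self.fragments = {}

    def put(self, name, value):
        # The key is derived from the fragment content, so identical
        # fragments from different records map to the same entry.
        key = hashlib.sha256(f"{name}={value}".encode("utf-8")).hexdigest()
        self.fragments.setdefault(key, (name, value))
        return key


def store_record(store, record):
    """Replace each part of a record by a reference to its stored fragment."""
    return {name: store.put(name, value) for name, value in record.items()}


store = FragmentStore()
first = store_record(store, {"core": "WMO", "owner": "UKMO", "data_type": "Synop",
                             "location": "Heathrow", "date": "2005-10-12"})
second = store_record(store, {"core": "WMO", "owner": "UKMO", "data_type": "Synop",
                              "location": "Heathrow", "date": "2005-10-13"})
fragment_count = len(store.fragments)
```

Two records that differ only in their date share four fragments, so six fragments are stored instead of ten.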

  16. Metadata Synchronization • New observation has been received by one site

  17. Metadata Synchronization (cont.) • The associated metadata are generated and published in the Data Repository

  18. Metadata Synchronization (cont.) • Catalogue Node harvests the new metadata and stores it in its Catalogue

  19. Metadata Synchronization (cont.) • The Catalogues of the other Nodes are synchronized and the dataset is searchable from any site
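The four synchronization steps above (new observation, metadata published in the Data Repository, harvest by the local Catalogue Node, synchronization of the other Nodes) can be sketched as follows. This is a simplified model assuming a fully meshed set of nodes and an in-memory repository; the class and method names are hypothetical and no real SIMDAT protocol is reproduced.

```python
class CatalogueNode:
    """Hypothetical Catalogue Node holding a replicated catalogue."""

    def __init__(self, name):
        self.name = name
        self.catalogue = {}
        self.peers = []

    def harvest(self, repository):
        """Pull new metadata from the local Data Repository, then push it to peers."""
        for record_id, metadata in repository.items():
            if record_id not in self.catalogue:
                self.catalogue[record_id] = metadata
                for peer in self.peers:
                    peer.receive(record_id, metadata)

    def receive(self, record_id, metadata):
        """Store metadata pushed by another node; repeated deliveries are harmless."""
        self.catalogue.setdefault(record_id, metadata)


site_a, site_b, site_c = CatalogueNode("A"), CatalogueNode("B"), CatalogueNode("C")
site_a.peers = [site_b, site_c]
site_b.peers = [site_a, site_c]
site_c.peers = [site_a, site_b]

# Steps 1 and 2: a new observation arrives at site A and its metadata is published locally.
repository_a = {"synop-heathrow-2005-10-12": {"owner": "UKMO", "node": "A"}}

# Steps 3 and 4: site A harvests the metadata and the other catalogues are synchronized.
site_a.harvest(repository_a)
searchable_everywhere = all(
    "synop-heathrow-2005-10-12" in node.catalogue for node in (site_a, site_b, site_c)
)
```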

  20. Support Archive and Real-time Data • A GTS Data Repository has been developed • Interfaced with the GTS (through a MSS) • It publishes GTS collections in the Cache • Currently, no data replication over the SIMDAT infrastructure • For phase III several sources plugged onto SIMDAT • Strategy to uniquely identify the datasets (using MD5 hash codes) • Real-time data replication using the metadata synchronization mechanism • Generic Solution that can be used by all the partners
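The slide states that datasets are identified with MD5 hash codes but not what is hashed. A minimal sketch, assuming the identifier is derived from a canonical string of descriptive fields (the function `dataset_id` and the choice of fields are hypothetical):

```python
import hashlib


def dataset_id(owner, data_type, location, date):
    """Derive a stable identifier from descriptive fields (assumed inputs)."""
    canonical = "|".join([owner, data_type, location, date])
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


id_a = dataset_id("UKMO", "Synop", "Heathrow", "2005-10-12")
id_b = dataset_id("UKMO", "Synop", "Heathrow", "2005-10-12")
id_c = dataset_id("UKMO", "Synop", "Heathrow", "2005-10-13")
```

Because the identifier depends only on the input fields, any node can compute it independently and recognise a dataset it already holds.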

  21. Support Pull model • A Portal is deployed on each site and offers a unique view of all the datasets available • Portal offers discovery mechanisms to the users • Full text, temporal and geographical search (Google-like) • Directory browsing (Yahoo-like browsing) • Portal provides request handling mechanisms to the users • Submitted requests can be asynchronous to manage long-lived requests • Users can manage requests (check status, delete them …) • Users retrieve the associated data when the request is complete • Portal uses the information contained in the metadata to create the data sub-selection forms • The metadata/data providers define how to access their datasets
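The asynchronous request handling described above (submit, check status, delete, retrieve when complete) can be sketched with Python's standard `concurrent.futures` module. `RequestManager` and `fake_retrieve` are hypothetical names; the actual portal is described elsewhere in the slides as built on J2EE and web services, so this only illustrates the interaction pattern.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class RequestManager:
    """Hypothetical manager for long-lived data requests."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._futures = {}

    def submit(self, retrieve, *args):
        """Start a retrieval in the background and return a request identifier."""
        request_id = uuid.uuid4().hex
        self._futures[request_id] = self._pool.submit(retrieve, *args)
        return request_id

    def status(self, request_id):
        future = self._futures[request_id]
        if future.cancelled():
            return "cancelled"
        if future.done():
            return "failed" if future.exception() else "complete"
        return "in progress"

    def result(self, request_id, timeout=None):
        """Block until the data is available (or the timeout expires)."""
        return self._futures[request_id].result(timeout=timeout)

    def delete(self, request_id):
        """Forget a request; it is cancelled only if it has not started yet."""
        self._futures.pop(request_id).cancel()

    def close(self):
        self._pool.shutdown(wait=True)


def fake_retrieve(dataset):
    # Stand-in for a slow archive retrieval.
    return f"data for {dataset}"


manager = RequestManager()
request_id = manager.submit(fake_retrieve, "ERA40")
data = manager.result(request_id)
final_status = manager.status(request_id)
manager.close()
```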

  22. Support network security • Inter-Node Communication infrastructure based on web services over SSL • Metadata synchronisation • Data exchange • Inter-node requests • Real-time data replication • Monitoring and administration messages

  23. Support of different user profiles and data policies • [Diagram: VO Domain; nodes A, B, C, D1, D2, E, F] • VO Domain • Group of organisations that share a common data access policy (e.g. the RA-VI V-GISC) • Access to protected resources occurs on a domain basis • Authentication (AuthN) • Users register with a node • Users are known to all the nodes in the same domain • Any node within the domain should be able to authenticate a user of the domain • Authorisation (AuthZ) • AuthZ is performed at the node level to allow/deny access to the data • Data Access policy is expressed within the metadata • Implementation : first release March 2007
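A minimal sketch of the split between domain-wide authentication and node-level authorisation follows. The slides give no implementation details (the first release was still planned), so the classes `Domain` and `Node`, the role names and the `access_policy` metadata field are assumptions. In particular, treating records without a policy as open data is a choice made for this example only.

```python
class Domain:
    """Hypothetical VO domain: users registered at any node are known domain-wide."""

    def __init__(self, name):
        self.name = name
        self.users = {}  # username -> set of roles

    def register(self, username, roles):
        self.users[username] = set(roles)


class Node:
    def __init__(self, name, domain):
        self.name = name
        self.domain = domain

    def authenticate(self, username):
        """AuthN: any node of the domain can recognise a domain user."""
        return username in self.domain.users

    def authorise(self, username, metadata):
        """AuthZ: decided at the node, using the policy carried in the metadata."""
        required = set(metadata.get("access_policy", []))
        if not required:
            return True  # assumption: no policy means unrestricted data
        if not self.authenticate(username):
            return False
        return bool(required & self.domain.users[username])


domain = Domain("RA-VI V-GISC")
node_a, node_b = Node("A", domain), Node("B", domain)
domain.register("alice", ["nmhs"])  # registered once, known to every node of the domain

restricted = {"title": "Model output", "access_policy": ["nmhs"]}
alice_allowed = node_b.authorise("alice", restricted)
stranger_allowed = node_b.authorise("mallory", restricted)
```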

  24. Results Achieved • Nine Meteorological Services interconnected and exchanging data and metadata • Improved visibility and access to meteorological data through a comprehensive discovery service • Users able to search, browse and retrieve data distributed among the partners • Unified Catalogue based on WMO Core Profile • First element of the security infrastructure • Flexible, non-intrusive architecture • Supports any kind of database (RDBMS, XML, Flat File, Object, bespoke) • Zero development Data Repository • Supports Asynchronous requests (Archive, long requests) • Interest shown by the meteorological community • BoM (Australia), CMA (China), JMA (Japan) and KMA (Korea) fully integrated • DCPC cataloguing service • NCAR (US) and NODC (Russia) catalogues are harvested using OAI, users are redirected to their portal

  25. Plan • Implement Virtual Organisation • First release March 2007 • Support dissemination of data and products • Partners’ distribution system review : February 2007 • Subscription Service definition : April 2007 • Subscription Service implementation : March 2008 • Support different types of communication links • Dual RMDCN / Internet deployment study : June 2007 • RMDCN deployment : December 2007 • Finalize metadata editor : December 2006 • Enhance Discovery Service : September 2007

  26. SIMDAT Demonstration • Meshed network of GISCs and DCPCs running the SIMDAT software and including the 5 European partners, JMA, CMA, BoM, NCAR, NODC • NCAR and RNODC acting as DCPC and providing metadata via OAI to the V-GISC

  27. Acknowledgements • Special thanks to: • Baudouin Raoult (ECMWF) • Guillaume Aubert (ECMWF)
