Distributed data access: THREDDS, OAI, CDP

Distributed data access: THREDDS, OAI, CDP. Presented By: Michael Burek. Acknowledgments: CDP staff: Dave Brown, Luca Cinquini, Don Middleton, Rob Markel, Scott Nixon, Nate Wilhelmi. Outline. Community Data Portal (CDP) THREDDS in the CDP introduction THREDDS in detail

Presentation Transcript


  1. Distributed data access: THREDDS, OAI, CDP
     Presented by: Michael Burek
     Acknowledgments: CDP staff: Dave Brown, Luca Cinquini, Don Middleton, Rob Markel, Scott Nixon, Nate Wilhelmi

  2. Outline
     • Community Data Portal (CDP)
     • THREDDS in the CDP, introduction
     • THREDDS in detail
     • THREDDS applied in the CDP, some details
     • OAI -- Open Archives Initiative
     • Demo
     • Thoughts about future developments

  3. Introduction to the CDP
     Community Data Portal (CDP) Project
     • UCAR-wide, uniform community resource for discovery (search and browse) across the organization
     • Search/browse:
       • Supports free-text or structured queries to find data
       • Boolean combinations
       • Keywords, controlled vocabularies
       • Creator, Publisher, Science Keyword (GCMD), Variable name (CF)
       • Data Format, Data Type, Data Delivery Service
       • Geographic, Time, Altitude
     • Data delivery services:
       • aggregation, subsetting, FTP, HTTP, Mass Store, LAS/FERRET, OPeNDAP

  4. Introduction to the CDP, cont.
     • The CDP serves a diverse range of data providers:
       • Project-based archives -- small, often with limited resources
       • Multi-institutional teams -- geographically separated
       • Multiple data types within a project: measurements, models, images
     • The CDP cooperates with existing NCAR data organizations
     • A few unusual datasets -- HAO division
     • Model software, visualizations

  5. CDP, Technologies
     • The CDP was begun in 2001
     • Uses THREDDS* catalogs to describe data content and structure
     • Uses Lucene as the search/discovery back end
     • Uses the Open Archives Initiative (OAI) protocol to share metadata
     • Uses SRM to access deep archive data and to share data externally (ESG project)
     • Experimental use of SRB to share data within the institution
     • Sister site, Earth System Grid (ESG), uses grid technology to share data
     • Uses DODS/OPeNDAP for aggregating and subsetting data sets
     • Uses a distributed model for accessing data and metadata
     https://cdp.ucar.edu/
     *Thematic Realtime Environmental Distributed Data Services

  6. Introduction, THREDDS in the CDP
     • THREDDS is a schema used for DATA DELIVERY
     • Can also be used for geoscience data search and discovery
     THREDDS catalogs:
     • Are ingested into Lucene and GEO extent searching tools for search and discovery
     • Are used to supply data for search results and browse pages
     • Specify data access mechanisms
       • HTTP, restricted HTTP, OPeNDAP, MSS, TDS, LAS, GDS, CDP/agg
     • Point to and use non-THREDDS metadata
       • ESG, DC, NcML, GML, DIF
     • Can interoperate with WMO metadata when available

  7. Introduction, THREDDS in the CDP, cont.
     • The CDP federates directly with other sites that use THREDDS catalogs
       • NCAR DSS, NCAR EOL, UCAR UNIDATA
     • THREDDS catalogs are used inside DODS/OPeNDAP, GDS, and the forthcoming THREDDS Data Server
     • THREDDS will support a data access control system, locally and distributed

  8. THREDDS Background
     • THREDDS v0.6
       • Support for describing the hierarchical structure of datasets
       • Support for describing data delivery services
       • Some very basic descriptive metadata
       • Support for extensible and distributed catalogs
       • Support for “inheritance” of metadata and services
       • Allows other descriptive schemas to be part of the catalog
       • Emphasizes the hierarchical relationships between data items, containing datasets, and groups of datasets
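
The "inheritance" bullet above can be illustrated with a small sketch. It assumes a simplified, namespace-free catalog in which metadata fields are plain text elements, and it assumes a metadata block is passed down to child datasets only when it carries inherit="true". The field values (group-a, group-b, abc) and the function name are hypothetical, and real THREDDS metadata elements are richer than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified catalog: metadata fields are plain text elements.
CATALOG_XML = """
<catalog>
  <dataset name="abc" ID="parent">
    <metadata inherit="true">
      <creator>group-a</creator>
      <project>abc</project>
    </metadata>
    <dataset name="item1" ID="parent.item1">
      <metadata>
        <creator>group-b</creator>
      </metadata>
    </dataset>
    <dataset name="item2" ID="parent.item2"/>
  </dataset>
</catalog>
"""


def effective_metadata(dataset, inherited=None, out=None):
    """Walk the dataset tree and record each dataset's effective metadata."""
    inherited = dict(inherited or {})
    out = {} if out is None else out
    own = dict(inherited)          # what applies to this dataset
    passed_down = dict(inherited)  # what its children will start from
    for md in dataset.findall("metadata"):
        fields = {child.tag: (child.text or "").strip() for child in md}
        own.update(fields)  # local values override inherited ones
        if md.get("inherit") == "true":
            passed_down.update(fields)
    out[dataset.get("ID")] = own
    for child in dataset.findall("dataset"):
        effective_metadata(child, passed_down, out)
    return out


root = ET.fromstring(CATALOG_XML)
EFFECTIVE = effective_metadata(root.find("dataset"))
for dataset_id, fields in EFFECTIVE.items():
    print(dataset_id, fields)
```

In this sketch item1 overrides the creator but still picks up the project from its container, while item2 takes both fields from the container.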

  9. THREDDS V1.0
     • THREDDS v1.0
       • Added descriptive “minimal” metadata tuned for Earth Science search/discovery
       • “Minimal” defined -- metadata sized for search/discovery
       • Again, metadata can be inherited within the hierarchy
       • Design goal was to interoperate with core elements of DIF, ISO-19115, and DC metadata
       • UNIDATA is looking at incorporating THREDDS metadata in NetCDF* and the forthcoming TDS**
       • Exploring possible interoperation with BADC model extensions
       • V1.0x will have access control elements
     URL: http://my.unidata.ucar.edu/content/projects/THREDDS/index.htm
     *NetCDF: UNIDATA-defined binary data format for gridded and other geoscience data; includes metadata in the file header that describes the data
     **TDS: THREDDS Data Server -- will handle GRIB and NetCDF, will have WCS

  10. THREDDS -- CDP
      • CDP THREDDS design choices
        • Use THREDDS descriptive metadata for search/discovery
        • Use GCMD DIF controlled vocabularies for science keyword hierarchies, creator, publisher, project
        • Use Climate and Forecast (CF) conventions for variable names when applicable
        • Mandate use of a unique identifier to identify data
        • Use forthcoming THREDDS elements for data access control
        • Use OAI to import DIF records from BADC and GCMD, and transform these records into equivalent THREDDS for use in the CDP
        • Import ESG (CCSM) records (THREDDS, ESG) and extract a subset of descriptive metadata for search and discovery

  11. THREDDS, the details
      General structure of a simple THREDDS catalog:

      <catalog>
        <service name="httpService" type="HTTP" base="http://dataportal.ucar.edu/data/abcData/"/>
        <service name="mssService" type="MSS" base="/mssRoot/abcData/"/>
        <dataset name="abc" ID="ucar.scd.cdp.datasetName">  <!-- container dataset -->
          <metadata inherit="true">  <!-- descriptive metadata -->
            <description type="summary"/>
            <creator/>
            <geospatialCoverage/>  <!-- geographic location -->
            <!-- … other metadata (13 total) -->
          </metadata>
          <dataset ID="ucar.scd.cdp.datasetName.item1">  <!-- describes a data item -->
            <dataSize units="Kbytes">123</dataSize>
            <access serviceName="httpService" urlPath="subDataset/SOLVE_DC8_19991119.nc"/>
            <access serviceName="mssService" urlPath="subDataset/SOLVE_DC8_19991119.nc"/>
          </dataset>
          <!-- more dataset items -->
        </dataset>  <!-- close enclosing dataset -->
      </catalog>

      Dataset URL = service base + access urlPath; the base points to a local server or local service.
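
The rule "dataset URL = base + access" can be sketched in a few lines of Python. This is only an illustration: it assumes a well-formed, namespace-free version of the slide's example catalog, and the function name access_urls is hypothetical rather than part of any THREDDS library.

```python
import xml.etree.ElementTree as ET

# Well-formed, namespace-free version of the slide's example (an assumption).
CATALOG_XML = """
<catalog>
  <service name="httpService" type="HTTP" base="http://dataportal.ucar.edu/data/abcData/"/>
  <service name="mssService" type="MSS" base="/mssRoot/abcData/"/>
  <dataset name="abc" ID="ucar.scd.cdp.datasetName">
    <dataset ID="ucar.scd.cdp.datasetName.item1">
      <access serviceName="httpService" urlPath="subDataset/SOLVE_DC8_19991119.nc"/>
      <access serviceName="mssService" urlPath="subDataset/SOLVE_DC8_19991119.nc"/>
    </dataset>
  </dataset>
</catalog>
"""


def access_urls(catalog_xml):
    """Return {dataset ID: [resolved URL, ...]} using URL = service base + urlPath."""
    root = ET.fromstring(catalog_xml)
    # Map each service name to its base so access elements can look it up.
    bases = {s.get("name"): s.get("base") for s in root.iter("service")}
    urls = {}
    for dataset in root.iter("dataset"):
        resolved = [bases[a.get("serviceName")] + a.get("urlPath")
                    for a in dataset.findall("access")]
        if resolved:  # container datasets without access elements are skipped
            urls[dataset.get("ID")] = resolved
    return urls


RESOLVED = access_urls(CATALOG_XML)
for dataset_id, dataset_urls in RESOLVED.items():
    print(dataset_id, dataset_urls)
```

The same data item resolves once per service, so a client can choose between the HTTP copy and the MSS path.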

  12. THREDDS, simple catalog
      [Diagram: a catalog with two service elements, one for an HTTP data service and one for an MSS data service (labeled "Local data access / local MSS service"). The catalog holds a container dataset with metadata (description, creator, geospatialCoverage, other elements) and four data-item datasets, each with access, size, and extent.]

  13. THREDDS, distributed catalogs example (dataset.thredds.xml)
      1. Descriptive metadata is in a separate file, which could be on another server.
      2. The dataset contains references to remote catalogs.
      3. Catalog-level access control elements.
      [Diagram: a catalog with metadata (description, creator, geospatialCoverage, other elements) and a container dataset holding a metadata link and catalogRef elements, marked ACCESS CONTROL. The catalogRef elements point to remote catalogs on a remote server and on remote data services; each remote catalog has its own service, metadata, description, and datasets, and one is marked ACCESS CONTROL.]
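
Following references to remote catalogs can be sketched as below. The sketch assumes catalogRef elements carry an xlink:href attribute, as in THREDDS catalogs, and that relative references are resolved against the URL of the catalog that contains them. The host names, the catalog content, and the function name are hypothetical, and no network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

CATALOG_URL = "http://example.org/thredds/top/catalog.xml"  # hypothetical
CATALOG_XML = """
<catalog xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="top" ID="example.top">
    <catalogRef xlink:title="local child" xlink:href="child/catalog.xml"/>
    <catalogRef xlink:title="remote site" xlink:href="http://remote.example.net/catalog.xml"/>
  </dataset>
</catalog>
"""


def referenced_catalogs(catalog_url, catalog_xml):
    """Return absolute URLs of every catalog referenced from this one."""
    root = ET.fromstring(catalog_xml)
    return [urljoin(catalog_url, ref.get(XLINK_HREF))
            for ref in root.iter("catalogRef")]


REFS = referenced_catalogs(CATALOG_URL, CATALOG_XML)
for url in REFS:
    print(url)
```

A crawler would fetch each returned URL and repeat the step, which is how a single top-level catalog can span several servers.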

  14. THREDDS, database application example
      [Diagram: a virtual catalog produced by a database-to-THREDDS catalog builder (web service) on an external server, backed by an arbitrary metadata database. The catalog has a service element for an external HTTP data service, a metadata element, and four data-item datasets (access, size, extent) whose data is hosted externally.]
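
A database-to-THREDDS catalog builder of the kind shown on this slide might look roughly like the following. The rows stand in for a database query result. The column names, dataset IDs, base URL, and function name are all hypothetical, and the element names follow the simplified catalog from slide 11 rather than the full schema.

```python
import xml.etree.ElementTree as ET


def build_catalog(service_base, rows):
    """Build a virtual catalog (as an XML string) from database-like rows."""
    catalog = ET.Element("catalog")
    ET.SubElement(catalog, "service",
                  name="httpService", type="HTTP", base=service_base)
    container = ET.SubElement(catalog, "dataset", name="virtual")
    for row in rows:
        item = ET.SubElement(container, "dataset", ID=row["id"])
        size = ET.SubElement(item, "dataSize", units="Kbytes")
        size.text = str(row["kbytes"])
        ET.SubElement(item, "access",
                      serviceName="httpService", urlPath=row["path"])
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical rows, standing in for the result of a metadata database query.
ROWS = [
    {"id": "example.item1", "kbytes": 123, "path": "a/file1.nc"},
    {"id": "example.item2", "kbytes": 456, "path": "a/file2.nc"},
]
CATALOG = build_catalog("http://data.example.org/files/", ROWS)
print(CATALOG)
```

Because the catalog is generated on request, nothing needs to be stored as a THREDDS file; the database remains the single source of the metadata.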

  15. THREDDS, distributed data example
      1. Data is not on the CDP; the service is external and can implement access control if required.
      2. Descriptive metadata is in a separate file and does not have to be THREDDS.
      [Diagram: a catalog with two service elements (an external HTTP data service and an MSS data service) and a container dataset. The dataset's metadata (description, creator, geospatialCoverage, other elements) is supplied through external metadata references, one of them to ISO-19115 elements on an external server. Four data-item datasets (access, size, extent) point to externally hosted data.]

  16. CDP - distributed datasets, overview
      [Diagram: the Community Data Portal's top-level THREDDS catalog linked to distributed sites: Boston University (SRB); NCAR Data Support Section; NCAR Atmospheric Chemistry; NCAR EOL section; LANL, ORNL, LBNL (SRM, ESG); a metadata database; NCAR MSS (Mass Store, via SRM); CDP data storage (WACCM, ACD, CME, CGD, ….); and the BADC OAI server, whose DIF records reach the CDP through an OAI client and XSLT.]
      Legend: T = THREDDS catalog, D = Data Archive, M = MSS deep archive, A = Access control

  17. THREDDS review/summary
      • THREDDS is a schema used for DATA DELIVERY
      • Contains basic geoscience discovery data
      • Is designed to work with distributed data and distributed metadata
      • Contains elements for data access restriction
      • Can work with real-time data
      • Can be a container for non-THREDDS descriptive metadata
      • Defines the hierarchical relationships of datasets
      • Defines data delivery services
      • Supports a hierarchical view of metadata
      • Is integrated with many data delivery and visualization services

  18. Distributed Descriptive Metadata with OAI
      • Metadata is immediately “distributed” if it is contained in or pointed to by THREDDS catalogs
      • Metadata can also be shared using OAI technology
      • OAI -- Open Archives Initiative, from the Digital Library (DL) community
      • OAI is a web service definition for sharing metadata
      • OAI uses six verbs to define the service
      • OAI uses Dublin Core (DC) as the baseline schema
      • OAI can specify other XML schemas -- we use this capability
      • OAI can be used as a gateway to send information to an established DL community: THREDDS -> DC -> DL community via OAI
      • OAI disadvantage -- hierarchical relationships are lost
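
As a rough illustration of the six-verb service, the sketch below builds OAI-PMH request URLs. The verb names and the oai_dc metadata prefix come from the OAI-PMH specification; the base URL and the helper name oai_request are hypothetical, and nothing is fetched over the network.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def oai_request(base_url, verb, **params):
    """Build an OAI-PMH GET request URL for one of the six verbs."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


BASE_URL = "http://example.org/oai"  # hypothetical repository endpoint
LIST_URL = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(LIST_URL)
```

Requesting a different metadataPrefix is how a repository can serve a schema other than Dublin Core, such as DIF, provided it advertises that format through ListMetadataFormats.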

  19. Distributed Metadata with OAI -- CDP
      • THREDDS records are “flattened” (hierarchy collapsed): one record -> one dataset
      • Flattened records are shared using OAI
      • For a test, the THREDDS records were transformed into DIF using XSLT
      • DIF records were ingested from BADC, transformed into THREDDS catalogs, and ingested into CDP search and browse
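
One way to picture the flattening step is shown below. The sketch assumes that every dataset element, container or data item, becomes one flat record and that the lost hierarchy is preserved only as a path-like title. The field names identifier and title, the dataset IDs, and the function name are hypothetical; the CDP's actual flattening rules are not described in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level catalog.
CATALOG_XML = """
<catalog>
  <dataset name="abc" ID="example.abc">
    <dataset name="item1" ID="example.abc.item1"/>
    <dataset name="item2" ID="example.abc.item2"/>
  </dataset>
</catalog>
"""


def flatten(dataset, path=()):
    """Collapse a dataset hierarchy into a flat list, one record per dataset."""
    path = path + (dataset.get("name"),)
    records = [{"identifier": dataset.get("ID"), "title": " / ".join(path)}]
    for child in dataset.findall("dataset"):
        records.extend(flatten(child, path))
    return records


root = ET.fromstring(CATALOG_XML)
RECORDS = flatten(root.find("dataset"))
for record in RECORDS:
    print(record)
```

The flat list no longer says which record contains which, which is the disadvantage noted on the previous slide; here the title string is the only trace of the original nesting.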

  20. CDP metadata architecture
      [Diagram: external metadata and THREDDS catalogs enter through a web interface/web service. Catalog parsing reads THREDDS catalogs and passes THREDDS records to metadata processing, which indexes them into the metadata repository (metadata DB, Lucene). Metadata conversion reads and writes DIF and DC metadata. The repository serves a free-text search query UI; a structured, geospatial, and temporal query UI; an XML viewer web application; and a browser web UI. An OAI client imports from, and an OAI server exports to, a remote data center or digital library.]

  21. Data publication on the CDP
      [Diagram: a dataset (disk, HTTP, database, …) is read by a catalog crawler application, which creates THREDDS hierarchy metadata. A metadata authoring tool creates and edits THREDDS descriptive metadata over HTTP. A metadata indexing application ingests the metadata into the Lucene index. XSLT rendering produces the CDP catalog presentation in the browser, which allows access control and links to the data.]

  22. Demo
      • Data searching: controlled vocabularies, GEO searching
      • Data browsing: access control
      • BADC shared metadata directory
      • Metadata editing
      • IDV bundle showing integrated data source

  23. Experimental topology to share data? GISC -> CDP
      1. OAI transfers of WMO records
      2. CDP crawls the data hierarchy -- no metadata
      3. GISC creates a web interface to produce virtual THREDDS catalogs (with embedded WMO descriptive metadata)
      [Diagram: a WMO GISC database and WMO DCPC hold data (NetCDF, GRIB, …) and WMO metadata records. Records reach the CDP through OAI and XSLT, and a crawler reads THREDDS catalogs over HTTP to feed CDP search.]
      Legend: W = WMO metadata, T = THREDDS metadata, D = data

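  24. Experimental topology to share data, CDP -> GISC
      [Diagram: the CDP holds a THREDDS catalog over its data (NetCDF, GRIB, …) and feeds CDP search. THREDDS metadata is converted with XSLT into WMO metadata and transferred through OAI to the WMO GISC, where it feeds WMO search.]
      Legend: W = WMO metadata, T = THREDDS metadata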

  25. Questions?
