
The Research Data Archive at NCAR




  1. The Research Data Archive at NCAR Doug Schuster and Steve Worley NCAR

  2. Topic Outline • Introduction/History • Core Data Categories/Featured Datasets • Archive Management/Tools • New Supporting IT Infrastructure • Future Possibilities AMS 2011

  3. Introduction/History • Data Support Section (Founded 1965) • Paper -> Punch Cards -> Tapes -> CDs/DVDs -> Hard Drives -> Network-Based Storage and Transfer • KB of observations -> Terabytes of model-generated data (total archive volume over 600 TB) • Weeks or months for a user to get data -> Users want data access now (over 7000 registered users) • Pay for data -> Free and open access to all datasets that aren’t subject to source restrictions

  4. Introduction/History • How do we evolve to support the growing needs of data users and generators? • Stay aware of current research uses • Strengthen datasets supporting core research data categories • Update archive management tools • Rebuild/Augment IT infrastructure • Educate supporting staff

  5. Core Data Categories • Content to support atmospheric and geoscience research • Some research examples: • Climate • Oceanographic • Hydrologic • Weather Prediction • Renewable Energy (Wind/Solar)

  6. Core Data Categories • Operational and Reanalysis Model Outputs • Meteorological and Oceanographic Observations • Remote Sensing Observations • Topography/Bathymetry, Vegetation, Land Use

  7. Featured Datasets: Global Platform Observations (timeline: 1662 to 2011)

  8. Featured Datasets: Analysis and Forecast Model Data (timeline: 1850 to 2011)

  9. Featured Datasets: High Resolution Reanalysis (timeline: 1870 to 2011)

  10. Archive Management How can we support an archive that continuously grows in volume and complexity with a fixed number of supporting staff?

  11. Archive Management • Common Data Management Tools • Functionality Requirements: • Scalable • Integrated: one call does all • Automatable

  12. Archive Management • Common Data Management Tools • Task Completion Requirements: • Data Acquisition: get data (daily or irregularly) • Data Archival: archive to disk and tape • Metadata Collection: collect metadata and update metadata databases • Metadata Publishing: update web server pages and internal metadata access points
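The requirements above (integrated, automatable, covering four tasks) suggest a single driver that chains acquisition, archival, metadata collection, and publishing so that one call does all of them. The following is a minimal Python sketch under stated assumptions: every function and class name is hypothetical (it is not the interface of dsupdt or dsarch), and in-memory dictionaries stand in for disk, tape, the metadata databases, and the web server.

```python
# Hypothetical sketch: an integrated "one call does all" archival driver.
# None of these names are the RDA's real tools (dsupdt, dsarch); the
# dictionaries are in-memory stand-ins for disk, tape, the metadata
# database, and the web server.
from dataclasses import dataclass, field


@dataclass
class ArchiveState:
    disk: dict = field(default_factory=dict)         # stand-in for online disk
    tape: dict = field(default_factory=dict)         # stand-in for the tape archive
    metadata_db: dict = field(default_factory=dict)  # stand-in for metadata databases
    web_pages: dict = field(default_factory=dict)    # stand-in for published pages


def acquire(source: dict, name: str) -> bytes:
    """Data acquisition: fetch the file's bytes (here, from a dict)."""
    return source[name]


def archive(state: ArchiveState, name: str, data: bytes) -> None:
    """Data archival: keep one copy on disk and one on tape."""
    state.disk[name] = data
    state.tape[name] = data


def collect_metadata(state: ArchiveState, dataset: str, name: str, data: bytes) -> None:
    """Metadata collection: update the metadata database for this file."""
    state.metadata_db[name] = {"dataset": dataset, "size": len(data)}


def publish(state: ArchiveState, dataset: str) -> None:
    """Metadata publishing: regenerate the dataset's file-list page."""
    names = sorted(n for n, m in state.metadata_db.items() if m["dataset"] == dataset)
    state.web_pages[dataset] = "\n".join(names)


def archive_file(state: ArchiveState, source: dict, dataset: str, name: str) -> None:
    """Integrated entry point: one call runs all four tasks in order."""
    data = acquire(source, name)
    archive(state, name, data)
    collect_metadata(state, dataset, name, data)
    publish(state, dataset)


if __name__ == "__main__":
    state = ArchiveState()
    incoming = {"file_b.grb2": b"bbb", "file_a.grb2": b"aa"}  # hypothetical files
    for fname in incoming:  # a scheduler (e.g. cron) could drive this loop
        archive_file(state, incoming, "ds_example", fname)
    print(state.web_pages["ds_example"])
```

Keeping each task as its own function while exposing one integrated entry point lets the same steps run by hand for irregular deliveries or under a scheduler for daily updates.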

  13. Step 1: Get Data (Integrated Archival Tools) • Incoming data types: model-generated data (GRIB, NetCDF); observational data (BUFR, ASCII, etc.); remote sensing data (binary); topography (vector, image, binary, etc.) • Acquisition: automated (dsupdt) or manual (tape, FTP, etc.) • Destination: RDA/CISL servers
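Because incoming files arrive in several formats, an acquisition step can classify each file before routing it. A minimal sketch, assuming the format signature sits at the very start of the file: GRIB and BUFR messages begin with the ASCII indicators "GRIB" and "BUFR", NetCDF classic files begin with "CDF" followed by a version byte, and NetCDF-4 files carry the HDF5 signature. The function names are hypothetical. Raw binary formats, such as many remote sensing products, have no common signature and fall through to "unknown", which could flag them for manual handling.

```python
# Sketch: identify an incoming file's format from its leading bytes.
# Assumes the signature sits at byte 0; BUFR files wrapped in WMO bulletin
# headers, or HDF5 files with a user block, would need a search instead.
from pathlib import Path

SIGNATURES = [
    (b"GRIB", "GRIB"),
    (b"BUFR", "BUFR"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
    (b"\x89HDF\r\n\x1a\n", "NetCDF-4/HDF5"),  # HDF5 signature; NetCDF-4 is built on HDF5
]

PRINTABLE = set(range(32, 127)) | {9, 10, 13}  # printable ASCII plus tab, LF, CR


def sniff_format(header: bytes) -> str:
    """Classify a file from its first few bytes."""
    if not header:
        return "unknown"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            if name == "GRIB" and len(header) >= 8:
                # Octet 8 holds the edition number in both GRIB1 and GRIB2.
                return f"GRIB edition {header[7]}"
            return name
    if all(b in PRINTABLE for b in header):
        return "ASCII text"
    return "unknown"


def sniff_file(path: Path, nbytes: int = 64) -> str:
    """Read the first `nbytes` of a file and classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(nbytes))
```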

  14. Step 2: Archive Data (Integrated Archival Tools) • Input on the RDA/CISL servers: model-generated data (GRIB, NetCDF), observational data (BUFR, ASCII, etc.), remote sensing data (binary), topography (vector, image, binary, etc.) • dsarch writes each file (e.g., a model-generated GRIB-2 file) to disk and to HPSS • File attribute metadata (name, dataset, location, format) is recorded in the RDA database
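The file attribute metadata on this slide maps naturally onto one database row per archived copy. The sketch below uses Python's built-in sqlite3 module as a stand-in; the table name, column names, file name, and location values ('disk', 'hpss') are assumptions for illustration, not the actual RDA database schema.

```python
# Sketch: record file attribute metadata (name, dataset, location, format)
# for each archived copy. Table and column names are hypothetical, not the
# actual RDA database schema.
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS file_attributes (
               name     TEXT NOT NULL,
               dataset  TEXT NOT NULL,
               location TEXT NOT NULL,  -- e.g. 'disk' or 'hpss'
               format   TEXT NOT NULL,
               PRIMARY KEY (name, location)
           )"""
    )


def record_copy(conn, name, dataset, location, fmt) -> None:
    """Insert or refresh the row for one archived copy of a file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO file_attributes VALUES (?, ?, ?, ?)",
            (name, dataset, location, fmt),
        )


def locations_of(conn, name) -> list:
    """List every location holding a copy of `name`."""
    rows = conn.execute(
        "SELECT location FROM file_attributes WHERE name = ? ORDER BY location",
        (name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
for loc in ("disk", "hpss"):
    record_copy(conn, "model_2011010100.grb2", "ds_example", loc, "GRIB-2")
print(locations_of(conn, "model_2011010100.grb2"))  # ['disk', 'hpss']
```

Keying rows on (name, location) lets one file have both a disk copy and an HPSS copy, and INSERT OR REPLACE makes re-archiving the same copy idempotent.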

  15. Step 3: Collect File Content Metadata/Check Integrity (Integrated Archival Tools) • Example input on the RDA/CISL servers: a model-generated file in GRIB-2 format containing temperature, humidity, vorticity, visibility, and precipitation rate, each described by (center, date, time, level, location) • Gather metadata into the RDA DB: file attribute metadata (name, dataset, location, format) and file content metadata T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L), where (C,D,T,L,L) = (center, date, time, level, location)
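One concrete form of the integrity check is to walk a GRIB2 file message by message. In GRIB2, Section 0 is 16 octets: the indicator "GRIB", two reserved octets, the discipline (octet 7), the edition number (octet 8), and the total message length as an 8-octet unsigned integer (octets 9 to 16); every message ends with "7777". The sketch below checks only these framing fields and assumes messages are contiguous with no padding between them; the function name is hypothetical. Extracting the parameter, center, date, time, level, and location metadata requires decoding later sections, typically with a GRIB decoding library, and is not shown.

```python
# Sketch: integrity check for a file of concatenated GRIB2 messages, using
# only the Section 0 length field and the "7777" end marker. Assumes
# messages are contiguous (no padding between them).
import struct


def scan_grib2(data: bytes) -> list:
    """Return (offset, length, discipline) per message; raise ValueError if corrupt."""
    messages = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 4] != b"GRIB":
            raise ValueError(f"no 'GRIB' indicator at byte {pos}")
        if len(data) - pos < 16:
            raise ValueError(f"truncated Section 0 at byte {pos}")
        discipline, edition = data[pos + 6], data[pos + 7]  # octets 7 and 8
        if edition != 2:
            raise ValueError(f"edition {edition} at byte {pos}, expected 2")
        (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])  # octets 9-16
        end = pos + length
        if length < 20 or end > len(data):  # 16-octet Section 0 + 4-octet Section 8
            raise ValueError(f"message at byte {pos} is truncated")
        if data[end - 4:end] != b"7777":
            raise ValueError(f"missing '7777' end marker for message at byte {pos}")
        messages.append((pos, length, discipline))
        pos = end
    return messages
```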

  16. Step 4: Publish Metadata and Data (Integrated Archival Tools) • Source: the RDA DB on the RDA/CISL servers, holding file attribute metadata (name, dataset, location, format) and file content metadata T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L) • RDA Web Server: dynamic file lists, data search tools, detailed content metadata, data subsetting interfaces • CISL computational node: detailed metadata for files on disk, data subsetting
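A data search tool over the file content metadata reduces to filtering records by parameter, level, and time. The sketch below is a hypothetical in-memory version: the record fields follow the slide's (center, date, time, level, location) tuple, with date and time merged into one datetime for convenience, and the parameter abbreviations are taken from the slide. A production system would presumably run the equivalent query against the metadata database.

```python
# Sketch: a data search over file content metadata. Each record mirrors
# one T(C,D,T,L,L)-style entry: parameter plus (center, date, time, level,
# location); date and time are merged into one datetime here.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ContentRecord:
    file: str
    parameter: str        # e.g. "T", "RH", "Vort", "Vis", "PcpR"
    center: str
    valid_time: datetime  # date and time combined
    level: str
    location: str         # grid or region identifier


def search(records, parameter, level, start, end) -> list:
    """Sorted names of files containing `parameter` at `level` valid in [start, end]."""
    return sorted({
        r.file
        for r in records
        if r.parameter == parameter and r.level == level and start <= r.valid_time <= end
    })
```

Returning distinct file names, rather than individual records, matches what a dynamic file list would display to the user.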

  17. New Supporting IT/Infrastructure • Online Disk Upgrades • Larger disk (450 TB) • Common disk interfaces (web server and compute nodes) • Tape Archive Upgrades • High Performance Storage System (HPSS) • Computing Power Upgrades • Additional and more powerful servers

  18. New Supporting IT/Infrastructure • NCAR user community pros: access to full RDA; fast computing • NCAR user community cons: no access to online data; forced to use MSS as a file server (access is too slow); no direct access to RDA metadata • Complete user community pros: fast access to online data; access to all RDA metadata; access to RDA data processing services • Complete user community cons: small fraction of RDA online; slow access to offline data; data processing requests take a long time to finish

  19. New Supporting IT/Infrastructure • Complete user community improvements: faster access to full RDA; expanded data processing services; faster turnaround on data processing requests • NCAR user community improvements: faster access to full RDA; direct access to all RDA metadata

  20. Future Possibilities • Leverage New IT Infrastructure • Server-side parameter and spatial subsetting across multiple datasets (model output or in situ observations) • Data provided in multiple output formats • Web-services-based requests (REST, etc.) • Addition of large and diverse datasets to the RDA
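Server-side parameter and spatial subsetting means returning only the requested variables inside a requested bounding box rather than whole files. A minimal sketch, assuming a regular latitude/longitude grid with monotonic coordinates, no longitude wraparound, and fields stored as nested Python lists; all names are hypothetical.

```python
# Sketch: server-side parameter and spatial subsetting on a regular
# latitude/longitude grid. Assumes monotonic coordinates and no longitude
# wraparound; fields are nested lists (rows = latitude, cols = longitude).


def index_range(coords, lo, hi):
    """Slice bounds covering all values in [lo, hi]; contiguous for monotonic coords."""
    hits = [i for i, v in enumerate(coords) if lo <= v <= hi]
    if not hits:
        raise ValueError(f"no coordinate values in [{lo}, {hi}]")
    return hits[0], hits[-1] + 1


def subset(variables, lats, lons, params, south, north, west, east):
    """Return only the requested parameters, cut to the bounding box."""
    i0, i1 = index_range(lats, south, north)
    j0, j1 = index_range(lons, west, east)
    out = {"lat": lats[i0:i1], "lon": lons[j0:j1]}
    for p in params:
        out[p] = [row[j0:j1] for row in variables[p][i0:i1]]
    return out
```

Because the coordinates are monotonic, the indices falling inside [lo, hi] are contiguous, so each axis reduces to a single slice; the same logic works whether latitudes run north to south or south to north.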

  21. http://dss.ucar.edu
