Learn about the comprehensive process, modifications, and archiving involved in managing CMIP3 data. Understand the user perspective for accessing the vast datasets.
Software developed for ESG was modified for CMIP3 (IPCC AR4) • Prerelease ESG version 1.0 • Modified data search • Advanced search • Pydap server • ftp server • OPeNDAP client connection • LAS-CDAT interface • CDAT and VCDAT interface
LLNL/PCMDI – Modeling centers responsibility • CMOR code provisioned and sent to modeling centers • Most centers used CMOR; those that did not still created CF-1.0 output • 12 experiments, 23 models, 13 countries, 4 continents • 35 TB
LLNL/PCMDI – Data archiving and publishing • Liaison for exchanging data via shippable 1 TB disk arrays • Upload of submitted datasets to PCMDI's online storage facility, the Green Data Oasis (GDO); data is put into scratch space for QC checking • Allow data QC by PCMDI staff • If data is not correct, contact modeling centers to resend the data • In some cases PCMDI modifies the data for modeling centers • Raw data is archived at NERSC HPSS • In the case of modified data, the instructions and script(s) used to modify the data are also archived • QC'ed data is moved and published by ESG staff • The complete published CMIP3 data archive is backed up at NERSC • Community is notified of new data • When data is removed or updated, the community is also notified • Install all necessary software; the software used by PCMDI QC staff to verify the data is written in Python (i.e., CDAT)
Process in a nutshell: • Modeling centers send sample data to PCMDI after the data are processed with CMOR or an equivalent • Data undergoes an initial quality-control check through CMOR Checker • If the data is correct, modeling centers start relaying their datasets to PCMDI via shippable 1 TB storage disks. Most groups sent from 3 to 10 TB. • PCMDI uploads datasets from the shippable disk arrays to its local online storage facility for QC checking • PCMDI makes a copy of all raw incoming datasets on the NERSC HPSS for purposes of disaster recovery. (Which we've seen more than once.) • Datasets are QC'ed by PCMDI staff using CDAT tools • QC'ed datasets are published to the Earth System Grid (ESG), making them transparently available through the CMIP3 data portal, ftp, and Pydap servers • QC'ed published datasets are archived at NERSC HPSS
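The steps above can be read as a simple lifecycle for each incoming dataset. The following is only an illustrative sketch of that ordering, not PCMDI's actual tooling: the `Stage` names, the `process` function, and the `passes_qc` callable are all hypothetical, and the sketch models only the "resend" outcome of a failed QC check (not the case where PCMDI modifies the data itself).

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names mirroring the bullets above."""
    RECEIVED = auto()      # uploaded from shippable disks to online storage
    RAW_ARCHIVED = auto()  # raw copy made on NERSC HPSS for disaster recovery
    REJECTED = auto()      # failed QC; modeling center is asked to resend
    QC_PASSED = auto()     # checked by PCMDI staff (CDAT tools in the slides)
    PUBLISHED = auto()     # published to ESG (portal, ftp, Pydap)
    ARCHIVED = auto()      # published copy archived at NERSC HPSS


def process(name, passes_qc):
    """Return the ordered list of stages a dataset goes through.

    `passes_qc` is a stand-in callable for the real QC step; it takes the
    dataset name and returns True or False.
    """
    # Every incoming dataset is uploaded and its raw copy archived before QC.
    history = [Stage.RECEIVED, Stage.RAW_ARCHIVED]
    if not passes_qc(name):
        history.append(Stage.REJECTED)
        return history
    history += [Stage.QC_PASSED, Stage.PUBLISHED, Stage.ARCHIVED]
    return history


if __name__ == "__main__":
    for stage in process("example_dataset", lambda n: True):
        print(stage.name)
```

Note that the raw archive step comes before QC, which matches the slide: all raw incoming datasets are copied to HPSS regardless of whether they later pass.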
LLNL/PCMDI – User's Perspective • Before accessing the data, the user must first request an account • After the request has been made, the user waits for approval from the CMIP3 administrator • A username and password are issued • Once the username and password have been issued, the user uses them to gain access via ftp, OPeNDAP client (CDAT and VCDAT), and LAS-CDAT • Most data (80% to 90%) has been downloaded via ftp • Download rates are about 300 GB per day, with peaks over 900 GB per day • Over 1,400 registered users • Over 300 TB downloaded (over 1 million files)
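To put the usage figures in perspective, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted in the slides (35 TB archive, 300 TB downloaded, 1 million files, 300 GB per day, 1,400 users), treats the "over" values as exact, and assumes decimal units (1 TB = 1000 GB), so the results are order-of-magnitude estimates rather than reported statistics.

```python
# Figures taken from the slides; "over N" values are treated as exactly N.
GB_PER_TB = 1000   # assumption: decimal units
MB_PER_TB = 1_000_000

archive_tb = 35
downloaded_tb = 300
files_downloaded = 1_000_000
typical_rate_gb_per_day = 300
registered_users = 1400

# Average size of a downloaded file, in MB.
avg_file_mb = downloaded_tb * MB_PER_TB / files_downloaded

# Days needed to reach the total at the typical daily rate.
days_at_typical_rate = downloaded_tb * GB_PER_TB / typical_rate_gb_per_day

# How many times the whole archive has effectively been downloaded.
archive_multiples = downloaded_tb / archive_tb

# Average volume per registered user, in GB.
avg_gb_per_user = downloaded_tb * GB_PER_TB / registered_users

print(f"average file size:      {avg_file_mb:.0f} MB")
print(f"days at typical rate:   {days_at_typical_rate:.0f}")
print(f"archive multiples:      {archive_multiples:.1f}x")
print(f"average per user:       {avg_gb_per_user:.0f} GB")
```

Under these assumptions the total downloaded volume corresponds to the full 35 TB archive being transferred several times over, at an average of a few hundred MB per file.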