Long term archiving of climate model data at wdc climate and dkrz


Long-term Archiving of Climate Model Data at WDC Climate and DKRZ

Michael Lautenschlager

WDC Climate / Max-Planck-Institute for Meteorology, Hamburg

Data Management Workshop (Köln, 29.-30.09.09)



Structure 2009

  • DKRZ:

    • Earth system model development

    • Simulations of past, present and future climate

  • WDC Climate:

    • Long-term data archiving

    • Inter-disciplinary data dissemination



Diagram of Climate System



Diagram of the Hamburg IPCC-Climate Model ECHAM5/MPI-OM



Forcing of Climate Projections for IPCC AR4



Near-surface temperature change for the scenarios A1B and B1, shown as the difference of the 30-year means 2071-2100 minus 1961-1990.



Comparison of the present-day sea ice cover in March and September (top) with the climate projection for scenario A1B in 2100 (bottom). Snow cover over land is also shown.



HLRE-II Architecture (http://www.dkrz.de/dkrz/about/hardware)

  • IBM Power6 (blizzard): 2 x login nodes, 250 x compute nodes, 150 TFlops peak; login via ssh blizzard

  • GPFS (3 PByte); blizzard: /work, /pf, /scratch

  • HPSS (10 PByte/year), transfer via pftp; tape: /hpss/arch, /hpss/doku, /dxul/ut, /dxul/utf, /dxul/utd

  • xtape: "get /hpss/arch/<prjid>/<myfile>" (sftp xtape.dkrz.de)

  • StorageTek silos: total capacity 60000 tapes, approx. 60 PB (LTO and Titan)



Development of data archive at DKRZ (German Climate Computing Centre)

  • Data production on IBM-P6: 50 PB/year

  • Limit for mass storage archive (HPSS): 10 PB/year

    • Scientific project data archive with expiration date

  • Limit for long-term data archive (WDCC): 1 PB/year

    • A complete data catalogue entry in WDCC (metadata) is required

    • The decision procedure for the transition into the long-term archive is not yet fully implemented (data storage policy).

    • Accessible via WDCC infrastructure

    • Searchable data catalogue (GUI)

    • Field-based and file-based data access (Internet)

    • Storage time period: at least 10 years (no expiration date)
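The three per-year figures above form a funnel from production to long-term storage. A minimal Python sketch of that arithmetic, using only the numbers on this slide (the constant and function names are illustrative), shows the share of the annual output each archive tier can hold:

```python
# Annual data volumes from the slide, in petabytes per year.
PRODUCTION_PB = 50.0  # data production on IBM-P6
HPSS_LIMIT_PB = 10.0  # limit for the mass storage archive (HPSS)
WDCC_LIMIT_PB = 1.0   # limit for the long-term data archive (WDCC)


def fraction_retained(limit_pb: float, produced_pb: float = PRODUCTION_PB) -> float:
    """Return the share of a yearly volume that fits under a storage limit."""
    return limit_pb / produced_pb


if __name__ == "__main__":
    print(f"HPSS can keep at most {fraction_retained(HPSS_LIMIT_PB):.0%} of the output")
    print(f"WDCC can keep at most {fraction_retained(WDCC_LIMIT_PB):.0%} of the output")
    print(f"WDCC share of the HPSS volume: {fraction_retained(WDCC_LIMIT_PB, HPSS_LIMIT_PB):.0%}")
```

Under these limits at most a fifth of the yearly production fits into HPSS and at most a fiftieth into the long-term WDCC archive.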



Development of mass storage archive

(Chart: volume of the mass storage archive from Oct. 2008, reaching 10 PB by mid-2009)



  • Data documentation requirements are met by using the WDCC infrastructure

    • CERA-2 metadata model, developed in 1999

      • Catalogue interface: cera.wdc-climate.de

      • Input interface: input.wdc-climate.de

    • CERA-2 metadata content is complete with respect to browsing, discovering and using climate data stored in the database system or outside it in flat files

    • The WDCC conforms to international description standards such as ISO 19115, Dublin Core and GCMD and is integrated in international data federations

    • The data storage structure stores climate time series field by field, per variable, in database tables. This enables web-based data catalogue search and data access in small data granules.
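To illustrate the field-based storage idea, here is a minimal sketch assuming a hypothetical schema with one table per variable and one BLOB per time step. WDCC runs on Oracle; Python's built-in sqlite3 module stands in here only to keep the example self-contained, and the table and variable names (such as tas) are invented for illustration.

```python
import sqlite3
import struct

# Hypothetical schema: one table per variable, one row (BLOB) per time step.


def create_variable_table(conn: sqlite3.Connection, variable: str) -> None:
    """Create the table holding the time series of one variable."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{variable}" '
        "(time_step INTEGER PRIMARY KEY, field BLOB NOT NULL)"
    )


def store_field(conn: sqlite3.Connection, variable: str, time_step: int, values: list) -> None:
    """Pack one flattened 2-D field as little-endian float32 into a BLOB."""
    blob = struct.pack(f"<{len(values)}f", *values)
    conn.execute(
        f'INSERT INTO "{variable}" (time_step, field) VALUES (?, ?)',
        (time_step, blob),
    )


def load_field(conn: sqlite3.Connection, variable: str, time_step: int) -> list:
    """Fetch a single granule: one variable at one time step."""
    row = conn.execute(
        f'SELECT field FROM "{variable}" WHERE time_step = ?', (time_step,)
    ).fetchone()
    if row is None:
        raise KeyError(f"{variable} has no time step {time_step}")
    blob = row[0]
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_variable_table(conn, "tas")  # "tas": hypothetical variable name
    for t in range(3):
        store_field(conn, "tas", t, [280.0 + t, 281.0 + t, 282.0 + t, 283.0 + t])
    print(load_field(conn, "tas", 2))  # [282.0, 283.0, 284.0, 285.0]
```

Reading one row returns one granule, so a web request for a single time step of a long series touches only a few kilobytes instead of a whole model output file.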



CERA Data Model

(Diagram: metadata blocks Entry, Contact, Coverage, Reference, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org and Data Access)
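The block structure of the data model can be mirrored in code. The following is a hypothetical Python sketch, not the actual CERA-2 schema: each block named on the slide becomes a field holding simple key/value pairs, and a helper lists the blocks that are still empty, for example before a catalogue entry is considered complete.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List

# Hypothetical simplification: each metadata block is a flat key/value mapping.
Block = Dict[str, str]


@dataclass
class CeraEntry:
    """Sketch of a catalogue entry grouped into the blocks named on the slide."""
    entry: Block = field(default_factory=dict)
    contact: Block = field(default_factory=dict)
    coverage: Block = field(default_factory=dict)
    reference: Block = field(default_factory=dict)
    status: Block = field(default_factory=dict)
    parameter: Block = field(default_factory=dict)
    spatial_reference: Block = field(default_factory=dict)
    distribution: Block = field(default_factory=dict)
    local_adm: Block = field(default_factory=dict)
    data_org: Block = field(default_factory=dict)
    data_access: Block = field(default_factory=dict)

    def empty_blocks(self) -> List[str]:
        """Names of blocks that carry no metadata yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    record = CeraEntry(entry={"title": "Example experiment"}, contact={"name": "Example"})
    print(record.empty_blocks())
```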



Coloured columns correspond to BLOB data tables in WDCC.

Collections of matrix rows represent storage in the model's raw data files (complete model output, stored one storage time step after another).
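The difference between the two layouts is a transposition: the model writes all variables for one storage time step together (a matrix row), while WDCC keeps one time series per variable (a column). A minimal Python sketch of that regrouping, with hypothetical variable names and fields reduced to short lists of floats:

```python
from typing import Dict, List

# Hypothetical layout of a raw model output record: for one storage time
# step, every variable's field (flattened to a list of floats).
TimeStepRecord = Dict[str, List[float]]


def rows_to_columns(records: List[TimeStepRecord]) -> Dict[str, List[List[float]]]:
    """Regroup time-step-wise output (matrix rows) into per-variable series (columns)."""
    columns: Dict[str, List[List[float]]] = {}
    for record in records:  # records are assumed to be in time order
        for variable, values in record.items():
            columns.setdefault(variable, []).append(values)
    return columns


if __name__ == "__main__":
    raw_output = [
        {"tas": [280.0, 281.0], "pr": [0.1, 0.0]},  # time step 0
        {"tas": [282.0, 283.0], "pr": [0.2, 0.3]},  # time step 1
    ]
    series = rows_to_columns(raw_output)
    print(series["tas"])  # [[280.0, 281.0], [282.0, 283.0]]
```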



WDCC Development

Future annual growth rate: 1 PB/year



WDCC Users (authorised for data download)

2008



WDCC Data Downloads in 2008



WDCC / CERA: General Statistics at 30-09-2009 00:00:10

  • Database Size (TByte): 404

  • Number of blobs: 8194476663 (8.2 billion)

  • Number of experiments: 1378

  • Number of datasets: 165376

  • Total size divided by number of BLOBs gives the average size of data access granules: approx. 50 kB/BLOB (field-based data access)
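The average granule size can be reproduced directly from the two counts on this slide. A short Python sketch, assuming decimal units (1 TB = 10^12 bytes, 1 kB = 10^3 bytes); the names are illustrative:

```python
# Figures from the WDCC / CERA statistics of 30-09-2009.
DATABASE_SIZE_TB = 404
NUMBER_OF_BLOBS = 8_194_476_663


def mean_blob_size_kb(size_tb: float, n_blobs: int) -> float:
    """Average BLOB size in kB, using decimal units (1 TB = 10**12 B, 1 kB = 10**3 B)."""
    return size_tb * 10**12 / n_blobs / 10**3


if __name__ == "__main__":
    print(f"{mean_blob_size_kb(DATABASE_SIZE_TB, NUMBER_OF_BLOBS):.1f} kB per BLOB")
```

The result is about 49 kB, which the slide rounds to 50 kB.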



WDCC Content

Data from Earth System Modelling and Related Observations:

  • IPCC, WOCE, GEBCO, BALTEX, HOAPS, CEOP, COSMOS, CARIBIC

  • Regional Climate Scenarios IPCC-AR4 (CCLM + REMO)

  • EH5/MPI-OM IPCC-AR4

  • ERA15/40, ERA40, NCEP

  • Simulations @ MPI, GKSS,…



Oracle BLOB-DB: data access via HTTP and Java API



WDCC catalogue search and data access interface

(URL: cera.wdc-climate.de)

Access to 97 model experiments



WDCC Project-based Data Access

(IPCC AR4 Hamburg, results from the introduction)



WDCC major accomplishments

  • Offering many TB of data through a standard web browser interface and a Java API for direct data download.

  • Entering the interdisciplinary e-science environment through the primary data publication service.

    • Independent data entities of more general interest are placed in library catalogues so that they can be searched for and cited together with classical scientific literature.

    • WDCC has more than 50 data entities registered in TIBORDER, covering approx. 1.5 TB of data.

  • Networking with other topic-related WDCs and long-term data archives.

    • German WDC Cluster Earth System Research (WDC MARE, WDC RSAT and WDCC)

    • Data sharing with the British Atmospheric Data Centre (BADC)

  • Offering data management services to scientific research projects for long-term archiving and dissemination of research results



Primary data publication service

  • Following the STD-DOI concept (Scientific and Technical Data – Digital Object Identifier, URL: www.std-doi.de)

  • Important aspects of the publication process are:

    • The identification of independent data entities which are suitable for publication at the level of scientific literature,

    • The execution of an elaborate review process for metadata and climate data (quality control),

    • The assignment of additional metadata for electronic publication (ISO 690-2) and of persistent identifiers (DOI / URN), and

    • The integration of publication metadata and persistent identifiers into the TIB-Order library catalogue (German National Library of Science and Technology, Hannover), so that primary data entities are searchable and citable together with scientific literature.

    • The quality characteristic is presently “approved by author”; it could become “peer reviewed” with ESSD (Earth System Science Data journal).

    • Published data entities cannot be modified any longer.

    • They are freely available via the Internet.
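The publication steps listed above behave like a small state machine that ends in a frozen record. The following Python sketch is hypothetical (the class, the states and the placeholder DOI are invented for illustration and are not the WDCC or TIB software); it shows review before publication, identifier assignment at publication, and the rule that published entities cannot be modified:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    PUBLISHED = "published"


@dataclass
class DataEntity:
    """Hypothetical data entity moving through the publication steps."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    state: State = State.DRAFT
    doi: Optional[str] = None

    def update_metadata(self, key: str, value: str) -> None:
        # Published data entities cannot be modified any longer.
        if self.state is State.PUBLISHED:
            raise PermissionError(f"{self.name} is published and frozen")
        self.metadata[key] = value

    def approve(self) -> None:
        # Review of metadata and data; the current quality label per the slide.
        if self.state is not State.DRAFT:
            raise ValueError("only draft entities can be reviewed")
        self.metadata["quality"] = "approved by author"
        self.state = State.REVIEWED

    def publish(self, doi: str) -> None:
        # Assign the persistent identifier and freeze the entity.
        if self.state is not State.REVIEWED:
            raise ValueError("entity must pass review before publication")
        self.doi = doi
        self.state = State.PUBLISHED


if __name__ == "__main__":
    entity = DataEntity("example")
    entity.update_metadata("title", "Example data entity")
    entity.approve()
    entity.publish("10.0000/hypothetical")  # placeholder, not a real DOI
    print(entity.state, entity.doi)
```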



(Figure: TIB and WDCC)



  • Data infrastructure integrates data stewardship in the long-term archive:

    • Bit-stream preservation

    • Quality assurance

    • Usability enabling



Long-term archive data stewardship

  • Bit-stream preservation

    • Secondary tape copies on different tapes and tape technology at a separate location

    • Copying to new tapes once the maximum number of tape accesses is reached (refreshment)

  • Quality assurance

    • Semantic examinations: behavior of a numerical model compared to observations and to other models; part of the scientific evaluation process

    • Syntactic examinations: formal aspects of data archiving, ensuring that archiving is as free of errors as possible

      • Consistency between metadata and climate data

      • Completeness of climate data

      • Standard range of values

      • Spatial and temporal data arrangement
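Several of the syntactic examinations lend themselves to automation. A minimal Python sketch, assuming hypothetical metadata keys `n_time_steps` and `n_points`, fields stored as flat lists per time step and a caller-supplied valid range; it is not WDCC's actual quality assurance tooling:

```python
from typing import Dict, List, Sequence, Tuple


def syntactic_checks(
    metadata: Dict[str, int],
    times: Sequence[float],
    fields_by_step: Sequence[Sequence[float]],
    valid_range: Tuple[float, float],
) -> List[str]:
    """Run simple formal checks; an empty result means no problem was found.

    Assumed (hypothetical) metadata keys:
      n_time_steps -- number of time steps announced in the metadata
      n_points     -- number of grid points per field
    """
    problems: List[str] = []

    # Completeness: the archive holds every time step the metadata announces.
    if len(fields_by_step) != metadata["n_time_steps"]:
        problems.append(
            f"expected {metadata['n_time_steps']} time steps, found {len(fields_by_step)}"
        )

    # Temporal arrangement: the time axis must be strictly increasing.
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        problems.append("time axis is not strictly increasing")

    low, high = valid_range
    for step, values in enumerate(fields_by_step):
        # Consistency between metadata and data / spatial arrangement: grid size.
        if len(values) != metadata["n_points"]:
            problems.append(
                f"step {step}: {len(values)} points, metadata says {metadata['n_points']}"
            )
        # Standard range of values; NaN fails the comparison and is flagged too.
        outside = [v for v in values if not (low <= v <= high)]
        if outside:
            problems.append(f"step {step}: {len(outside)} values outside [{low}, {high}]")

    return problems
```

For a near-surface temperature field in kelvin, a caller might pass a range such as (180.0, 340.0); that range is an illustrative assumption, not a WDCC setting.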



Long-term archive data stewardship (continued)

  • Usability enabling

    • Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC

    • WDCC offers web-based data access to small data granules (individual entries in BLOB DB tables)

    • Archive technology transfer must be backward compatible to keep old data technically readable

    • Data processing tools and data format access libraries must be migrated to new architectures



Summary of long-term archiving services at WDCC/DKRZ:

  • Long-term data storage at WDCC/DKRZ is thematically focused on Earth system research (modeling and related observations)

  • WDCC provides a fully documented data archive including a web-based searchable data catalogue and web-based data access

  • WDCC supports field-based data access including server-side data processing (extraction of geographical regions and single time steps, format conversion)

  • WDCC is integrated in national (WDC-Cluster Germany, C3-Grid) and international data federations (IPCC AR5).

  • Within the existing infrastructure, WDCC/DKRZ offer long-term data storage for topic-related external data entities on a net-cost basis.
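Server-side extraction of a geographical region and a single time step amounts to index selection on a grid. A minimal Python sketch, assuming a regular latitude/longitude grid stored as nested lists (`field[lat][lon]`) and ignoring longitude wrap-around at the date line; the function names are hypothetical and format conversion is not shown:

```python
from typing import List, Sequence, Tuple

Field = List[List[float]]  # field[lat_index][lon_index]


def extract_region(
    field: Field,
    lats: Sequence[float],
    lons: Sequence[float],
    lat_bounds: Tuple[float, float],
    lon_bounds: Tuple[float, float],
) -> Tuple[List[float], List[float], Field]:
    """Cut a latitude/longitude box (inclusive bounds) out of one time step."""
    lat_idx = [i for i, lat in enumerate(lats) if lat_bounds[0] <= lat <= lat_bounds[1]]
    lon_idx = [j for j, lon in enumerate(lons) if lon_bounds[0] <= lon <= lon_bounds[1]]
    sub_field = [[field[i][j] for j in lon_idx] for i in lat_idx]
    return [lats[i] for i in lat_idx], [lons[j] for j in lon_idx], sub_field


def extract_time_step(series: Sequence[Field], step: int) -> Field:
    """Select a single time step from a per-variable time series."""
    return series[step]


if __name__ == "__main__":
    lats = [50.0, 52.5, 55.0]
    lons = [5.0, 10.0, 15.0]
    grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(extract_region(grid, lats, lons, (52.0, 56.0), (8.0, 16.0)))
```

Doing this selection on the server means only the requested box leaves the archive, which fits the small-granule access pattern described earlier.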

