Long-term Archiving of Climate Model Data at WDC Climate and DKRZ


Long-term Archiving of Climate Model Data at WDC Climate and DKRZ. Michael Lautenschlager, WDC Climate / Max-Planck-Institute for Meteorology, Hamburg. Data Management Workshop (Köln, 29.-30.09.09).

Presentation Transcript

Long-term Archiving of Climate Model Data at WDC Climate and DKRZ

Michael Lautenschlager

WDC Climate / Max-Planck-Institute for Meteorology, Hamburg

Data Management Workshop (Köln, 29.-30.09.09)

slide2

Structure 2009

  • DKRZ:
  • Earth system model development
  • Simulations of past, present and future climate
  • WDC Climate:
  • Long-term data archiving
  • Inter-disciplinary data dissemination
slide6

Near-surface temperature change for the scenarios A1B and B1. Shown is the difference of the 30-year means 2071-2100 minus 1961-1990.

slide7

Comparison of the present-day sea ice cover in March and September (top) with the climate projection for the scenario A1B (bottom) in 2100.

The snow cover over land is also shown.

HLRE-II Architecture (http://www.dkrz.de/dkrz/about/hardware)

  • StorageTek silos
    • Total capacity: 60000 tapes, approx. 60 PB (LTO and Titan)
  • HPSS (10 PB/year)
    • tape: /hpss/arch, /hpss/doku, /dxul/ut, /dxul/utf, /dxul/utd
    • xtape: sftp xtape.dkrz.de, then "get /hpss/arch/<prjid>/<myfile>"
    • pftp
  • IBM Power6 (ssh blizzard)
    • 2 x login, 250 x compute, 150 TFlops peak
  • GPFS (3 PB)
    • blizzard: /work, /pf, /scratch
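The xtape access path on this slide (sftp to xtape.dkrz.de, then a `get` on `/hpss/arch/<prjid>/<myfile>`) could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names, the project id `ab0123`, and the file names are hypothetical, and the code only builds the text of an sftp batch file without contacting any server.

```python
from pathlib import PurePosixPath

XTAPE_HOST = "xtape.dkrz.de"                 # host name taken from the slide
ARCHIVE_ROOT = PurePosixPath("/hpss/arch")   # archive root taken from the slide


def archive_path(project_id: str, filename: str) -> str:
    """Return the path /hpss/arch/<prjid>/<myfile> for one archived file."""
    for part in (project_id, filename):
        if not part or "/" in part:
            raise ValueError(f"expected a single path component, got {part!r}")
    return str(ARCHIVE_ROOT / project_id / filename)


def sftp_batch(project_id: str, filenames: list[str]) -> str:
    """Build the text of an sftp batch file with one 'get' line per file."""
    lines = [f"get {archive_path(project_id, name)}" for name in filenames]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project id and file names, for illustration only.
    print(sftp_batch("ab0123", ["run01.tar", "run02.tar"]), end="")
    # The batch text could be saved to a file and passed to OpenSSH sftp:
    print(f"# sftp -b batch.txt {XTAPE_HOST}")
```

Building the batch text separately from the transfer keeps the path logic testable without network access.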

Development of data archive at DKRZ (German Climate Computing Centre)
  • Data production on IBM-P6: 50 PB/year
  • Limit for mass storage archive (HPSS): 10 PB/year
    • Scientific project data archive with expiration date
  • Limit for long-term data archive (WDCC): 1 PB/year
    • A complete data catalogue entry in WDCC (metadata) is required
    • The decision procedure for transition to the long-term archive is not yet finalized (data storage policy).
    • Accessible via WDCC infrastructure
    • Searchable data catalogue (GUI)
    • Field-based and file-based data access (Internet)
    • Storage time period: at least 10 years (no expiration date)
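The three rates on this slide imply that only part of the yearly production can be archived. The short sketch below works out those shares; the function name is an assumption made for illustration, and the figures are the ones quoted on the slide.

```python
PRODUCTION_PB_PER_YEAR = 50.0   # data production on IBM-P6 (from the slide)
HPSS_LIMIT_PB_PER_YEAR = 10.0   # mass storage archive limit (from the slide)
WDCC_LIMIT_PB_PER_YEAR = 1.0    # long-term archive limit (from the slide)


def archivable_fraction(limit_pb: float,
                        produced_pb: float = PRODUCTION_PB_PER_YEAR) -> float:
    """Largest share of one year's production that fits under an archive limit."""
    if produced_pb <= 0:
        raise ValueError("production must be positive")
    return min(1.0, limit_pb / produced_pb)


if __name__ == "__main__":
    print(f"HPSS share: {archivable_fraction(HPSS_LIMIT_PB_PER_YEAR):.0%}")
    print(f"WDCC share: {archivable_fraction(WDCC_LIMIT_PB_PER_YEAR):.0%}")
```

With the quoted rates, at most one fifth of the production can go to HPSS and at most one fiftieth to the long-term archive, which is why a decision procedure for the transition is needed.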
Development of mass storage archive

[Figure: growth of the mass storage archive over time; annotations: Oct. 2008; mid-2009: 10 PB]

slide11
Data documentation requirements are met by using the WDCC infrastructure
    • CERA-2 metadata model developed in 1999
      • Catalogue interface: cera.wdc-climate.de
      • Input interface: input.wdc-climate.de
    • CERA-2 metadata content is sufficient to browse, discover, and use climate data stored in the database system or externally in flat files
    • The WDCC matches international description standards such as ISO 19115, Dublin Core, and GCMD, and is integrated into international data federations
    • The data storage structure is field-based: climate time series are stored per variable in database tables. This allows web-based data catalogue search and data access in small data granules.
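As a rough illustration of field-based storage, the sketch below keeps one table per variable with one BLOB per time step, so a single field can be fetched without reading a whole model output file. Everything here is an assumption made for illustration: SQLite, the table layout, the 32-bit float encoding, and the variable name `tas` are not taken from the slides and do not describe the actual WDCC schema.

```python
import sqlite3
import struct


def _check_name(variable: str) -> str:
    """Allow only plain identifiers, since the name is interpolated into SQL."""
    if not variable.isidentifier():
        raise ValueError(f"unsafe variable name: {variable!r}")
    return variable


def create_variable_table(conn: sqlite3.Connection, variable: str) -> None:
    """Create one table per variable: a time step key and one BLOB per field."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{_check_name(variable)}" '
        "(time_step TEXT PRIMARY KEY, field BLOB NOT NULL)"
    )


def store_field(conn, variable, time_step, values):
    """Store one flattened field as little-endian 32-bit floats."""
    blob = struct.pack(f"<{len(values)}f", *values)
    conn.execute(
        f'INSERT INTO "{_check_name(variable)}" VALUES (?, ?)', (time_step, blob)
    )


def load_field(conn, variable, time_step):
    """Fetch a single data granule (one field at one time step)."""
    row = conn.execute(
        f'SELECT field FROM "{_check_name(variable)}" WHERE time_step = ?',
        (time_step,),
    ).fetchone()
    if row is None:
        raise KeyError(time_step)
    blob = row[0]
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_variable_table(conn, "tas")  # hypothetical variable name
    store_field(conn, "tas", "2001-01", [271.5, 272.25, 273.0, 270.75])
    print(load_field(conn, "tas", "2001-01"))
```

The point of the layout is that the access unit is one field at one time step, which matches the small data granules mentioned on the slide.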
CERA Data Model

[Diagram: CERA-2 metadata blocks: Entry, Contact, Coverage, Reference, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access]
slide13

Coloured columns correspond to BLOB data tables in WDCC.

Collections of matrix rows represent storage in model raw data files (complete model output, storage time step by storage time step).

slide14

WDCC Development

Future annual growth rate: 1 PB / year

slide17
WDCC / CERA: General Statistics at 30-09-2009 00:00:10
  • Database Size (TByte): 404
  • Number of blobs: 8194476663 (8.2 billion)
  • Number of experiments: 1378
  • Number of datasets: 165376
  • Total size divided by the number of BLOBs gives the average size of data access granules: about 50 kB/BLOB (field-based data access)
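The average granule size can be checked directly from the two figures above. The sketch below assumes nothing beyond those numbers, except for the unit convention: the slide does not say whether "TByte" is decimal (10^12 bytes) or binary (2^40 bytes), so both are computed. The function name is chosen here for illustration.

```python
DATABASE_SIZE_TBYTE = 404          # from the statistics above
NUMBER_OF_BLOBS = 8_194_476_663    # from the statistics above


def average_blob_size_bytes(size_tbyte: float, n_blobs: int,
                            bytes_per_tbyte: int = 10**12) -> float:
    """Average BLOB size in bytes for a given TByte convention."""
    if n_blobs <= 0:
        raise ValueError("number of BLOBs must be positive")
    return size_tbyte * bytes_per_tbyte / n_blobs


if __name__ == "__main__":
    decimal = average_blob_size_bytes(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS)
    binary = average_blob_size_bytes(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS, 2**40)
    print(f"decimal TByte: {decimal / 1000:.1f} kB per BLOB")
    print(f"binary TByte:  {binary / 1000:.1f} kB per BLOB")
```

Either convention gives a value of roughly 50 kB per BLOB (about 49 kB decimal, about 54 kB binary), consistent with the rounded figure on the slide.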
WDCC Content: Data from Earth System Modelling and Related Observations

[Figure: WDCC content by project; labels: IPCC, WOCE, GEBCO, BALTEX, HOAPS, CEOP, COSMOS, CARIBIC, Regional Climate Scenarios IPCC-AR4 (CCLM + REMO), EH5/MPI-OM IPCC-AR4, ERA15/40, NCEP, ERA40, Simulations @ MPI, GKSS,…]

slide20

WDCC catalogue search and data access interface

(URL: cera.wdc-climate.de)

Access to 97 model experiments

slide21

WDCC Project-based Data Access

(IPCC AR4 Hamburg, Results from Introduction)

WDCC major accomplishments
  • Offering many TB of data through a standard web-browser interface and a Java API for direct data download.
  • Entering the interdisciplinary e-science environment through the primary data publication service.
    • Independent data entities of more general interest are placed in library catalogues to make them searchable alongside, and citable in, the classical scientific literature
    • WDCC has more than 50 data entities registered in TIB-Order, which are connected to approx. 1.5 TB of data volume.
  • Networking with other topic-related WDCs and long-term data archives.
    • German WDC Cluster Earth System Research (WDC MARE, WDC RSAT and WDCC)
    • Data sharing with British Atmospheric Data Centre (BADC)
  • Offering data management services to scientific research projects for long-term archiving and dissemination of research results
slide23
Primary data publication service
  • Following the STD-DOI concept (Scientific and Technical Data – Digital Object Identifier, URL: www.std-doi.de)
  • Important aspects of the publication process are
    • The identification of independent data entities which are suitable for publication at the level of scientific literature,
    • The execution of an elaborate review process for metadata and climate data (quality control),
    • The assignment of additional metadata for electronic publication (ISO 690-2) and of persistent identifiers (DOI / URN) and
    • The integration of publication metadata and persistent identifiers into the TIB-Order library catalogue (German National Library of Science and Technology, Hannover) so that primary data entities are searchable and citable together with scientific literature.
    • The quality level is presently “approved by author”; it could become “peer reviewed” with ESSD (Earth System Science Data Journal).
    • Published data entities can no longer be modified.
    • They are freely available via the Internet.
slide24

[Figure: labels TIB and WDCC]

slide25

Data infrastructure integrates data stewardship in the long-term archive

  • Bit-stream preservation
  • Quality assurance
  • Usability enabling
slide26
Long-term archive data stewardship
  • Bit-stream preservation
    • Secondary tape copies on different tapes and technology at a separate location
    • Copy to new tapes after the maximum number of tape accesses is reached (refreshment)
  • Quality assurance
    • Semantic examinations: behavior of a numerical model compared to observations and to other models, part of the scientific evaluation process
    • Syntactic examinations: formal aspects of data archiving, ensuring that the archived data are free of errors as far as possible
      • Consistency between metadata and climate data
      • Completeness of climate data
      • Standard range of values
      • Spatial and temporal data arrangement
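The syntactic examinations listed above lend themselves to automation. The following is a minimal sketch of what such checks might look like; the metadata fields, the in-memory data layout, and the function name are assumptions made for illustration and are not the WDCC's actual quality-control procedure.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical subset of catalogue metadata needed for the checks."""
    variable: str
    expected_time_steps: list  # ordered time step labels
    grid_points: int           # number of values per field
    valid_min: float           # standard range of values
    valid_max: float


def syntactic_checks(meta: DatasetMetadata, fields: dict) -> list:
    """Check fields (time step -> list of values) against the metadata.

    Returns a list of problem descriptions; an empty list means no findings.
    """
    problems = []

    # Completeness of climate data: every expected time step must be present.
    missing = [t for t in meta.expected_time_steps if t not in fields]
    if missing:
        problems.append(f"missing time steps: {missing}")

    # Temporal arrangement: stored order must follow the expected order.
    expected_present = [t for t in meta.expected_time_steps if t in fields]
    stored = [t for t in fields if t in meta.expected_time_steps]
    if stored != expected_present:
        problems.append("time steps are stored out of order")

    for time_step, values in fields.items():
        # Consistency between metadata and data: field size must match.
        if len(values) != meta.grid_points:
            problems.append(f"{time_step}: expected {meta.grid_points} values, "
                            f"found {len(values)}")
        # Standard range of values (NaN also fails this comparison).
        bad = [v for v in values if not (meta.valid_min <= v <= meta.valid_max)]
        if bad:
            problems.append(f"{time_step}: {len(bad)} value(s) out of range")

    return problems


if __name__ == "__main__":
    meta = DatasetMetadata("tas", ["t1", "t2", "t3"], 2, 200.0, 330.0)
    data = {"t1": [280.0, 281.0], "t3": [279.0, 999.0]}
    for problem in syntactic_checks(meta, data):
        print(problem)
```

Returning a list of findings rather than raising on the first error lets one run report all problems for a dataset at once.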
slide27
Long-term archive data stewardship (continued)
  • Usability enabling
    • Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC
    • WDCC offers web-based data access to small data granules (individual entries in BLOB DB tables)
    • Archive technology transfer must be backward compatible to keep old data technically readable
    • Data processing tools and data format access libraries must be migrated to new architectures
slide28
Summary: long-term archiving services at WDCC/DKRZ
  • Long-term data storage at WDCC/DKRZ is focused on Earth system research (modeling and related observations)
  • WDCC provides a fully documented data archive including a web-based searchable data catalogue and web-based data access
  • WDCC supports field-based data access including server-side data processing (extraction of geographical regions and single time steps, format conversion)
  • WDCC is integrated in national (WDC-Cluster Germany, C3-Grid) and international data federations (IPCC AR5).
  • Within the existing infrastructure, WDCC/DKRZ offer long-term data storage for topic-related external data entities on a net-cost basis.
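The server-side processing steps named in the summary (extraction of a geographical region, selection of a single time step, format conversion) can be sketched in a few lines. This is an illustration only: the function names, the plain-list grid layout, and CSV as the target format are assumptions, not the WDCC implementation.

```python
def extract_time_step(series: dict, time_step: str) -> list:
    """Select the field of one time step from a time step -> field mapping."""
    return series[time_step]


def extract_region(field, lats, lons, lat_range, lon_range):
    """Return the part of a 2-D field (one row per latitude) inside a box."""
    lat_idx = [i for i, lat in enumerate(lats)
               if lat_range[0] <= lat <= lat_range[1]]
    lon_idx = [j for j, lon in enumerate(lons)
               if lon_range[0] <= lon <= lon_range[1]]
    return [[field[i][j] for j in lon_idx] for i in lat_idx]


def to_csv(field) -> str:
    """Convert a 2-D field to CSV text, one line per latitude row."""
    return "\n".join(",".join(f"{v:g}" for v in row) for row in field)


if __name__ == "__main__":
    # Tiny made-up grid, for illustration only.
    lats = [50, 55, 60]
    lons = [0, 10, 20]
    series = {"2001-01": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
    field = extract_time_step(series, "2001-01")
    region = extract_region(field, lats, lons, (52, 60), (0, 10))
    print(to_csv(region))
```

Doing the subsetting on the server means only the requested region and time step travel over the network, which fits the small-granule access model described earlier.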