
CERA / WDCC

Hannes Thiemann, Max-Planck-Institut für Meteorologie, Modelle und Daten (hannes.thiemann@zmaw.de). NCAR, October 27th–29th, 2008. Contents: Statistics; Requirements + features; General architecture; Implementation (current and new); Migration; Summary.


Presentation Transcript


  1. CERA / WDCC. Hannes Thiemann, Max-Planck-Institut für Meteorologie, Modelle und Daten (hannes.thiemann@zmaw.de). NCAR, October 27th–29th, 2008.

  2. Contents: Statistics; Requirements + features; General architecture; Implementation (current and new); Migration; Summary.

  3. Basic statistics. WDCC / CERA general statistics as of 01-10-2008 00:00:10: database size 370 TByte; number of BLOBs 6,663,287,791 (6.6 billion); number of experiments 1,146; number of datasets 142,062. Data are accessed by fields, not by files. Dividing the total size by the number of BLOBs gives the average size of a data-access granule: 50-60 kB per BLOB.
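
The granule size on this slide is a simple division. The short Python sketch below redoes it; because the slide does not say whether "TByte" and "kB" are decimal or binary units, both readings are computed. The function name is illustrative, not part of CERA.

```python
# Figures copied from the "Basic Statistics" slide (as of 01-10-2008).
DB_SIZE_TBYTE = 370
NUM_BLOBS = 6_663_287_791


def avg_blob_size_kb(size_tbyte: float, n_blobs: int, binary: bool = False) -> float:
    """Average BLOB size in kB.

    Assumption: the slide does not state its units, so both readings are
    supported: decimal (1 TByte = 10**12 B, 1 kB = 10**3 B) by default,
    binary (2**40 B and 2**10 B, i.e. KiB) when binary=True.
    """
    tbyte = 2**40 if binary else 10**12
    kbyte = 2**10 if binary else 10**3
    return size_tbyte * tbyte / n_blobs / kbyte


if __name__ == "__main__":
    print(f"decimal: {avg_blob_size_kb(DB_SIZE_TBYTE, NUM_BLOBS):.1f} kB")
    print(f"binary:  {avg_blob_size_kb(DB_SIZE_TBYTE, NUM_BLOBS, binary=True):.1f} KiB")
```

Both readings fall inside the quoted 50-60 kB range: roughly 55.5 kB with decimal units and roughly 59.6 KiB with binary units.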

  4. Users by continent: active users, 1-Jan-2008 until 14-Oct-2008.

  5. Download destinations, 1-Jan-2008 until 14-Oct-2008.

  6. Records per download

  7. Record size

  8. Requirements and constraints. Access is over the WAN. Downloads are typically quite small, though there are some huge ones. Small downloads imply that users are not willing to wait long … We cannot scan through large files for each download, so the granularity has to be small.
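
Why scanning is ruled out can be shown with a toy comparison. The sketch below assumes a hypothetical file of fixed-size records (4 bytes each, far smaller than real CERA records) and counts the bytes read when record k is located by a sequential scan versus a direct seek to its offset.

```python
from typing import BinaryIO, Tuple

RECORD_SIZE = 4  # hypothetical fixed record size in bytes, for illustration only


def read_by_scan(f: BinaryIO, k: int) -> Tuple[bytes, int]:
    """Read records one after another from the start until record k is reached."""
    f.seek(0)
    bytes_read = 0
    rec = b""
    for _ in range(k + 1):
        rec = f.read(RECORD_SIZE)
        bytes_read += len(rec)
    return rec, bytes_read


def read_direct(f: BinaryIO, k: int) -> Tuple[bytes, int]:
    """Jump straight to record k: with a known record size its offset is k * RECORD_SIZE."""
    f.seek(k * RECORD_SIZE)
    rec = f.read(RECORD_SIZE)
    return rec, len(rec)
```

For the last of 1,000 records the scan reads 4,000 bytes and the seek reads 4; with real files of many gigabytes, that difference is what makes small, fast downloads possible.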

  9. Datatypes: model data, i.e. climatological runs (global and regional) (IPCC, …) and weather forecasts (DPHASE, CEOP, …); reanalysis data; observational data (COPS, CARIBIC, …); satellite data products.

  10. Formats. CERA can store data in any format. The formats in use are GRIB (60%), NetCDF (18%) and other formats (22%).

  11. General architecture (diagram: midtier, data).

  12. General architecture (diagram). Components: webserver, proxy, application server, metadata and data databases. Metadata blocks: Entry, Contact, Coverage, Reference, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access. Processing steps: select timestep + region; convert format.
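
The two processing steps named on the diagram, selecting a timestep + region and converting the format, might look roughly like the sketch below. It assumes a field held as a dict from timestep to a 2-D latitude/longitude grid and uses CSV as a stand-in output format; none of these names come from CERA itself.

```python
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]  # values indexed as grid[lat_index][lon_index]


def select_region(grid: Grid, lats: Sequence[float], lons: Sequence[float],
                  lat_min: float, lat_max: float,
                  lon_min: float, lon_max: float) -> Tuple[List[float], List[float], Grid]:
    """Keep only the grid points whose coordinates fall inside the box (inclusive)."""
    rows = [i for i, la in enumerate(lats) if lat_min <= la <= lat_max]
    cols = [j for j, lo in enumerate(lons) if lon_min <= lo <= lon_max]
    sub = [[grid[i][j] for j in cols] for i in rows]
    return [lats[i] for i in rows], [lons[j] for j in cols], sub


def to_csv(lats: Sequence[float], lons: Sequence[float], values: Grid) -> str:
    """Stand-in 'convert format' step: one CSV line per grid point."""
    lines = ["lat,lon,value"]
    for la, row in zip(lats, values):
        for lo, v in zip(lons, row):
            lines.append(f"{la},{lo},{v}")
    return "\n".join(lines)


def extract(field: Dict[int, Grid], timestep: int, lats: Sequence[float],
            lons: Sequence[float], box: Tuple[float, float, float, float]) -> str:
    """Select one timestep, cut out the region (lat_min, lat_max, lon_min, lon_max), convert."""
    sub_lats, sub_lons, sub = select_region(field[timestep], lats, lons, *box)
    return to_csv(sub_lats, sub_lons, sub)
```

Only the requested timestep is touched at all, which matches the access pattern of small, targeted downloads described earlier.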

  13. Storage within CERA. Each database table holds the data of a single variable, one row per timestep, addressed through an index: row 1 holds the data of timestep i, row 2 timestep i+1, row 3 timestep i+2, …, and row n timestep i+n-1.
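
The one-table-per-variable, one-row-per-timestep layout can be sketched with Python's built-in `sqlite3` module standing in for Oracle. The table and function names are hypothetical; the point is that a single timestep of a single variable is fetched through the index without touching any other data.

```python
import sqlite3
from typing import Optional


def create_variable_table(conn: sqlite3.Connection, variable: str) -> None:
    # One table per variable (hypothetical naming, not the CERA schema).
    # Variable names are assumed to be trusted identifiers, since SQL
    # placeholders cannot be used for table names.
    conn.execute(f'CREATE TABLE "{variable}" '
                 '(timestep INTEGER PRIMARY KEY, data BLOB NOT NULL)')


def store(conn: sqlite3.Connection, variable: str, timestep: int, data: bytes) -> None:
    # One row per timestep; the primary key acts as the index.
    conn.execute(f'INSERT INTO "{variable}" (timestep, data) VALUES (?, ?)',
                 (timestep, data))


def fetch(conn: sqlite3.Connection, variable: str, timestep: int) -> Optional[bytes]:
    # Index lookup of exactly one BLOB; None if the timestep is not stored.
    row = conn.execute(f'SELECT data FROM "{variable}" WHERE timestep = ?',
                       (timestep,)).fetchone()
    return None if row is None else row[0]
```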

  14. Handicap: not enough disk space. Data stored in the database: approx. 400 TB; disk available: approx. 24 TB. The database has therefore been coupled transparently to the HSM system. How do we avoid frequent tape accesses? With a big cache, and by storing data as closely as possible to the way users need it: split into single variables.
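
With roughly 24 TB of disk for roughly 400 TB of data, only about 6% of the holdings can sit on disk at once, so the cache decides how often tape is touched. The slide does not say which replacement policy is used; the sketch below assumes a least-recently-used (LRU) policy with a byte budget, and `fetch_from_tape` is a hypothetical callback standing in for an HSM recall.

```python
from collections import OrderedDict
from typing import Callable


class LruByteCache:
    """Disk cache in front of tape, evicting least-recently-used entries (assumed policy)."""

    def __init__(self, capacity_bytes: int, fetch_from_tape: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch_from_tape = fetch_from_tape
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.tape_reads = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # cache hit: mark as most recently used
            return self.entries[key]
        data = self.fetch_from_tape(key)  # cache miss: recall from tape
        self.tape_reads += 1
        if len(data) <= self.capacity:  # objects larger than the cache are not kept
            self.entries[key] = data
            self.used += len(data)
            while self.used > self.capacity:
                _, evicted = self.entries.popitem(last=False)  # drop the oldest entry
                self.used -= len(evicted)
        return data
```

Splitting data into single variables, as the slide recommends, helps any such policy: a cached object then contains only what users actually ask for.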

  15. Data migration (diagram: Migin, Migout, dxdb; tablespaces TBS-RW, TBS-RW, TBS-RO; table partitions 1 and 2). All tablespaces are moved “at once” to dxdb.

  16. Inside the datafile (diagram): 128k header; table; LOB index; primary key; BLOB data.

  17. Frontend versus backend (diagram): filesystem frontend and HSM backend; 128k headers; part 1 = 512 MB, part 2 = 512 MB.

  18. Retrieving data (diagram): 128k header; numbered steps 1 to 5; tape request.
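
One way to read the frontend/backend slides is that a datafile is held on the HSM backend in 512 MB parts, so a request only needs the parts that contain the requested bytes. The sketch below assumes that reading, takes 512 MB as 512 MiB, ignores the 128k header, and groups requested byte offsets by part so that each part is recalled from tape at most once per request. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

# "Part = 512 MB" from the slide; treating it as 512 MiB is an assumption.
PART_SIZE = 512 * 1024 * 1024


def part_of(offset: int, part_size: int = PART_SIZE) -> int:
    """Index of the backend part that holds the given byte offset of the datafile."""
    return offset // part_size


def plan_tape_requests(offsets: Iterable[int],
                       part_size: int = PART_SIZE) -> Dict[int, List[int]]:
    """Group requested offsets by backend part, parts in ascending order.

    Each key is one part to stage from tape; its list holds the offsets to
    serve from that part once it is on disk.
    """
    plan: Dict[int, List[int]] = {}
    for off in sorted(offsets):  # sorted offsets also yield parts in ascending order
        plan.setdefault(part_of(off, part_size), []).append(off)
    return plan
```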

  19. Warehouse features: compression (nothing special used within the server); partitioning (allows parts of the data to be moved to HSM); backup: nologging (beware of crashes …) and read-only tablespaces (two copies on tape).

  20. New implementation. The metadata database will stay as is. The Oracle databases holding the data will be replaced by a new, in-house development. Why? There is a certain risk that a future version of Oracle may not work with a/any HSM system, and in the long run some license costs should be saved.

  21. General architecture, new (diagram): webserver, application server, metadata (Oracle DB), data (Blobserver).

  22. CERA container. Instead of keeping data in BLOBs in Oracle databases, data records will be kept in so-called CERA Container Files. These can hold a huge number of records; provide fast access independent of a record's position within the file (granular access); provide fault tolerance against tape damage by keeping checksums within the files; enclose read/write operations on container files in transactions; and use a well-known format.
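
The properties listed for CERA Container Files (many records, position-independent access, checksums, transactional writes) can be illustrated with a small append-only container. This is not the CERA Container format, which the slide does not specify: the record layout (8-byte length, CRC-32, payload), the in-memory offset index and the truncate-on-failure rollback used as a stand-in for transactions are all assumptions made for the sketch.

```python
import os
import struct
import zlib
from typing import List, Tuple

# Hypothetical record header: payload length (u64) and CRC-32 of the payload (u32).
HEADER = struct.Struct(">QI")


class ContainerFile:
    """Toy append-only container: offset index for direct access, CRC-32 per record."""

    def __init__(self, path: str):
        self.path = path
        self.index: List[Tuple[int, int]] = []  # (payload offset, payload length)
        open(path, "ab").close()  # create the file if it does not exist yet
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Walk the headers once so later reads can seek directly to any record.
        self.index.clear()
        with open(self.path, "rb") as f:
            while True:
                head = f.read(HEADER.size)
                if len(head) < HEADER.size:
                    break
                length, _crc = HEADER.unpack(head)
                self.index.append((f.tell(), length))
                f.seek(length, os.SEEK_CUR)

    def append_batch(self, records: List[bytes]) -> None:
        """All-or-nothing append: on any error, truncate back to the previous size."""
        new_entries: List[Tuple[int, int]] = []
        with open(self.path, "r+b") as f:
            start = f.seek(0, os.SEEK_END)
            try:
                for payload in records:
                    f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
                    new_entries.append((f.tell(), len(payload)))
                    f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # make the batch durable before publishing it
            except BaseException:
                f.truncate(start)  # roll back the partially written batch
                raise
        self.index.extend(new_entries)

    def read(self, k: int) -> bytes:
        """One seek plus one read, whatever the position of record k in the file."""
        offset, length = self.index[k]
        with open(self.path, "rb") as f:
            f.seek(offset - HEADER.size)
            stored_len, stored_crc = HEADER.unpack(f.read(HEADER.size))
            payload = f.read(length)
        if stored_len != length or zlib.crc32(payload) != stored_crc:
            raise ValueError(f"checksum mismatch in record {k}")
        return payload
```

A damaged record is detected by its checksum instead of being returned silently, which is the property the slide ties to tape damage.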

  23. Migration. Concept/team (namely Peter Drakenberg, DKRZ): not yet fully finished. Software: first software ready, in order to migrate data. Converting old data: started last week, but will take at least a year.

  24. Dataflow: outbound (diagram: webserver, application server, processing, metadata, data; steps 1 to 8).

  25. Dataflow: inbound (diagram: model run, GFS, postprocessing, metadata, dataserver).

  26. Summary. CERA allows storage of data of different kinds, independent of format. Metadata enables addressing of internal and external data. Users typically fetch only small amounts of data; the system allows efficient access to small data granules by using warehousing functions such as partitioning, and by using small Oracle database BLOBs or, in future, CERA Container files.

  27. Thank you!
