
WS Spatiotemporal Databases for Geosciences, Biomedical sciences and Physical sciences

World Data Center Climate: Terabyte Data Storage in a Relational Database System. Michael Lautenschlager, Hannes Thiemann and Frank Toussaint, ICSU World Data Center Climate, Model and Data / Max-Planck-Institute for Meteorology, Hamburg, Germany.




Presentation Transcript


  1. World Data Center Climate: Terabyte Data Storage in a Relational Database System. Michael Lautenschlager, Hannes Thiemann and Frank Toussaint, ICSU World Data Center Climate, Model and Data / Max-Planck-Institute for Meteorology, Hamburg, Germany. WS Spatiotemporal Databases for Geosciences, Biomedical sciences and Physical sciences, Edinburgh, November 1st + 2nd, 2005. WDCC Home: www.wdcc-climate.de / WDCC Contact: data@dkrz.de

  2. Contents: Introduction of WDCC; CERA2 Data Model; Data Access; Connection to Mass Storage Archive; Summary

  3. WDCC Content (October 2005): 580 experiments / 68,000 data sets. Data from Earth system modelling and related observations: IPCC, WOCE, GEBCO, BALTEX, HOAPS, CEOP, COSMOS, CARIBIC, EH5/MPI-OM, IPCC-AR4, ERA15/40, ERA40, NCEP, simulations @ MPI, GKSS, … Start: approved in January 2003. Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ).

  4. WDCC Access

  5. WDCC Size 4.6 Billion BLOBs

  6. WDCC DB Storage. How we get the grid data: files from climate model; postprocessing step 1: homogenising time and calculation of diagnostics; postprocessing step 2: isolation of levels & parameters and creation of BLOB table input. Storage of global coverages per file or BLOB: all levels, all parameters, arbitrary time intervals (file); all levels, all parameters, one moment (6 by 6 hours); 1 level, 1 parameter, 1 moment (= 1 BLOB = 1 global field).
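The "one BLOB = one global field" layout from postprocessing step 2 can be sketched as follows. This is a minimal illustration using SQLite as a stand-in for the Oracle BLOB tables; the table and column names are hypothetical, not the actual CERA schema.

```python
import sqlite3

# One row holds one global field, i.e. a single (parameter, level, moment)
# combination, so any single field can be fetched without reading others.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE blob_data (
        parameter TEXT,
        level     INTEGER,
        moment    TEXT,      -- 6-hourly time step
        field     BLOB,      -- packed global grid (e.g. one GRIB record)
        PRIMARY KEY (parameter, level, moment)
    )
""")

def store_field(parameter, level, moment, packed_grid):
    """Postprocessing step 2: insert one isolated global field as a BLOB."""
    conn.execute("INSERT INTO blob_data VALUES (?, ?, ?, ?)",
                 (parameter, level, moment, packed_grid))

def load_field(parameter, level, moment):
    """Selective access: fetch exactly one global field."""
    row = conn.execute(
        "SELECT field FROM blob_data WHERE parameter=? AND level=? AND moment=?",
        (parameter, level, moment)).fetchone()
    return row[0] if row else None

store_field("temperature", 850, "2005-11-01T06:00", b"\x01\x02\x03")
```

Because the primary key is (parameter, level, moment), a time series of one variable at one level is just a range scan over `moment`, which is the fine-granularity access pattern the CERA concept aims for.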

  7. Data Model

  8. CERA1) Concept: Semantic Data Management. (I) Data catalogue and Unix files (pointer or BLOB-table entry): enable search and identification of data; allow for data access as they are (coarse granularity). (II) Application-oriented data storage: time series of individual variables are stored as BLOB entries in DB tables (fine granularity); allow for fast and selective data access; storage in standard data formats (GRIB, NetCDF) allows for application of standard data processing routines (PINGOs, CDOs). 1) Climate and Environmental data Retrieval and Archiving

  9. WDCC Data Topology. Level 1 interface: metadata entries (XML, ASCII) + data files — experiment description, Unix-files table/pointer, dataset 1…n descriptions, BLOB data tables. Level 2 interface: separate files containing BLOB table data in application-adapted structure (time series of single variables). A BLOB DB table corresponds to a scalable, virtual file at the operating-system level.

  10. CERA Data Model. Modules: Contact, Coverage, Reference, Entry, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access.

  11. CERA Modules. 3 modules: DATA_ACCESS for automated (remote) data access; DATA_ORG for organization of grid data (geo-references of grid points in BLOBs); CODE for matching of (internal) model code numbers.

  12. Data Model Functions. The CERA2 data model … allows for data search according to discipline, keyword, variable, project, author, geographical region and time interval, and for data retrieval; allows for specification of data processing (aggregation and selection) without touching the primary data; is flexible with respect to local adaptations, to storage of different types of geo-referenced data, and to definition of data topologies (hierarchical, network, …); is open for cooperation and interchange with other database systems (e.g. FGDC metadata standard and ISO 19115 included). But: it is not the simplest data model for each single application.

  13. Data Access

  14. Web Access to WDCC

  15. Interactive Catalogue Access. Web browser → Internet → application server (Servlet/JSP): request as URL, dynamic HTML pages returned over http. Catalogue access via WWW; URL parsed by JSP; integrated DB retrieval by JSP; response in standard HTML; efficient administration of detailed meta information.

  16. HTTP and JDBC Data Download. Data download via WWW: web browser sends an HTML-form request over the Internet to the application server (Servlet/JSP); the request is handled by JSP; a binary file is returned over http and written to the client disk. Data download via script/batch program "jblob": JDBC request, standard client-side JDBC retrieval; the binary file is returned and written to the client disk.

  17. XML Interface for http Metadata Output. Metadata access via WWW: user applications send a URL request over the Internet to the application server; xsql query to the DB; xml output from the DB; xsl mapping to any metadata format (raw xml, xhtml, ISO xml, DC xml, … — see wini.wdc-climate.de).
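The mapping step above (raw DB xml → a target metadata format) can be sketched like this. The sketch does the rewrapping in Python rather than XSLT, and the tag names (`entry`, `title`, `topic`) are invented for illustration, not the actual CERA output.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw XML as an xsql query might return it.
raw = """<entry>
  <title>EH5/MPI-OM control run</title>
  <topic>climate</topic>
</entry>"""

def to_dublin_core(raw_xml):
    """Rewrap the raw fields in Dublin-Core-style elements
    (the role the xsl mapping plays on the server)."""
    src = ET.fromstring(raw_xml)
    dc = ET.Element("metadata")
    ET.SubElement(dc, "dc:title").text = src.findtext("title")
    ET.SubElement(dc, "dc:subject").text = src.findtext("topic")
    return ET.tostring(dc, encoding="unicode")
```

The same raw xml can be run through different mappings to produce ISO xml, DC xml, or xhtml, which is why the interface can serve "various metadata formats" from one query.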

  18. http Data Output. Data access via WWW: user applications send a URL request over the Internet to the application server; URL parsed by a Java servlet; query: DB access by jdbc; response in any format (plain ASCII, html tables, binary objects, …).
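The "response in any format" step amounts to serialising one query result in the format the URL asked for. A minimal sketch, with invented function and format names (the real servlet's interface is not described here):

```python
import struct

def render(rows, fmt):
    """rows: list of (time_step, value) pairs from the DB query;
    fmt selects the wire format, as the URL parameter would."""
    if fmt == "plain":                      # plain ASCII
        return "\n".join(f"{t} {v}" for t, v in rows).encode()
    if fmt == "html":                       # html table
        body = "".join(f"<tr><td>{t}</td><td>{v}</td></tr>" for t, v in rows)
        return f"<table>{body}</table>".encode()
    if fmt == "bin":                        # raw binary objects
        return b"".join(struct.pack("<if", t, v) for t, v in rows)
    raise ValueError(f"unknown format: {fmt}")
```

Keeping the query and the serialisation separate is what lets one servlet serve browsers (html), scripts (plain), and numerical clients (binary) alike.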

  19. Connection to Mass Storage Archive

  20. Oracle DBMS + HSM (disks and tapes). DXDB: Unitree client on the DB machines for communication between the Oracle DB and the tape archive.

  21. Use of DXDB. DXDB is used for: ordinary Oracle datafiles; redo logs; backup.

  22. Migout / migin of tablespaces: all tablespaces (read-write and read-only, with their table partitions) are moved "at once" to dxdb.

  23. Migout / Migin. Migout takes place after files haven't been modified for x minutes; only one migout process per dxdb filesystem. Migin takes place immediately after a file is requested; only the parts accessed are retrieved from the backend storage; one migin process per requested file.
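The migration rules above can be modelled in a few lines. This is a toy model of the policy, not the actual Unitree/DXDB implementation, and the 30-minute idle threshold is an assumed value standing in for the unspecified "x minutes".

```python
MIGOUT_IDLE_MINUTES = 30  # "x minutes": assumed value for illustration

class DxdbFilesystem:
    """Toy model of the dxdb migout/migin policy."""
    def __init__(self):
        self.files = {}  # name -> {"mtime": seconds, "on_tape": bool}

    def write(self, name, now):
        # Writing (re)creates the file on disk and resets its idle timer.
        self.files[name] = {"mtime": now, "on_tape": False}

    def migout(self, now):
        """One migout pass: stage files untouched for x minutes to tape."""
        for f in self.files.values():
            if not f["on_tape"] and now - f["mtime"] >= MIGOUT_IDLE_MINUTES * 60:
                f["on_tape"] = True

    def read(self, name, now):
        """Access triggers immediate migin of the requested file."""
        f = self.files[name]
        if f["on_tape"]:
            f["on_tape"] = False  # staged back from the backend store
        return f
```

In the real system migin is partial (only the accessed parts of a file are retrieved), which is why applications need not wait for a full restore; the toy model stages whole files for simplicity.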

  24. Purging: the dxdb disk cache is managed between a high-water mark (HWM) and a low-water mark (LWM).
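A common HWM/LWM cache-purge policy works like this: once disk usage passes the high-water mark, files that already have a tape copy are purged (least recently accessed first) until usage drops below the low-water mark. A minimal sketch under that assumption; the 90%/70% thresholds and the eviction order are assumed values, not WDCC's actual configuration.

```python
HWM = 0.9  # high-water mark: purging starts above this fraction of capacity
LWM = 0.7  # low-water mark: purging stops once usage falls below this

def purge(cache, capacity):
    """cache: list of (name, size, last_access, on_tape) tuples.
    Returns the names of the files kept on disk."""
    used = sum(size for _, size, _, _ in cache)
    if used <= HWM * capacity:
        return [name for name, _, _, _ in cache]
    kept = list(cache)
    # Evict least-recently-accessed files first.
    for entry in sorted(cache, key=lambda e: e[2]):
        if used <= LWM * capacity:
            break
        if entry[3]:               # only purge files safely copied to tape
            kept.remove(entry)
            used -= entry[1]
    return [name for name, _, _, _ in kept]
```

Purging only files that are already migrated is what makes the disk cache safe: a purged file can always be brought back by migin on the next access.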

  25. Pro: it works; it's fast; applications don't have to wait until files are completely restored from tapes.

  26. Contra (if the backend works): it works, but dxdb is not supported by Oracle; Oracle's officially supported backend requirements do not necessarily match the requirements of other applications like HSM systems (i.e. the connection to Unitree is not standardised).

  27. Summary. Efficient handling of detailed metadata: easy and structured administration of > 60 metadata tables; access support: Java Server Pages (JSP), Servlets, jdbc, xsql, including standard DB features (sql, views, triggers, …). Efficient handling of fine-granularity data: random access to arbitrary time steps of single parameters; access support: JSP, Servlets, jdbc, including standard DB features (authorisation, …); transparent migration of bulk data to tape.

  28. The Winter TopTen Program identifies the world's largest and most heavily used databases. Email received on September 13th: "… Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status in Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program! …" (1) Grand prizes are awarded for first-place winners in the All Environments categories only. WDCC's CERA DB has been identified as the largest Linux DB. http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf
