Climate scientists’ big challenge: reproducibility using big data. Kyo Lee, Chris Mattmann, and RCMES team Jet Propulsion Laboratory (JPL), Caltech. Reproducibility issues in climate science.


Climate scientists’ big challenge: reproducibility using big data

Kyo Lee, Chris Mattmann, and RCMES team

Jet Propulsion Laboratory (JPL), Caltech

Reproducibility issues in climate science
  • Many published papers and reports do not include a computational description detailed enough to reproduce their results.
  • Even with a detailed description, it is practically impossible to reproduce others’ climate simulation results.
  • How many readers of the IPCC report could reproduce this plot?

[Figure: a plot from the latest IPCC report]
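One way to make a computational description concrete is to write a machine-readable manifest next to each result. The sketch below is a minimal illustration under stated assumptions, not part of RCMES: it assumes a Python-based analysis and records the interpreter version, platform, versions of named packages, the run parameters, and a SHA-256 checksum of every input file. The helper names `sha256_of` and `build_manifest` and the manifest fields are hypothetical.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, packages, parameters):
    """Hypothetical helper: describe the environment, inputs, and parameters of one run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record the gap explicitly instead of omitting it
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "parameters": parameters,
        "inputs": {str(p): sha256_of(Path(p)) for p in inputs},
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "obs.txt"  # stand-in for a real observation file
        sample.write_text("example\n")
        manifest = build_manifest([sample], ["numpy"], {"region": "example"})
        print(json.dumps(manifest, indent=2))
```

Checksums catch silently changed input files, while package versions catch silently changed code; a reader can compare both against their own environment before attempting a rerun.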

Climate Science is Big Data Science
  • Data sets are massive and stored in distributed systems across many physical locations.
  • Coupled Model Intercomparison Project Phase 5 (CMIP5) for the IPCC assessment: 110 experiments, 24 modeling centers, 45 models, and 3.3 petabytes of data.
  • By 2020, each experiment will generate an exabyte of data.
  • Use massive observational data sets to:
    • Formulate hypotheses from observed empirical relationships.
    • Simulate current and past conditions under those hypotheses using climate models.
    • Test hypotheses by comparing simulations to observations.
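The CMIP5 figures above can be turned into a rough sense of scale. This back-of-the-envelope calculation assumes decimal units (1 PB = 10^15 bytes) and, unrealistically, an even split of the archive across models and experiments; it only restates the slide's numbers.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes
EB = 10**18  # decimal exabyte, in bytes

cmip5_total = 3.3 * PB  # CMIP5 archive size from the slide
n_models = 45
n_experiments = 110

# Averages assume an even split, which real archives do not follow.
per_model_tb = cmip5_total / n_models / TB
per_experiment_tb = cmip5_total / n_experiments / TB
# One exabyte compared with the entire CMIP5 archive.
growth = EB / cmip5_total

print(f"{per_model_tb:.1f} TB per model")
print(f"{per_experiment_tb:.1f} TB per experiment")
print(f"1 EB is about {growth:.0f} times the CMIP5 archive")
```

Under these assumptions the averages come to roughly 73 TB per model and 30 TB per experiment, and a single exabyte is about 300 times the whole CMIP5 archive.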
Our unique challenges: data change quickly over time
  • Community Earth System Model (CESM), developed at the National Center for Atmospheric Research (NCAR)
        • Options: discretization methods, sub-grid scale physics, coupling with the ocean, and so on.
        • CESM is open source, but it is practically impossible to reproduce others’ simulation results.

[Figure: CESM release timeline, CESM 1.0 (June 2010), CESM 1.0.3 (June 2011), CESM 1.0.6 (May 2014), annotated with "numerous ways to configure a simulation" and "minor updates and branch versions"]
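Because a CESM result depends on both the release (e.g., 1.0 vs. 1.0.3 vs. 1.0.6) and many configuration choices, one lightweight safeguard is to store each run's full configuration and fingerprint it. In the sketch below, the option names (`discretization`, `subgrid_physics`, `ocean_coupling`) echo the slide's list but are hypothetical placeholders, not actual CESM settings. The fingerprint is a SHA-256 hash of a canonical JSON encoding, so two runs with identical settings get identical fingerprints regardless of key order, and `diff_configs` lists the settings that differ.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_configs(a: dict, b: dict) -> dict:
    """Map each top-level key whose value differs to its (a, b) pair; missing keys appear as None."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical run descriptions; option names and values are placeholders.
run_a = {
    "model": "CESM",
    "version": "1.0.3",
    "discretization": "placeholder-A",
    "subgrid_physics": "placeholder-B",
    "ocean_coupling": "placeholder-C",
}
run_b = dict(run_a, version="1.0.6")  # same options, newer release

print(fingerprint(run_a) == fingerprint(dict(run_a)))  # True: same settings, same fingerprint
print(diff_configs(run_a, run_b))  # {'version': ('1.0.3', '1.0.6')}
```

Publishing the configuration together with its fingerprint lets a reader confirm that a rerun used exactly the same release and options before comparing outputs.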

Regional Climate Model Evaluation System (RCMES, http://rcmes.jpl.nasa.gov/)
  • RCMES is an open-source software package developed by NASA’s JPL and UCLA to facilitate the evaluation of climate models. Open Climate Workbench (OCW) is now one of the top-level projects at the Apache Software Foundation.
  • Make observational datasets, with some emphasis on NASA satellite data, more accessible to the climate modeling community for climate model evaluation.
  • Give researchers more time to analyze results and less time spent coding and worrying about file formats, data transfers, etc.
  • Provide guidance for further model improvement by visualizing the collective evaluation results of multiple models.
  • Make basic evaluations of climate models reproducible.

Regional Climate Model Evaluation System powered by Apache Software Foundation

[Figure: RCMES architecture diagram. Components recoverable from the diagram labels:]
  • Raw data (various sources, formats, resolutions, and coverage): TRMM, MODIS, AIRS, CERES, soil moisture, etc., plus other data centers (ESG, DAAC, ExArch Network).
  • Data extractor for raw files (binary or netCDF).
  • RCMED (Regional Climate Model Evaluation Database): a large, scalable database that stores data from a variety of sources in a common format, as PostgreSQL data tables plus metadata (annotated: common format, native grid, efficient architecture).
  • RCMET (Regional Climate Model Evaluation Tool): a library of codes for extracting data from RCMED and from models, and for calculating evaluation metrics.
    • Extract OBS data; extract model data from user input (model data or a URL) with an extractor for various data formats.
    • Regridder: puts the OBS and model data on the same time/space grid.
    • Metrics Calculator: calculates evaluation metrics.
    • Visualizer: plots the metrics.
    • Users can also take the regridded data into their own analysis and visualization (VIS) code.
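The Regridder step is what makes observations and model output directly comparable. The sketch below is not the RCMES/OCW implementation; it assumes both datasets sit on regular latitude/longitude axes that are 1-D and strictly increasing, and it performs bilinear interpolation as two passes of 1-D linear interpolation with NumPy's `np.interp`. The function name `regrid_bilinear` is hypothetical. Target points outside the source grid take the nearest edge value, because `np.interp` clamps by default. For fluxes such as precipitation, area-conservative remapping is often preferred over bilinear interpolation.

```python
import numpy as np


def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field (lat x lon) onto a new rectilinear grid.

    Assumes all coordinate arrays are 1-D and strictly increasing. Target points
    outside the source range take the nearest edge value (np.interp clamps).
    """
    field = np.asarray(field, dtype=float)
    # Pass 1: interpolate every source row along longitude -> (n_src_lat, n_dst_lon).
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Pass 2: interpolate every column along latitude -> (n_dst_lat, n_dst_lon).
    return np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T


if __name__ == "__main__":
    src_lat = np.array([0.0, 1.0, 2.0])
    src_lon = np.array([10.0, 11.0, 12.0])
    # A field that varies linearly: value = lat + 10 * (lon - 10).
    field = src_lat[:, None] + 10 * (src_lon[None, :] - 10)
    # Bilinear interpolation is exact for a linear field: expect 5.5 and 6.5.
    print(regrid_bilinear(field, src_lat, src_lon, np.array([0.5, 1.5]), np.array([10.5])))
```

Doing the two axes separately keeps the code short and is equivalent to bilinear interpolation on a rectilinear grid; curvilinear model grids would need a different approach.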

RCMES High-level technical architecture

Ingest observations and model output, regrid them, calculate metrics (e.g., bias, RMSE, correlation, significance, PDFs), and visualize results (e.g., contour plots, time series, Taylor diagrams).
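Once model output and observations share a grid, the simplest of these metrics reduce to a few array operations. The sketch below is a minimal illustration, not the OCW metrics API: the function name `basic_metrics` is hypothetical, and it assumes two arrays of the same shape with NaN marking missing values. It computes mean bias (model minus observation), RMSE, and the Pearson correlation over points valid in both arrays; significance tests, PDFs, and area weighting by cos(latitude) are left out.

```python
import numpy as np


def basic_metrics(model, obs):
    """Return mean bias, RMSE, and Pearson correlation over points valid in both arrays."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if model.shape != obs.shape:
        raise ValueError("model and obs must be on the same grid (regrid first)")
    valid = np.isfinite(model) & np.isfinite(obs)  # drop NaN/inf in either array
    m, o = model[valid], obs[valid]  # boolean indexing flattens to 1-D
    diff = m - o
    return {
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "corr": float(np.corrcoef(m, o)[0, 1]),
    }


if __name__ == "__main__":
    # The last model value is missing, so only the first three pairs are compared.
    print(basic_metrics([1.0, 2.0, 3.0, np.nan], [2.0, 3.0, 4.0, 5.0]))
```

In the usage example every valid model value is exactly 1 below the observation, so the bias is -1, the RMSE is 1, and the correlation is 1: a constant offset is invisible to correlation, which is why bias and RMSE are reported alongside it.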

[Figure: a second version of the architecture diagram with the same components (extractors, RCMED data tables and metadata, Regridder, Metrics Calculator, Visualizer, user’s own analysis and VIS codes), but with regional climate model (RCM) data or a URL as the user’s choice of input, MySQL as the RCMED database, Fortran-binary data extractors, SWE among the observational datasets (with TRMM, MODIS, AIRS, soil moisture, etc.), and RCMET labeled "Regional Climate Model Evaluation Toolkit"]

How to make climate studies more reproducible?
  • Climate scientists use many different programming languages (Fortran, MATLAB, R, Python, IDL, NCL, GrADS, …); a workflow system could facilitate replication of other studies.
  • Difficulty reproducing others’ simulation results: the Earth System Grid Federation (ESGF) provides software infrastructure to facilitate model intercomparison projects using observational data.
  • Climate scientists need more open-source software, similar to RCMES, that can facilitate their analyses of observational and model data.
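To make the workflow-system idea more concrete: if an analysis is expressed as data (an ordered list of named steps with their parameters) rather than as ad hoc scripts spread across several languages, the same recipe can be archived, shared, and rerun. The sketch below is hypothetical and deliberately tiny. The step names and functions are stand-ins, not RCMES or OCW code; in a real system they would wrap extraction, regridding, and metric calculation.

```python
import json
from typing import Callable, Dict, List

# Registry of available steps; each takes the previous result plus its own parameters.
STEPS: Dict[str, Callable] = {}


def step(name: str):
    """Decorator that registers a function under a workflow step name."""
    def register(fn: Callable) -> Callable:
        STEPS[name] = fn
        return fn
    return register


@step("extract")
def extract(_previous, values):
    return list(values)  # stand-in for reading observations or model output


@step("scale")
def scale(data, factor):
    return [x * factor for x in data]  # stand-in for regridding or unit conversion


@step("mean")
def mean(data):
    return sum(data) / len(data)  # stand-in for an evaluation metric


def run(recipe: List[dict]):
    """Execute the steps in order; the recipe itself is the reproducible record."""
    result = None
    for spec in recipe:
        result = STEPS[spec["step"]](result, **spec.get("params", {}))
    return result


if __name__ == "__main__":
    recipe = [
        {"step": "extract", "params": {"values": [1.0, 2.0, 3.0]}},
        {"step": "scale", "params": {"factor": 2.0}},
        {"step": "mean"},
    ]
    saved = json.dumps(recipe)  # the recipe can be archived alongside a paper
    print(run(json.loads(saved)))  # 4.0
```

Because the recipe is plain JSON, it is independent of the language each step happens to be written in; a fuller system would also record the manifest and configuration fingerprint of every run.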