comparing predictive power in climate data clustering matters l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Comparing Predictive Power in Climate Data: Clustering Matters PowerPoint Presentation
Download Presentation
Comparing Predictive Power in Climate Data: Clustering Matters

Loading in 2 Seconds...

play fullscreen
1 / 31

Comparing Predictive Power in Climate Data: Clustering Matters - PowerPoint PPT Presentation


  • 116 Views
  • Uploaded on

Comparing Predictive Power in Climate Data: Clustering Matters. Karsten Steinhaeuser University of Minnesota joint work with Nitesh Chawla & Auroop Ganguly 12 th International Symposium on Spatial and Temporal Databases Minneapolis, MN August 24, 2011. Outline. Motivation Networks Primer

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Comparing Predictive Power in Climate Data: Clustering Matters' - walker


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
comparing predictive power in climate data clustering matters

Comparing Predictive Power in Climate Data: Clustering Matters

Karsten SteinhaeuserUniversity of Minnesotajoint work withNitesh Chawla & Auroop Ganguly

12th International Symposium onSpatial and Temporal DatabasesMinneapolis, MNAugust 24, 2011

outline
Outline
  • Motivation
  • Networks Primer
  • From Data to Networks
  • Motivating Networks in Climate Science
  • Descriptive Analysis and Predictive Modeling
  • Empirical Evaluation & Comparison
  • Conclusions

University of Minnesota

mining complex data
Mining Complex Data
  • Complex spatio-temporal data pose unique challenges
  • Tobler’s First Law of Geography:“Everything is related, but nearthings more than distant.”
    • But are all near things equally related?
    • Are there phenomena explained byinteractions among distant things?(teleconnections)

University of Minnesota

networks primer
Networks Primer

What is a Network?

  • Oxford English Dictionary:network, n.: Any netlike or complexsystem or collection of interrelatedthings, as topographical features,lines of transportation, ortelecommunications routes(esp. telephone lines).
  • My working definition:Any set of items that are connected or related to each other.(“items” and “connections” can be concrete or abstract)

University of Minnesota

networks primer5
Networks Primer

Community Detection in Networks

  • Identify groups of nodesthat are relatively more tightlyconnected to each other thanto other nodes in the network
  • Computationally challengingproblem for real-world networks

University of Minnesota

from data to networks
From Data to Networks
  • Networks are pervasive insocial science, technology,and nature
  • Many datasets explicitlydefine network structure
  • But networks can also represent other types of data, framework for identifying relationships, patterns, etc.

University of Minnesota

motivating networks in climate
Motivating Networks in Climate

Uncertainty derives from many known andoften many more unknown sources

University of Minnesota

motivating networks in climate8
Motivating Networks in Climate
  • Projections of climaterely on many factors
    • Understanding of thephysical processes
    • Ability to implementthis understanding incomputational models
    • Assumptions aboutthe future

Source: IPCC SRES and AR-4

University of Minnesota

motivating networks in climate9
Motivating Networks in Climate
  • Some processes well-understood and modeled, others much less credible
  • Comparison to observations shows varying skills

University of Minnesota

motivating networks in climate10
Motivating Networks in Climate
  • Models cannot capture some features/processes
  • Comparison to observations illustrates severe geographic variability, topographic bias

University of Minnesota

motivating networks in climate11
Motivating Networks in Climate

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding?

Answer:Stay Tuned…

University of Minnesota

knowledge discovery for climate

High-Performance Computing

Data Storage & Management

Knowledge Discovery

System Inputs

HPCCLens - Jaguar

Observations

Complex Networks

Data Mining

HPSS

Oracle DB

GCM outputs

Visualization

ARM Data

Basic OLAPCapabilities

ArcGIS

PowerWall

System Outputs

Novel Insights

Decision Support

Knowledge Discovery for Climate

University of Minnesota

historical climate data
Historical Climate Data
  • NCEP/NCAR Reanalysis (proxy for observation)
  • Monthly for 60 years (1948-2007) on 5ºx5º grid
  • Seven variables:Sea surface temperature (SST)Sea level pressure (SLP)Geopotential height (GH)Precipitable water (PW)Relative Humidity (RH)Horizontal wind speed (HWS)Vertical wind speed (VWS)

Raw Data

De-Seasonalize

AnomalySeries

University of Minnesota

network construction
Network Construction
  • View global climate system as a collection of interacting oscillators[Tsonis & Roebber, 2004]
    • Vertices represent locations in space
    • Edges denote correlation in variability
  • Link strength estimated by correlation, low-weight edges are pruned from the network
  • Construct networks only for locations over the oceans
    • Relatively better captured by models

University of Minnesota

geographic properties

Autocorrelation

Teleconnection

Geographic Properties
  • Examine network structure in spatial context
    • Link lengths computed as great-circle distance
    • Compare autocorrelation / de-correlation lengths for different variables, interpret within the domain

Sea Level Pressure Precipitable Water Vertical Wind Speed

University of Minnesota

clustering climate networks
Clustering Climate Networks
  • Apply community detection to partition networks
    • Use Walktrap algorithm[Pons & Latapy, 2006]
    • Efficient and works well for dense networks
  • Visualize spatial pattern using GIS tools
  • Cluster structure suggests relationships within the climate system

Sea Level Pressure

Precipitable Water

University of Minnesota

update
Update!

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding?

Revised Answer:Yes… but that’s not all.

University of Minnesota

descriptive predictive
Descriptive  Predictive
  • Network representation is able to capture interactions, reveal patterns in climate
    • Validate existing assumptions / knowledge
    • Suggest potentially new insights or hypothesesfor climate science
  • Want to extract the relationships between atmospheric dynamics over ocean and land
    • i.e., “Learn” physical phenomena from the data

University of Minnesota

predictive modeling
Predictive Modeling
  • Use network clusters as candidate predictors
  • Create response variables for target regions around the globe (illustrated below)
  • Build regressionmodel relatingocean clustersto land climate

University of Minnesota

illustrative example
Illustrative Example
  • Predictive model for air temperature in Peru
    • Long-term variability highly predictable due towell-documented relation to El Nino
  • Small number of clusters have majority of skill
    • Feature selection (blue line) improves predictions

Raw DataAll ClustersFeature Selection

University of Minnesota

update21
Update!

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding?

Revised Answer:Yes and Yes… but wait, there’s more.

University of Minnesota

results on train test
Results on Train/Test

University of Minnesota

predictive skill
Predictive Skill

University of Minnesota

update24
Update!

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding?

Revised Answer:Yes, Yes, and Yes.

University of Minnesota

variations extensions
Variations / Extensions
  • Compare network approach to traditional clustering methods
    • k-means, k-medoids, spectral, EM, etc.
  • Compare different types of predictive models
    • (linear) regression, regression trees, neural nets, support vector regression

University of Minnesota

compare clustering methods
Compare Clustering Methods

University of Minnesota

compare predictive models
Compare Predictive Models

University of Minnesota

refining model projections
Refining Model Projections

University of Minnesota

conclusions
Conclusions
  • Networks capture behavior of the climate system
  • Clusters (or “communities”) derived from these networks have useful predictive skill
    • Statistically significantly better than predictors based on clusters derived using traditional methods
  • Potential for advancing climate science
    • Understanding of physical processes
    • Complement climate model simulations

University of Minnesota

upcoming events
Upcoming Events
  • First International Workshop on Climate Informatics, New York, NY, August 26, 2011http://www.nyas.org/climateinformatics
  • NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011https://c3.ndc.nasa.gov/dashlink/projects/43/
  • IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011http://www.nd.edu/~dial/climkd11/

University of Minnesota

thanks questions
Thanks & Questions

Contact

ksteinha@umn.edu

Personal Homepage

http://www.nd.edu/~ksteinha

NSF Expeditions onUnderstanding Climate Change

http://climatechange.cs.umn.edu

This work was supported in part by the National Science Foundation under Grants OCI-1029584 and BCS-0826958. This research was also funded in part by the project entitled “Uncertainty Assessment and Reduction for Climate Extremes and Climate Change Impacts” under the initiative “Understanding Climate Change Impact: Energy, Carbon, and Water Initiative” within the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725.

University of Minnesota