Data-driven methods in Environmental Sciences: Exploration of Artificial Intelligence Techniques

Data Driven Methods

What is Artificial Intelligence?

Common AI techniques

Choosing between AI techniques

Pre and post processing

What is AI?
  • Machines that perceive, understand and react to their environment
    • Goal of Babbage, etc.
    • Oldest endeavor in computer science
  • Machines that think
    • Robots: factory floors, home vacuums
    • Still quite impractical

AI vs. humans
  • AI applications built on Aristotelian logic
    • Induction, semantic queries, system of logic
    • Human reasoning involves more than just induction
  • Computers never as good as humans
    • In reasoning and making sense of data
    • In obtaining a holistic view of a system
  • Computers much better than humans
    • In processing reams of data
    • Performing complex calculations

Successful AI applications
  • Targeted tasks more amenable to automated methods
    • Build special-purpose AI systems
      • Determine appropriate dosage for a drug
      • Classify cells as benign or cancerous
    • Called “expert systems”
      • Methodology based on expert reasoning
      • Quick and objective ways to obtain answers

Data Driven Methods

What is Artificial Intelligence?

Common AI techniques

Choosing between AI techniques

Pre and post processing

Fuzzy logic
  • Fuzzy logic addresses key problem in expert systems
    • How to represent domain knowledge
    • Humans use imprecisely calibrated terms
    • How to build decision trees on imprecise thresholds
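
As a minimal sketch of how imprecisely calibrated terms can be encoded, the Python below defines triangular membership functions for two hypothetical terms, "cold" and "warm", and blends two verbal rules by how true each premise is. The term names, break points, and rule outputs are illustrative assumptions; they do not come from the slides or from the Matlab toolbox.

```python
def triangle(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def heater_setting(temp_c):
    """Blend two verbal rules, weighted by how true each premise is."""
    cold = triangle(temp_c, -10.0, 0.0, 15.0)   # "it is cold" (assumed break points)
    warm = triangle(temp_c, 5.0, 20.0, 35.0)    # "it is warm" (assumed break points)
    # Rule 1: if cold then heater = 1.0; Rule 2: if warm then heater = 0.2
    total = cold + warm
    if total == 0.0:
        return 0.0  # no rule fires
    return (cold * 1.0 + warm * 0.2) / total
```

Between 5C and 15C both rules fire partially, so the output moves gradually from one rule's answer to the other's instead of jumping at a hard threshold.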

Fuzzy logic example

Source: Matlab fuzzy logic toolbox tutorial

Advantages of fuzzy logic
  • Considerable skill for little investment
    • Fuzzy logic systems piggyback on human analysis
      • Humans encode rules after intelligent analysis of lots of data
      • Verbal rules generated by humans are robust
    • Simple to create
      • Not much need for data or ground truth
      • Logic tends to be easy to program
  • Fuzzy rules are human understandable

Where not to use fuzzy logic
  • Do not use fuzzy logic if:
    • Humans do not understand the system
    • Different experts disagree
    • Knowledge cannot be expressed with verbal rules
    • Gut instinct is involved
      • Not just objective analysis
  • A fuzzy logic system is limited
    • Piece-wise linear approximation to a system
    • Non-linear systems cannot be approximated
      • Many environment applications are non-linear

Neural Networks
  • Neural networks can approximate non-linear systems
    • Evidence-based
      • Weights chosen through optimization procedure on known dataset (“training”)
    • Works even if experts can’t verbalize their reasoning, as long as there is ground truth

An example neural network

Diagram from:

Advantages of neural networks
  • Can approximate any smooth function
    • The three-layer neural network
  • Can yield true probabilities
    • If output node is a sigmoid node
  • Not hard to train
    • Training process is well understood
  • Fast in operations
    • Training is slow, but once trained, the network can calculate the output for a set of inputs quite fast
  • Easy to implement
    • Just a sum of exponential functions
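
To make the "fast in operations" and "sigmoid output" points concrete, here is a minimal sketch of the forward pass of a three-layer network (inputs, one sigmoid hidden layer, one sigmoid output node). The weights are hand-picked, hypothetical values; in practice they would come from training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> sigmoid hidden layer -> single sigmoid output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical, hand-picked weights: 2 inputs, 3 hidden nodes, 1 output.
HIDDEN_W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
HIDDEN_B = [0.0, -0.5, 0.2]
OUTPUT_W = [1.2, -0.6, 0.9]
OUTPUT_B = -0.4

p = forward([0.3, 0.9], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
```

Because the output node is a sigmoid, `p` always lies strictly between 0 and 1, and the whole computation is a handful of multiply-adds and exponentials.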

Disadvantages of neural networks
  • A black box
    • The final set of weights yields no insights
    • Magnitude of weights doesn’t mean much
  • Measure of skill needs to be differentiable
    • RMS error, etc.
    • Cannot use Probability of Detection, for example
  • Training set has to be complete
    • Unpredictable output on data unlike training
    • Need lots of data
    • Need an expert willing to do a lot of truthing
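
The differentiability point can be illustrated with a small sketch: RMS error changes smoothly when predictions are nudged, while Probability of Detection (hits divided by hits plus misses, after thresholding) usually does not change at all, so it gives gradient-based training nothing to follow. The data and the 0.5 threshold below are made up for illustration.

```python
import math


def rmse(predictions, truth):
    """Root-mean-square error: varies smoothly with the predictions."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))


def probability_of_detection(predictions, truth, threshold=0.5):
    """Hits / (hits + misses) after thresholding: a step function of the predictions."""
    hits = sum(1 for p, t in zip(predictions, truth) if t == 1 and p >= threshold)
    misses = sum(1 for p, t in zip(predictions, truth) if t == 1 and p < threshold)
    return hits / (hits + misses)


truth = [1, 1, 0, 1, 0]
before = [0.60, 0.40, 0.20, 0.90, 0.30]
after = [0.61, 0.41, 0.20, 0.90, 0.30]  # tiny nudge to two predictions

rmse_change = rmse(after, truth) - rmse(before, truth)
pod_change = probability_of_detection(after, truth) - probability_of_detection(before, truth)
```

Here `rmse_change` is slightly negative (the nudge helped), while `pod_change` is exactly zero because no prediction crossed the threshold.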

  • Fuzzy logic
    • Humans provide the rules
    • Not optimal
  • Neural network
    • Humans cannot understand the system
    • Optimal
  • Middle ground?
    • Genetic Algorithms
    • Decision Trees

Genetic algorithms
  • In genetic algorithms
    • One fixes the model (rule base, equations, class of functions, etc.)
    • Optimize the model's parameters on a training data set
    • Use optimal set of parameters for unknown cases
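
The three steps above can be sketched as a small continuous genetic algorithm. This is an illustration under stated assumptions, not the method from the slides: the fixed model is a power law `a * x**b`, the cost is mean absolute error, and the population size, mutation scale, and synthetic data are all made-up choices.

```python
import random


def model(x, params):
    a, b = params
    return a * x ** b          # the fixed model: a power law


def cost(params, data):
    # Mean absolute error; the GA never needs its derivative.
    return sum(abs(model(x, params) - y) for x, y in data) / len(data)


def evolve(data, generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 3.0)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: cost(p, data))
        history.append(cost(population[0], data))
        survivors = population[: pop_size // 2]      # selection (best half kept unchanged)
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            mix = rng.random()
            child = tuple(
                mix * m + (1.0 - mix) * f + rng.gauss(0.0, 0.05)   # crossover + mutation
                for m, f in zip(mother, father)
            )
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda p: cost(p, data))
    return best, history


# Synthetic training set generated from a = 2.0, b = 1.5 (made-up values).
DATA = [(x, 2.0 * x ** 1.5) for x in (0.5, 1.0, 2.0, 3.0, 4.0, 5.0)]
best_params, history = evolve(DATA)
```

Because the best half of each generation survives unchanged, the best cost recorded in `history` can never get worse from one generation to the next. `best_params` would then be used on unknown cases.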

An example genetic algorithm


Advantages of genetic algorithms
  • Near-optimal parameters for given model
    • Human-understandable rules
    • Best parameters for them
  • Cost function need not be differentiable
    • The process of training uses natural selection, not gradient descent
  • Requires less data than a neural network
    • Search space is more limited

Disadvantages of genetic algorithms
  • Highly dependent on class of functions
    • If poor model is chosen, poor results
      • Optimization may not help at all
  • Known model does not always lead to better understanding
    • Magnitude of weights, etc. may not be meaningful if inputs are correlated
    • Problem may have multiple parametric solutions

Decision trees
  • Can automatically build decision trees from known data
    • Prune trees
    • Select thresholds
    • Choose operators
  • Disadvantages
    • Piece-wise linear, so typically less skilled than neural networks
    • Large decision trees are effectively a black box
    • Cannot do regression, only classification
  • Advantages:
    • Fast to train
    • New advances: bagged, boosted decision trees approach skill of neural networks, but are no longer fast to train


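Example decision tree (two counts per node; child counts sum to the parent's):
  • Root: 30 / 50
    • T < 10C: 20 / 15
      • Z > 45: 18 / 2
      • Z < 45: 2 / 13
    • T > 10C: 10 / 35
      • V < 5: 8 / 2
      • V > 5: 2 / 33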
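
One way to see how thresholds like these get selected is to score each split by how much it purifies the nodes. The sketch below assumes the two numbers at each node are per-class counts and uses Gini impurity as the criterion; neither assumption is stated in the slides.

```python
def gini(counts):
    """Gini impurity of a node given its per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gain(parent, children):
    """Impurity reduction achieved by splitting parent into children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted


root = (30, 50)
gain_t = split_gain(root, [(20, 15), (10, 35)])        # split on T at 10C
gain_z = split_gain((20, 15), [(18, 2), (2, 13)])      # split on Z at 45
gain_v = split_gain((10, 35), [(8, 2), (2, 33)])       # split on V at 5
```

A tree builder would try many candidate thresholds per variable and keep the one with the largest gain at each node.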

Radial Basis Functions

Diagram from: A. W. Jayawardena & D. Achela K. Fernando 1998: Use of Radial Basis Function Type Artificial Neural Networks for Runoff Simulation, Computer-Aided Civil and Infrastructure Engineering 13:2

  • Radial Basis Functions are a form of neural network
    • Localized Gaussians
    • Linear sum of non-linear functions
  • Advantage: Can be solved by inverting a matrix, so very fast
  • Disadvantage: Not a general-enough model
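
The "solved by inverting a matrix" point can be sketched as follows. This is a simplified illustration, not the network from the cited paper: it assumes one Gaussian centered on each training point, a fixed width of 1.0, and made-up sample data, so the weights follow from a single linear solve.

```python
import math


def gaussian(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    weights = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * weights[k] for k in range(row + 1, n))
        weights[row] = (aug[row][n] - known) / aug[row][row]
    return weights


def fit_rbf(xs, ys, width=1.0):
    """One Gaussian per training point; weights come from a single linear solve."""
    design = [[gaussian(x, c, width) for c in xs] for x in xs]
    return solve(design, ys)


def predict(x, centers, weights, width=1.0):
    """Linear sum of the non-linear (Gaussian) basis functions."""
    return sum(w * gaussian(x, c, width) for w, c in zip(weights, centers))


XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [0.0, 0.8, 0.9, 0.1, -0.8]          # made-up samples
WEIGHTS = fit_rbf(XS, YS)
```

No iterative training is involved, which is why fitting is fast; the price is that the model's shape is fixed by the choice of centers and width.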

Data Driven Methods

What is Artificial Intelligence?

Common AI techniques

Choosing between AI techniques

Pre and post processing

Typical data-driven application

[Diagram labels: Input Data; Which features?; How do we find f()?; AI application in run-time]


What is the role of the data?
  • Validation
    • Test known model
    • Technique:
      • Difference between model output and ground truth helps to validate the model
  • Calibration
    • Find parameters for a model with the desired structure
    • Technique:
      • Tuned fuzzy logic method
      • Genetic algorithms
  • Induction
    • Find model and parameters from just data
    • Technique:
      • Neural network methods, bagged/boosted decision trees, support vector machines, etc.

What is the problem to solve?
  • Do you have a bunch of data and want to:
    • Estimate an unknown parameter from it?
      • True rainfall based on radar observations?
      • Amount of liquid content from in-situ measurements of temperature, pressure, etc.?
      • Regression
    • Classify what the data correspond to?
      • A water surge?
      • A temperature inversion?
      • A boundary?
      • Classification
  • Regression and classification aren’t that different
    • Classification: estimate probability of an event
      • A function with values between 0 and 1

Which AI technique?
  • Do you have expert knowledge?
    • Humans have a “model” in their head? Should the final f() be understandable?
    • Create fuzzy logic rules from experts’ reasoning
      • Aggregate the individual fuzzy logic rules
      • Can tune the fuzzy rules based on data
        • Using regression, decision trees or neural networks for RMS error criterion
        • Genetic algorithms for error criteria like ROC, economic cost, etc.
      • Many times the original rules are just fine
  • Do you already know the model?
    • A power-law relationship? Gaussian? Quadratic? Rules?
    • Just need to find parameters to this model?
      • If linear, just use linear regression
      • If non-linear: use genetic algorithms
      • Use continuous GAs
  • Both of these can be used for regression (therefore, also classification)
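
As one example of the "if linear, just use linear regression" case: a power-law model `y = a * x**b` becomes linear after taking logarithms, since `log(y) = log(a) + b * log(x)`. The sketch below fits it with ordinary least squares on made-up data; it assumes all values are positive and that errors are judged in log space.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_law(xs, ys):
    """Fit y = a * x**b by regressing log(y) on log(x); needs x, y > 0."""
    intercept, slope = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(intercept), slope


xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 0.5 for x in xs]     # synthetic data with a = 3, b = 0.5
a, b = fit_power_law(xs, ys)
```

When the error criterion has to be measured in the original units, or is not differentiable, the parameters can instead be searched for directly, which is where genetic algorithms come in.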

Which AI technique (contd.)
  • Do you know nothing about the data?
    • Not the suspected equation/model (GA)?
    • Not the suspected rules (fuzzy logic)?
    • Use an AI technique that supplies its equations/rules
      • “black box”.
  • For classification, use:
    • Bagged decision trees or Support Vector Machines
      • If probabilistic output is needed, remember to apply Platt scaling
      • Summary statistics on bagged DTs can help answer “why”
    • Neural Networks
  • For regression, use:
    • Neural networks

Where do your data come from?
  • Observed data
    • Compute features
    • Choose AI technique
      • The 4 choices in the previous two slides
  • Simulated data:
    • Example: trying to replicate a very complex model
    • Throw randomly-generated data at model
    • Compute features
    • Choose AI technique:
      • GA for parametric approximations
      • NN when you don’t know how to approximate

Where do you get your inputs?
  • What type of data do you have?
    • Individual observations?
      • Sample them (choose at random) and use directly
    • Sparse observations in a time series?
      • Generate time-based features (1D moving windows)
      • Signal processing features from time series
    • Data from remotely sensed 2D grids?
      • Generate image-based features using convolution filters
      • Do you need:
        • Pixel-based regression/classification?
          • Use convolution features directly
        • Object-based regression/classification?
          • Identify regions using region growing
          • Use region-aggregate features

Typical data-driven application

[Diagram labels: Observed data; Signal/image processing, sampling; normalize/create chromosome/determine confidences; Platt method/region-average/threshold; A data-driven application in run-time]


Data Driven Methods

What is Artificial Intelligence?

Common AI techniques

Choosing between AI techniques

Pre and post processing

  • Often cannot use pixel data directly
    • Too much data, too highly correlated
    • May need to segment pixels into objects and use features computed on the objects
  • Different data sets may not be collocated
    • Need to interpolate to line them up
    • Mapping, objective analysis
  • Noise in data may need to be reduced
    • Smoothing
    • Present a statistic of the data, rather than the data itself
  • Features need to be extracted from data
    • Human experts often good source of ideas on signatures to extract from data

  • The output of an expert system may be grid point by grid point
    • May need to provide output on objects
      • Storms, forests, etc.
    • Can average outputs over objects’ pixels
  • May need probabilistic output
    • Scale output of maximum-margin techniques
    • Use a sigmoid function
      • Called Platt scaling
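
A simplified sketch of Platt scaling: fit a sigmoid that maps a classifier's raw score to a probability, using labeled examples. This version uses plain gradient descent on the log loss and writes the sigmoid as `1 / (1 + exp(-(a * score + b)))`; Platt's original formulation uses the opposite sign convention, regularized targets, and a different optimizer. The scores and labels are made up.

```python
import math


def platt_probability(score, a, b):
    """Map a raw classifier score to a probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def fit_platt(scores, labels, steps=5000, rate=0.1):
    """Fit a and b by gradient descent on the log loss (labels are 0 or 1)."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            error = platt_probability(s, a, b) - y
            grad_a += error * s
            grad_b += error
        a -= rate * grad_a / n
        b -= rate * grad_b / n
    return a, b


# Made-up margins from a hypothetical classifier, with overlapping classes.
SCORES = [-2.0, -1.5, -0.5, -0.2, 0.3, 0.4, 1.0, 2.5]
LABELS = [0, 0, 0, 1, 0, 1, 1, 1]
A, B = fit_platt(SCORES, LABELS)
```

At run time, `platt_probability(score, A, B)` replaces the raw score. The fit should use held-out data rather than the data the classifier was trained on.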

  • What is Artificial Intelligence?
    • Data-driven methods to perform specific targeted tasks
  • Common AI techniques
    • Fuzzy logic, neural networks, genetic algorithms, decision trees
  • Choosing between AI techniques
    • Understand the role of your data
    • Do experts understand the system? (have a model)
    • Do experts expect to understand the system? (readability)
  • Pre and post processing
    • Image processing techniques on spatial grids