Data-driven methods in Environmental Sciences
Exploration of Artificial Intelligence Techniques

Valliappa.Lakshmanan@noaa.gov

lakshman@ou.edu

Data Driven Methods

What is Artificial Intelligence?

Common AI techniques

Choosing between AI techniques

Pre and post processing

What is AI?
  • Machines that perceive, understand and react to their environment
    • Goal of Babbage, etc.
    • Oldest endeavor in computer science
  • Machines that think
    • Robots: factory floors, home vacuums
    • Still quite impractical

AI vs. humans
  • AI applications built on Aristotelian logic
    • Induction, semantic queries, system of logic
    • Human reasoning involves more than just induction
  • Computers never as good as humans
    • In reasoning and making sense of data
    • In obtaining a holistic view of a system
  • Computers much better than humans
    • In processing reams of data
    • Performing complex calculations

Successful AI applications
  • Targeted tasks more amenable to automated methods
    • Build special-purpose AI systems
      • Determine appropriate dosage for a drug
      • Classify cells as benign or cancerous
    • Called “expert systems”
      • Methodology based on expert reasoning
      • Quick and objective ways to obtain answers

Fuzzy logic
  • Fuzzy logic addresses key problem in expert systems
    • How to represent domain knowledge
    • Humans use imprecisely calibrated terms
    • How to build decision trees on imprecise thresholds
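
As a minimal sketch of how imprecise verbal terms can be encoded, the Python below turns "cold" and "windy" into membership values between 0 and 1 and combines them with a fuzzy AND. The rule, the variable names, and every breakpoint are hypothetical illustrations, not taken from the presentation.

```python
def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership that falls linearly from 1 at `lo` to 0 at `hi`."""
    return 1.0 - ramp_up(x, lo, hi)


def wind_chill_hazard(temp_c, wind_ms):
    """Hypothetical rule: 'if it is cold AND windy, the hazard is high'."""
    cold = ramp_down(temp_c, 0.0, 10.0)   # assumed: fully cold at 0 C, not cold at 10 C
    windy = ramp_up(wind_ms, 2.0, 8.0)    # assumed: calm below 2 m/s, windy above 8 m/s
    return min(cold, windy)               # fuzzy AND taken as the minimum


hazard = wind_chill_hazard(5.0, 5.0)
```

Using ramps instead of hard thresholds means a small change in temperature or wind produces a small change in the output, which is the point of fuzzy rules over crisp decision thresholds.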

Fuzzy logic example

Source: Matlab fuzzy logic toolbox tutorial

http://www.mathworks.com/access/helpdesk/help/toolbox/fuzzy/fp350.html

Advantages of fuzzy logic
  • Considerable skill for little investment
    • Fuzzy logic systems piggyback on human analysis
      • Humans encode rules after intelligent analysis of lots of data
      • Verbal rules generated by humans are robust
    • Simple to create
      • Not much need for data or ground truth
      • Logic tends to be easy to program
  • Fuzzy rules are human understandable

Where not to use fuzzy logic
  • Do not use fuzzy logic if:
    • Humans do not understand the system
    • Different experts disagree
    • Knowledge can not be expressed with verbal rules
    • Gut instinct is involved
      • Not just objective analysis
  • A fuzzy logic system is limited
    • Piece-wise linear approximation to a system
    • Non-linear systems can not be approximated
      • Many environmental applications are non-linear

Neural Networks
  • Neural networks can approximate non-linear systems
    • Evidence-based
      • Weights chosen through optimization procedure on known dataset (“training”)
    • Works even if experts can’t verbalize their reasoning, as long as there is ground truth

An example neural network

Diagram from:

http://www.codeproject.com/useritems/GA_ANN_XOR.asp

Advantages of neural networks
  • Can approximate any smooth function
    • The three-layer neural network
  • Can yield true probabilities
    • If output node is a sigmoid node
  • Not hard to train
    • Training process is well understood
  • Fast in operations
    • Training is slow, but once trained, the network can calculate the output for a set of inputs quite fast
  • Easy to implement
    • Just a sum of exponential functions
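
To illustrate why a trained network is fast and easy to run, here is a minimal sketch of the forward pass of a three-layer network (inputs, one hidden layer, one sigmoid output node). The layer sizes and all weight values are made up for illustration and are untrained; only the structure follows the slide.

```python
import math


def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Run one input vector through a single hidden layer and a sigmoid output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical, untrained weights: 2 inputs, 2 hidden nodes, 1 output node.
hidden_weights = [[0.5, -0.4], [-0.3, 0.8]]
hidden_biases = [0.1, -0.2]
output_weights = [1.2, -0.7]
output_bias = 0.05

probability = forward([0.6, 0.9], hidden_weights, hidden_biases, output_weights, output_bias)
```

Because the output node is a sigmoid, the result always lies strictly between 0 and 1, which is what allows it to be read as a probability once the weights have been trained.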

Disadvantages of neural networks
  • A black box
    • The final set of weights yields no insights
    • Magnitude of weights doesn’t mean much
  • Measure of skill needs to be differentiable
    • RMS error, etc.
    • Can not use Probability of Detection, for example
  • Training set has to be complete
    • Unpredictable output on data unlike training
    • Need lots of data
    • Need an expert willing to do a lot of truthing (labeling ground truth)
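
The differentiability point can be seen in a small sketch: RMS error changes smoothly as outputs move toward the truth, while Probability of Detection (hits divided by hits plus misses) only changes when an output crosses the decision threshold. The data values and the 0.5 threshold below are invented for illustration.

```python
import math


def rms_error(predictions, truth):
    """Root-mean-square difference between predictions and ground truth."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))


def probability_of_detection(predictions, truth, threshold=0.5):
    """Hits / (hits + misses); assumes at least one true event is present."""
    hits = sum(1 for p, t in zip(predictions, truth) if t == 1 and p >= threshold)
    misses = sum(1 for p, t in zip(predictions, truth) if t == 1 and p < threshold)
    return hits / (hits + misses)


truth = [1, 1, 0, 1, 0]
outputs_a = [0.60, 0.70, 0.20, 0.40, 0.10]
# Closer to the truth than outputs_a, but every value stays on the same side of 0.5.
outputs_b = [0.90, 0.95, 0.05, 0.45, 0.02]

rms_a, rms_b = rms_error(outputs_a, truth), rms_error(outputs_b, truth)
pod_a = probability_of_detection(outputs_a, truth)
pod_b = probability_of_detection(outputs_b, truth)
```

Here the second set of outputs has a lower RMS error, yet both sets give the same Probability of Detection, so a gradient-based trainer would get no signal from that score.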

Recap:
  • Fuzzy logic
    • Humans provide the rules
    • Not optimal
  • Neural network
    • Humans can not understand system
    • Optimal
  • Middle ground?
    • Genetic Algorithms
    • Decision Trees

Genetic algorithms
  • In genetic algorithms
    • One fixes the model (rule base, equations, class of functions, etc.)
    • Optimize the model’s parameters on a training data set
    • Use optimal set of parameters for unknown cases
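
The three steps above can be sketched as a small continuous genetic algorithm. This example assumes the fixed model is a power law y = a * x^b and searches for a and b; the synthetic data, population size, mutation width, and selection scheme are all illustrative choices, not the presentation's method.

```python
import random


def cost(params, data):
    """Sum of squared errors of the power-law model y = a * x**b."""
    a, b = params
    return sum((a * x ** b - y) ** 2 for x, y in data)


def evolve(data, generations=200, pop_size=40, seed=0):
    """Continuous GA: keep the fitter half, refill with mutated blends of parents."""
    rng = random.Random(seed)
    population = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 3.0)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: cost(p, data))
        history.append(cost(population[0], data))
        parents = population[: pop_size // 2]        # selection: fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            mix = rng.random()
            # Crossover (weighted blend of the parents) plus a small Gaussian mutation.
            children.append(tuple(mix * m + (1.0 - mix) * f + rng.gauss(0.0, 0.05)
                                  for m, f in zip(mother, father)))
        population = parents + children
    population.sort(key=lambda p: cost(p, data))
    history.append(cost(population[0], data))
    return population[0], history


# Synthetic training data generated from a = 2.0, b = 1.5 (illustrative only).
data = [(x, 2.0 * x ** 1.5) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
best, history = evolve(data)
```

The cost function is only ever evaluated, never differentiated, so it could be swapped for any score. Because the best individual always survives into the next generation, the best cost never gets worse from one generation to the next.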

An example genetic algorithm

Sources:

http://tx.technion.ac.il/~edassau/web/genetic_algorithms.htm

http://cswww.essex.ac.uk/research/NEC/

Advantages of genetic algorithms
  • Near-optimal parameters for given model
    • Human-understandable rules
    • Best parameters for them
  • Cost function need not be differentiable
    • The process of training uses natural selection, not gradient descent
  • Requires less data than a neural network
    • Search space is more limited

Disadvantages of genetic algorithms
  • Highly dependent on class of functions
    • If poor model is chosen, poor results
      • Optimization may not help at all
  • Known model does not always lead to better understanding
    • Magnitude of weights, etc. may not be meaningful if inputs are correlated
    • Problem may have multiple parametric solutions

Decision trees
  • Can automatically build decision trees from known data
    • Prune trees
    • Select thresholds
    • Choose operators
  • Disadvantages
    • Piece-wise linear, so typically less skilled than neural networks
    • Large decision trees are effectively a black box
    • Basic classification trees do classification only; regression needs a regression-tree variant (e.g., CART)
  • Advantages:
    • Fast to train
    • New advances: bagged, boosted decision trees approach skill of neural networks, but are no longer fast to train

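Example tree (the two numbers at each node appear to be per-class case counts; each pair of children sums to its parent):

Root (30, 50)
  T < 10C (20, 15)
    Z > 45 (18, 2)
    Z < 45 (2, 13)
  T > 10C (10, 35)
    V < 5 (8, 2)
    V > 5 (2, 33)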
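
As a sketch of how a tree builder might score a candidate threshold, the code below measures how much a split reduces class impurity, using the counts from this slide's example tree (root 30 and 50, split on T into 20/15 and 10/35). The presentation does not say which splitting criterion is used; Gini impurity is an assumption here.

```python
def gini(counts):
    """Gini impurity of a node given its per-class case counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted


root = (30, 50)
split_on_t = [(20, 15), (10, 35)]          # T < 10C and T > 10C
split_on_z = [(18, 2), (2, 13)]            # children of the T < 10C node

gain_t = impurity_decrease(root, split_on_t)
gain_z = impurity_decrease(split_on_t[0], split_on_z)
```

A builder would evaluate many candidate thresholds this way and keep the one with the largest decrease. Pruning then removes splits whose gain is too small to justify the added tree size.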

Radial Basis Functions

Diagram from: A. W. Jayawardena & D. Achela K. Fernando 1998: Use of Radial Basis Function Type Artificial Neural Networks for Runoff Simulation, Computer-Aided Civil and Infrastructure Engineering 13:2

  • Radial Basis Functions are a form of neural network
    • Localized gaussians
    • Linear sum of non-linear functions
  • Advantage: Can be solved by inverting a matrix, so very fast
  • Disadvantage: Not a general-enough model
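
The "solved by inverting a matrix" point can be sketched as follows: with one Gaussian centered on each training point, the weights of the linear sum come from solving a single linear system, with no iterative training. The data, the Gaussian width, and the hand-written solver are illustrative assumptions; a real application would normally call a linear-algebra library.

```python
import math


def gaussian(distance, width):
    """Localized Gaussian basis function."""
    return math.exp(-(distance / width) ** 2)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                            # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rbf(xs, ys, width):
    """Weights such that the sum of Gaussians passes through every training point."""
    phi = [[gaussian(abs(xi - cj), width) for cj in xs] for xi in xs]
    return solve(phi, ys)


def predict(x, centers, weights, width):
    """Linear sum of non-linear (Gaussian) functions."""
    return sum(w * gaussian(abs(x - c), width) for w, c in zip(weights, centers))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical inputs
ys = [1.0, 2.7, 5.8, 6.6, 7.5]          # hypothetical observations
width = 1.0
weights = fit_rbf(xs, ys, width)
```

The localized Gaussians also show the limitation: far from every center all basis functions decay toward zero, so the model says little about inputs unlike the training data.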

Typical data-driven application

[Diagram: Input Data → Features → f(features) → Result, forming the AI application in run-time. Two questions annotate the steps: which features, and how do we find f()?]

What is the role of the data?
  • Validation
    • Test known model
    • Technique:
      • Difference between model output and ground truth helps to validate the model
  • Calibration
    • Find parameters for a model with the desired structure
    • Technique:
      • Tuned fuzzy logic method
      • Genetic algorithms
  • Induction
    • Find model and parameters from just data
    • Technique:
      • Neural network methods, bagged/boosted decision trees, support vector machines, etc.

What is the problem to solve?
  • Do you have a bunch of data and want to:
    • Estimate an unknown parameter from it?
      • True rainfall based on radar observations?
      • Amount of liquid content from in-situ measurements of temperature, pressure, etc?
      • Regression
    • Classify what the data correspond to?
      • A water surge?
      • A temperature inversion?
      • A boundary?
      • Classification
  • Regression and classification aren’t that different
    • Classification: estimate probability of an event
      • A function whose output lies between 0 and 1

Which AI technique?
  • Do you have expert knowledge?
    • Humans have a “model” in their head? Should the final f() be understandable?
    • Create fuzzy logic rules from experts’ reasoning
      • Aggregate the individual fuzzy logic rules
      • Can tune the fuzzy rules based on data
        • Using regression, decision trees or neural networks for RMS error criterion
        • Genetic algorithms for error criteria like ROC, economic cost, etc.
      • Many times the original rules are just fine
  • Do you already know the model?
    • A power-law relationship? Gaussian? Quadratic? Rules?
    • Just need to find parameters to this model?
      • If linear, just use linear regression
      • If non-linear: use genetic algorithms
      • Use continuous GAs
  • Both of these can be used for regression (therefore, also classification)
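
For the "if linear, just use linear regression" case, a minimal sketch of the closed-form least-squares fit of a straight line is shown below. The function name and the sample data are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

No search is needed because the parameters have a direct formula, which is why a genetic algorithm is only worth the effort when the model is non-linear in its parameters.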

Which AI technique (contd.)
  • Do you know nothing about the data?
    • Not the suspected equation/model (GA)?
    • Not the suspected rules (fuzzy logic)?
    • Use an AI technique that supplies its equations/rules
      • “black box”.
  • For classification, use:
    • Bagged decision trees or Support Vector Machines
      • If probabilistic output is needed, remember to apply Platt scaling
      • Summary statistics on bagged DTs can help answer “why”
    • Neural Networks
  • For regression, use:
    • Neural networks

Where do your data come from?
  • Observed data
    • Compute features
    • Choose AI technique
      • The 4 choices in the previous two slides
  • Simulated data:
    • Example: trying to replicate a very complex model
    • Throw randomly-generated data at model
    • Compute features
    • Choose AI technique:
      • GA for parametric approximations
      • NN when you don’t know how to approximate

Where do you get your inputs?
  • What type of data do you have?
    • Individual observations?
      • Sample them (choose at random) and use directly
    • Sparse observations in a time series?
      • Generate time-based features (1D moving windows)
      • Signal processing features from time series
    • Data from remotely sensed 2D grids?
      • Generate image-based features using convolution filters
      • Do you need:
        • Pixel-based regression/classification?
          • Use convolution features directly
        • Object-based regression/classification?
          • Identify regions using region growing
          • Use region-aggregate features

Typical data-driven application

[Diagram: Observed data → (signal/image processing; sampling) → Features → (normalize / create chromosome / determine confidences) → f() (FzLogic / GenAlg / NN / DecTree) → (Platt method / region-average / threshold) → Result. The whole chain is a data-driven application in run-time.]

Preprocessing
  • Often can not use pixel data directly
    • Too much data, too highly correlated
    • May need to segment pixels into objects and use features computed on the objects
  • Different data sets may not be collocated
    • Need to interpolate to line them up
    • Mapping, objective analysis
  • Noise in data may need to be reduced
    • Smoothing
    • Present statistic of data, rather than data itself
  • Features need to be extracted from data
    • Human experts often good source of ideas on signatures to extract from data

Postprocessing
  • The output of an expert system may be grid point by grid point
    • May need to provide output on objects
      • Storms, forests, etc.
    • Can average outputs over objects’ pixels
  • May need probabilistic output
    • Scale output of maximum-margin techniques (e.g., support vector machines)
    • Use a sigmoid function
      • Called Platt scaling
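
A simplified sketch of Platt scaling follows: raw classifier scores are mapped to probabilities with a sigmoid, p = 1 / (1 + exp(A * score + B)), and A and B are fitted on labeled cases by minimizing log loss. The scores and labels are invented. Plain gradient descent is used here for brevity; Platt's published procedure uses a more careful optimizer and smoothed target values.

```python
import math


def platt_probability(score, a, b):
    """Sigmoid mapping from a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_platt(scores, labels, learning_rate=0.1, steps=2000):
    """Fit A and B by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, a, b)
            grad_a += (y - p) * s      # d(log loss)/dA for this case
            grad_b += (y - p)          # d(log loss)/dB for this case
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


# Hypothetical raw scores (e.g., signed distances from a decision boundary) and labels.
scores = [-2.0, -1.0, -0.5, -0.2, 0.3, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
probabilities = [platt_probability(s, a, b) for s in scores]
```

With this sign convention, a fitted A below zero means that higher scores map to higher probabilities.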

Summary
  • What is Artificial Intelligence?
    • Data-driven methods to perform specific targeted tasks
  • Common AI techniques
    • Fuzzy logic, neural networks, genetic algorithms, decision trees
  • Choosing between AI techniques
    • Understand the role of your data
    • Do experts understand the system? (have a model)
    • Do experts expect to understand the system? (readability)
  • Pre and post processing
    • Image processing techniques on spatial grids
