Data-driven methods in Environmental Sciences: Exploration of Artificial Intelligence Techniques

- By **emily**

### Data-driven methods in Environmental Sciences: Exploration of Artificial Intelligence Techniques

Data Driven Methods

Valliappa.Lakshmanan@noaa.gov

lakshman@ou.edu

Data Driven Methods

What is Artificial Intelligence?

Common AI techniques

Choosing between AI techniques

Pre and post processing

What is AI?

- Machines that perceive, understand and react to their environment
- Goal of Babbage, etc.
- Oldest endeavor in computer science
- Machines that think
- Robots: factory floors, home vacuums
- Still quite impractical

AI vs. humans

- AI applications built on Aristotelian logic
- Induction, semantic queries, system of logic
- Human reasoning involves more than just induction
- Computers never as good as humans
- In reasoning and making sense of data
- In obtaining a holistic view of a system
- Computers much better than humans
- In processing reams of data
- Performing complex calculations

Successful AI applications

- Targeted tasks more amenable to automated methods
- Build special-purpose AI systems
- Determine appropriate dosage for a drug
- Classify cells as benign or cancerous
- Called “expert systems”
- Methodology based on expert reasoning
- Quick and objective ways to obtain answers

Fuzzy logic

- Fuzzy logic addresses key problem in expert systems
- How to represent domain knowledge
- Humans use imprecisely calibrated terms
- How to build decision trees on imprecise thresholds
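
One way to encode imprecisely calibrated terms is with membership functions that fade between 0 and 1 instead of switching at a hard threshold. The sketch below is only illustrative: the ramp shapes, the temperature and humidity limits, and the `frost_risk` rule are assumptions, not taken from the presentation.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Membership that is 0 below lo, 1 above hi, linear in between."""
    return 1.0 - ramp_down(x, lo, hi)


def frost_risk(temp_c, humidity_pct):
    """Hypothetical rule: frost risk is high if it is cold AND humid.

    AND is taken as the minimum of the two memberships, a common choice.
    """
    cold = ramp_down(temp_c, 0.0, 10.0)        # "cold" fades out between 0C and 10C
    humid = ramp_up(humidity_pct, 50.0, 90.0)  # "humid" fades in between 50% and 90%
    return min(cold, humid)


print(frost_risk(5.0, 70.0))    # partly cold, partly humid
print(frost_risk(-2.0, 95.0))   # fully cold and fully humid
print(frost_risk(15.0, 95.0))   # not cold at all
```

The rule stays readable ("cold and humid") while the thresholds no longer have to be exact.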

Fuzzy logic example

Source: Matlab fuzzy logic toolbox tutorial

http://www.mathworks.com/access/helpdesk/help/toolbox/fuzzy/fp350.html

Advantages of fuzzy logic

- Considerable skill for little investment
- Fuzzy logic systems piggyback on human analysis
- Humans encode rules after intelligent analysis of lots of data
- Verbal rules generated by humans are robust
- Simple to create
- Not much need for data or ground truth
- Logic tends to be easy to program
- Fuzzy rules are human understandable

Where not to use fuzzy logic

- Do not use fuzzy logic if:
- Humans do not understand the system
- Different experts disagree
- Knowledge cannot be expressed with verbal rules
- Gut instinct is involved
- Not just objective analysis
- A fuzzy logic system is limited
- Piece-wise linear approximation to a system
- Non-linear systems cannot be approximated
- Many environmental applications are non-linear

Neural Networks

- Neural networks can approximate non-linear systems
- Evidence-based
- Weights chosen through optimization procedure on known dataset (“training”)
- Works even if experts can’t verbalize their reasoning, provided there is ground truth

An example neural network

Diagram from:

http://www.codeproject.com/useritems/GA_ANN_XOR.asp

Advantages of neural networks

- Can approximate any smooth function
- The three-layer neural network
- Can yield true probabilities
- If output node is a sigmoid node
- Not hard to train
- Training process is well understood
- Fast in operation
- Training is slow, but once trained, the network can calculate the output for a set of inputs quite fast
- Easy to implement
- Just a sum of exponential functions
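
To make "a sum of exponential functions" concrete, here is a minimal sketch of the forward pass of a three-layer network (input, sigmoid hidden layer, one sigmoid output node). The weights are hand-picked, hypothetical values that approximate XOR; a real network would obtain them from training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> sigmoid hidden layer -> single sigmoid output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights (not trained): the hidden nodes act roughly like OR and
# NAND, and the output node combines them like AND, which gives XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [20.0, 20.0]
OUTPUT_B = -30.0

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    p = forward([a, b], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
    print(a, b, round(p, 3))
```

Because the output node is a sigmoid, the result always lies between 0 and 1. Evaluating it takes a handful of multiplications and exponentials, which is why a trained network is fast at run time.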

Disadvantages of neural networks

- A black box
- The final set of weights yields no insights
- Magnitude of weights doesn’t mean much
- Measure of skill needs to be differentiable
- RMS error, etc.
- Cannot use Probability of Detection, for example
- Training set has to be complete
- Unpredictable output on data unlike training
- Need lots of data
- Need an expert willing to do a lot of truthing
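
The differentiability point can be seen with a small sketch that compares RMS error with Probability of Detection (POD) on made-up outputs. POD is taken here as hits / (hits + misses) after thresholding at 0.5; the threshold and the data are assumptions for illustration.

```python
import math


def rms_error(outputs, truth):
    """Root-mean-square error: changes smoothly with every output value."""
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(outputs, truth)) / len(truth))


def probability_of_detection(outputs, truth, threshold=0.5):
    """Hits / (hits + misses), counted after thresholding the outputs."""
    hits = sum(1 for o, t in zip(outputs, truth) if t == 1 and o >= threshold)
    misses = sum(1 for o, t in zip(outputs, truth) if t == 1 and o < threshold)
    return hits / (hits + misses)


truth = [1, 1, 0, 1]
before = [0.60, 0.40, 0.20, 0.90]
after = [0.70, 0.45, 0.20, 0.90]   # outputs nudged toward the truth

# RMS error rewards the small improvement ...
print(rms_error(before, truth), rms_error(after, truth))
# ... but POD stays flat until an output crosses the threshold.
print(probability_of_detection(before, truth),
      probability_of_detection(after, truth))
```

A gradient-based training procedure needs the first kind of signal; a step-like score such as POD gives it nothing to follow.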

Recap:

- Fuzzy logic
- Humans provide the rules
- Not optimal
- Neural network
- Humans cannot understand the system
- Optimal
- Middle ground?
- Genetic Algorithms
- Decision Trees

Genetic algorithms

- In genetic algorithms
- One fixes the model (rule base, equations, class of functions, etc.)
- Optimize the parameters of the model on a training data set
- Use optimal set of parameters for unknown cases
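
The three steps above can be sketched as a small continuous genetic algorithm. Everything specific here is an assumption for illustration: the fixed model is a power law `y = a * x**b`, the cost is mean absolute error, and the population size, selection fraction and mutation size are arbitrary choices.

```python
import random


def model(x, params):
    """The fixed model: a power law y = a * x**b (an assumed example)."""
    a, b = params
    return a * x ** b


def cost(params, data):
    """Mean absolute error; it does not need to be differentiable."""
    return sum(abs(model(x, params) - y) for x, y in data) / len(data)


def genetic_fit(data, bounds, pop_size=60, generations=150, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: cost(p, data))
        parents = population[: pop_size // 4]            # selection: keep the fittest
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mom, dad)]   # crossover
            child = [                                              # mutation, clipped to bounds
                min(hi, max(lo, gene + rng.gauss(0.0, 0.05 * (hi - lo))))
                for gene, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        population = parents + children
    return min(population, key=lambda p: cost(p, data))


# Synthetic training data generated with a = 2.0, b = 1.5 (hypothetical values).
training = [(x, 2.0 * x ** 1.5) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
best = genetic_fit(training, bounds=[(0.0, 5.0), (0.0, 3.0)])
print(best, cost(best, training))

# The optimized parameters are then used for unknown cases.
print(model(6.0, best))
```

Only the two parameters are searched, so the result remains a human-readable power law.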

An example genetic algorithm

Sources:

http://tx.technion.ac.il/~edassau/web/genetic_algorithms.htm

http://cswww.essex.ac.uk/research/NEC/

Advantages of genetic algorithms

- Near-optimal parameters for given model
- Human-understandable rules
- Best parameters for them
- Cost function need not be differentiable
- The process of training uses natural selection, not gradient descent
- Requires less data than a neural network
- Search space is more limited

Disadvantages of genetic algorithms

- Highly dependent on class of functions
- If poor model is chosen, poor results
- Optimization may not help at all
- Known model does not always lead to better understanding
- Magnitude of weights, etc. may not be meaningful if inputs are correlated
- Problem may have multiple parametric solutions

Decision trees

- Can automatically build decision trees from known data
- Prune trees
- Select thresholds
- Choose operators
- Disadvantages:
- Piece-wise linear, so typically less skilled than neural networks
- Large decision trees are effectively a black box
- Cannot do regression, only classification
- Advantages:
- Fast to train
- New advances: bagged, boosted decision trees approach skill of neural networks, but are no longer fast to train

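Example decision tree (each node lists two counts, presumably the number of training cases in each of two classes):

- Root: 30, 50
  - T < 10C: 20, 15
    - Z > 45: 18, 2
    - Z < 45: 2, 13
  - T > 10C: 10, 35
    - V < 5: 8, 2
    - V > 5: 2, 33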
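
The example tree on this slide splits first on T at 10C, then on Z at 45 (cold branch) or V at 5 (warm branch). A minimal sketch of how it could be applied and scored follows. The class names "A" and "B" are placeholders, since the slide does not say what the two counts per node refer to, and Gini impurity is one common purity measure, not necessarily the one the author used.

```python
# Leaves of the example tree as (path, count of class A, count of class B).
LEAVES = [
    ("T < 10C and Z > 45", 18, 2),
    ("T < 10C and Z < 45", 2, 13),
    ("T > 10C and V < 5", 8, 2),
    ("T > 10C and V > 5", 2, 33),
]


def classify(t, z, v):
    """Walk the tree and return the majority class of the leaf reached.

    The slide does not say how exact ties (T == 10, etc.) are handled;
    here they fall into the second branch.
    """
    if t < 10:
        return "A" if z > 45 else "B"
    return "A" if v < 5 else "B"


def gini(count_a, count_b):
    """Gini impurity of a node: 0 for a pure node, 0.5 for an even split."""
    p = count_a / (count_a + count_b)
    return 2.0 * p * (1.0 - p)


correct = sum(max(a, b) for _, a, b in LEAVES)
total = sum(a + b for _, a, b in LEAVES)
print("training accuracy:", correct / total)
print("root impurity:", gini(30, 50))
for path, a, b in LEAVES:
    print(path, "impurity:", round(gini(a, b), 3))
```

Every leaf is purer than the root, which is what the threshold selection aims for.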

Radial Basis Functions

Diagram from: A. W. Jayawardena & D. Achela K. Fernando 1998: Use of Radial Basis Function Type Artificial Neural Networks for Runoff Simulation, Computer-Aided Civil and Infrastructure Engineering 13:2

- Radial Basis Functions are a form of neural network
- Localized Gaussians
- Linear sum of non-linear functions
- Advantage: Can be solved by inverting a matrix, so very fast
- Disadvantage: Not a general-enough model
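
A minimal sketch of why no iterative training is needed: with the Gaussian centers and width fixed, the output weights are the solution of a linear system. The assumptions here are one Gaussian per training point, a fixed width of 1.0, a made-up sine target, and solving the system by Gaussian elimination instead of forming an explicit matrix inverse (the two are equivalent for this purpose).

```python
import math


def gaussian(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    weights = [0.0] * n
    for row in range(n - 1, -1, -1):      # back substitution
        known = sum(aug[row][k] * weights[k] for k in range(row + 1, n))
        weights[row] = (aug[row][n] - known) / aug[row][row]
    return weights


def fit_rbf(xs, ys, width):
    """One Gaussian per training point; the weights come from a linear solve."""
    design = [[gaussian(x, c, width) for c in xs] for x in xs]
    return solve(design, ys)


def predict(x, centers, weights, width):
    """Linear sum of non-linear (Gaussian) functions."""
    return sum(w * gaussian(x, c, width) for w, c in zip(weights, centers))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]            # hypothetical target function
weights = fit_rbf(xs, ys, width=1.0)
print(predict(2.0, xs, weights, 1.0), math.sin(2.0))   # at a training point
print(predict(2.5, xs, weights, 1.0), math.sin(2.5))   # between training points
```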

Typical data-driven application

Diagram: Input Data → Features → f(features) → Result, with f() forming the AI application in run-time. Questions raised: Which features? How do we find f()?

What is the role of the data?

- Validation
- Test known model
- Technique:
- Difference between model output and ground truth helps to validate the model
- Calibration
- Find parameters of a model with the desired structure
- Technique:
- Tuned fuzzy logic method
- Genetic algorithms
- Induction
- Find model and parameters from just data
- Technique:
- Neural network methods, bagged/boosted decision trees, support vector machines, etc.

What is the problem to solve?

- Do you have a bunch of data and want to:
- Estimate an unknown parameter from it?
- True rainfall based on radar observations?
- Amount of liquid content from in-situ measurements of temperature, pressure, etc?
- Regression
- Classify what the data correspond to?
- A water surge?
- A temperature inversion?
- A boundary?
- Classification
- Regression and classification aren’t that different
- Classification: estimate probability of an event
- A function with values between 0 and 1

Which AI technique?

- Do you have expert knowledge?
- Humans have a “model” in their head? Should the final f() be understandable?
- Create fuzzy logic rules from experts’ reasoning
- Aggregate the individual fuzzy logic rules
- Can tune the fuzzy rules based on data
- Using regression, decision trees or neural networks for RMS error criterion
- Genetic algorithms for error criteria like ROC, economic cost, etc.
- Many times the original rules are just fine
- Do you already know the model?
- A power-law relationship? Gaussian? Quadratic? Rules?
- Just need to find parameters to this model?
- If linear, just use linear regression
- If non-linear: use genetic algorithms
- Use continuous GAs
- Both of these can be used for regression (therefore, also classification)
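
For the "model already known" case, a short sketch of the linear-regression route: a power law becomes a straight line after taking logarithms, so ordinary least squares recovers its parameters. The data are synthetic and noise-free, and the values 200 and 1.6 are chosen only for illustration.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data from y = 200 * x**1.6 (illustrative values, no noise).
xs = [0.5, 1.0, 2.0, 5.0, 10.0]
ys = [200.0 * x ** 1.6 for x in xs]

# Taking logs turns the power law into a line: log y = log a + b * log x.
b, log_a = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
print("a =", math.exp(log_a), "b =", b)
```

When no such transformation exists, or the error criterion is not a squared error, the parameter search falls to a genetic algorithm as the slide suggests.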

Which AI technique (contd.)

- Do you know nothing about the data?
- Not the suspected equation/model (GA)?
- Not the suspected rules (fuzzy logic)?
- Use an AI technique that supplies its equations/rules
- “black box”.
- For classification, use:
- Bagged decision trees or Support Vector Machines
- If probabilistic output is needed, remember to apply Platt scaling
- Summary statistics on bagged DTs can help answer “why”
- Neural Networks
- For regression, use:
- Neural networks

Where do your data come from?

- Observed data
- Compute features
- Choose AI technique
- The 4 choices in the previous two slides
- Simulated data:
- Example: trying to replicate a very complex model
- Throw randomly-generated data at model
- Compute features
- Choose AI technique:
- GA for parametric approximations
- NN when you don’t know how to approximate

Where do you get your inputs?

- What type of data do you have?
- Individual observations?
- Sample them (choose at random) and use directly
- Sparse observations in a time series?
- Generate time-based features (1D moving windows)
- Signal processing features from time series
- Data from remotely sensed 2D grids?
- Generate image-based features using convolution filters
- Do you need:
- Pixel-based regression/classification?
- Use convolution features directly
- Object-based regression/classification?
- Identify regions using region growing
- Use region-aggregate features
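
For the object-based route, a minimal sketch of region identification followed by a region-aggregate feature. It uses a simplified form of region growing, assumed here for illustration: a flood fill over 4-connected pixels at or above a fixed threshold. The grid values and the threshold of 30 are made up.

```python
from collections import deque


def grow_regions(grid, threshold):
    """Label 4-connected regions of pixels whose value is >= threshold.

    Returns a grid of labels (0 = background, 1..n = region id) and n.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:                      # breadth-first growth from the seed
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] >= threshold and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


def region_means(grid, labels, count):
    """Region-aggregate feature: mean pixel value of each labeled region."""
    sums, sizes = [0.0] * (count + 1), [0] * (count + 1)
    for row, label_row in zip(grid, labels):
        for value, label in zip(row, label_row):
            sums[label] += value
            sizes[label] += 1
    return {k: sums[k] / sizes[k] for k in range(1, count + 1)}


grid = [
    [0, 0, 40, 50],
    [0, 0, 45, 0],
    [30, 0, 0, 0],
    [35, 40, 0, 0],
]
labels, count = grow_regions(grid, threshold=30)
print(count, region_means(grid, labels, count))
```

Each region then contributes one feature vector to the regression or classification instead of one per pixel.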

Typical data-driven application

Diagram: Observed data → (signal/image processing; sampling) → Features → (normalize / create chromosome / determine confidences) → f() (FzLogic/GenAlg/NN/DecTree) → (Platt method / region-average / threshold) → Result. The chain from features to result is the data-driven application in run-time.

Preprocessing

- Often cannot use pixel data directly
- Too much data, too highly correlated
- May need to segment pixels into objects and use features computed on the objects
- Different data sets may not be collocated
- Need to interpolate to line them up
- Mapping, objective analysis
- Noise in data may need to be reduced
- Smoothing
- Present statistics of the data, rather than the data itself
- Features need to be extracted from data
- Human experts are often a good source of ideas on signatures to extract from data
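
Smoothing and "statistics instead of raw data" can both be sketched with a 1D moving window. The window width and the choice of mean, minimum and maximum as statistics are assumptions for illustration; the series is made up.

```python
def moving_window_features(series, width):
    """Return (mean, minimum, maximum) for each full window of `width` samples."""
    features = []
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        features.append((sum(window) / width, min(window), max(window)))
    return features


noisy = [10.0, 12.0, 11.0, 30.0, 12.0, 13.0]   # hypothetical series with one spike
for mean, low, high in moving_window_features(noisy, width=3):
    print(round(mean, 2), low, high)
```

The window mean damps the spike, while the window maximum preserves it as a separate feature, so the downstream technique can use whichever signature matters.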

Postprocessing

- The output of an expert system may be grid point by grid point
- May need to provide output on objects
- Storms, forests, etc.
- Can average outputs over objects’ pixels
- May need probabilistic output
- Scale output of maximum-margin techniques
- Use a sigmoid function
- Called Platt scaling
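
A minimal sketch of Platt scaling: a sigmoid with two parameters is fitted to raw classifier scores so that the output can be read as a probability. This version is simplified and makes several assumptions: plain gradient descent on the negative log-likelihood, no smoothing of the 0/1 targets as in Platt's original procedure, and made-up scores standing in for the output of a maximum-margin classifier.

```python
import math


def platt_probability(score, a, b):
    """Platt's sigmoid: P(event | score) = 1 / (1 + exp(a * score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_platt(scores, labels, rate=0.1, steps=5000):
    """Fit a and b by gradient descent on the negative log-likelihood."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, a, b)
            grad_a += (y - p) * s      # derivative of this sample's loss w.r.t. a
            grad_b += (y - p)          # derivative of this sample's loss w.r.t. b
        a -= rate * grad_a / n
        b -= rate * grad_b / n
    return a, b


# Hypothetical decision values and ground truth; the classes overlap slightly.
scores = [-2.0, -1.5, -1.0, -0.5, 0.3, -0.2, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
for s in (-2.0, 0.0, 2.0):
    print(s, round(platt_probability(s, a, b), 3))
```

The fitted `a` comes out negative, so larger scores map to larger probabilities. In practice the fit should use data held out from classifier training.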

Summary

- What is Artificial Intelligence?
- Data-driven methods to perform specific targeted tasks
- Common AI techniques
- Fuzzy logic, neural networks, genetic algorithms, decision trees
- Choosing between AI techniques
- Understand the role of your data
- Do experts understand the system? (have a model)
- Do experts expect to understand the system? (readability)
- Pre and post processing
- Image processing techniques on spatial grids
