Machine Learning and Neural Networks
Professor Tony Martinez
Computer Science Department
Brigham Young University
http://axon.cs.byu.edu/~martinez

Machine Learning Tutorial – UIST 2002



Tutorial Overview

  • Introduction and Motivation

  • Neural Network Model Descriptions

    • Perceptron

    • Backpropagation

  • Issues

    • Overfitting

    • Applications

  • Other Models

    • Decision Trees, Nearest Neighbor/IBL, Genetic Algorithms, Rule Induction, Ensembles


More Information

  • You can download this presentation from:

    ftp://axon.cs.byu.edu/pub/papers/NNML.ppt

  • An excellent introductory text to Machine Learning:

    Machine Learning, Tom M. Mitchell, McGraw Hill, 1997


What is Inductive Learning

  • Gather a set of input-output examples from some application: Training Set

    e.g., speech recognition, financial forecasting

  • Train the learning model (neural network, etc.) on the training set until it performs well on it

  • The Goal is to generalize on novel data not yet seen

  • Gather a further set of input-output examples from the same application: Test Set

  • Use the learning system on actual data
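Not part of the original slides: a minimal Python sketch of the gather/train/test workflow described above, assuming a learner object with fit/predict methods (a hypothetical interface).

    # Minimal sketch of the train/test workflow above (illustrative, not from the tutorial).
    import random

    def split_train_test(examples, test_fraction=0.3, seed=0):
        """Randomly divide (x, y) examples into a training set and a test set."""
        rng = random.Random(seed)
        shuffled = examples[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        return shuffled[n_test:], shuffled[:n_test]

    def accuracy(learner, examples):
        """Fraction of examples whose prediction matches the target (learner.predict is assumed)."""
        return sum(learner.predict(x) == y for x, y in examples) / len(examples)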


Motivation

  • Costs and Errors in Programming

  • Our inability to program "subjective" problems

  • A general, easy-to-use mechanism for a large set of applications

  • Improvement in application accuracy - Empirical


Example Application - Heart Attack Diagnosis

  • The patient has a set of symptoms - Age, type of pain, heart rate, blood pressure, temperature, etc.

  • Given these symptoms in an Emergency Room setting, a doctor must diagnose whether a heart attack has occurred.

  • How do you train a machine learning model to solve this problem using the inductive learning approach?

  • Consistent approach

  • Knowledge of ML approach not critical

  • Need to select a reasonable set of input features


Examples and Discussion

  • Loan Underwriting

    • Which Input Features (Data)

    • Divide into Training Set and Test Set

    • Choose a learning model

    • Train model on Training set

    • Predict accuracy with the Test Set

    • How to generalize better?

      • Different Input Features

      • Different Learning Model

    • Issues

      • Intuition vs. Prejudice

      • Social Response


UC Irvine Machine Learning Data Base
Iris Data Set

4.8,3.0,1.4,0.3,Iris-setosa

5.1,3.8,1.6,0.2,Iris-setosa

4.6,3.2,1.4,0.2,Iris-setosa

5.3,3.7,1.5,0.2,Iris-setosa

5.0,3.3,1.4,0.2,Iris-setosa

7.0,3.2,4.7,1.4,Iris-versicolor

6.4,3.2,4.5,1.5,Iris-versicolor

6.9,3.1,4.9,1.5,Iris-versicolor

5.5,2.3,4.0,1.3,Iris-versicolor

6.5,2.8,4.6,1.5,Iris-versicolor

6.0,2.2,5.0,1.5,Iris-virginica

6.9,3.2,5.7,2.3,Iris-virginica

5.6,2.8,4.9,2.0,Iris-virginica

7.7,2.8,6.7,2.0,Iris-virginica

6.3,2.7,4.9,1.8,Iris-virginica
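As an aside (not on the slide), rows in this comma-separated format are easy to parse; a small sketch, assuming the rows are stored in a local file such as iris.data:

    # Parse UCI-style rows: four real-valued features, then the class label.
    def load_iris(path="iris.data"):          # file name is an assumption
        examples = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                *features, label = line.split(",")
                examples.append(([float(v) for v in features], label))
        return examples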


Voting Records Data Base

democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y

democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y

republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,?,y

republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y

democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,?,?

republican,n,y,n,y,y,n,n,n,n,n,?,?,y,y,n,n

republican,n,y,n,y,y,y,n,n,n,n,y,?,y,y,?,?

democrat,n,y,y,n,n,n,y,y,y,n,n,n,y,n,?,?

democrat,y,y,y,n,n,y,y,y,?,y,y,?,n,n,y,?

republican,n,y,n,y,y,y,n,n,n,n,n,y,?,?,n,?

republican,n,y,n,y,y,y,n,n,n,y,n,y,y,?,n,?

democrat,y,n,y,n,n,y,n,y,?,y,y,y,?,n,n,y

democrat,y,?,y,n,n,n,y,y,y,n,n,n,y,n,y,y

republican,n,y,n,y,y,y,n,n,n,n,n,?,y,y,n,n


Machine Learning Sketch History

  • Neural Networks - Connectionist - Biological Plausibility

    • Late 50’s, early 60’s, Rosenblatt, Perceptron

    • Minsky & Papert 1969 - The Lull, symbolic expansion

    • Late 80’s - Backpropagation, Hopfield, etc. - The explosion

  • Machine Learning - Artificial Intelligence - Symbolic - Psychological Plausibility

    • Samuel (1959) - Checkers evaluation strategies

    • 1970’s and on - ID3, Instance Based Learning, Rule induction, …

    • Currently – Symbolic and connectionist lumped under ML

  • Genetic Algorithms - 1970’s

    • Originally lumped in connectionist

    • Now an exploding area – Evolutionary Algorithms


Inductive Learning - Supervised

  • Assume a set T of examples of the form (x,y) where x is a vector of features/attributes and y is a scalar or vector output

  • By examining the examples postulate a hypothesis H(x) => y for arbitrary x

  • Spectrum of Supervised Algorithms

    • Unsupervised Learning

    • Reinforcement Learning


Other Machine Learning Areas

  • Case Based Reasoning

  • Analogical Reasoning

  • Speed-up Learning

  • Inductive Learning is the most studied and successful to date

  • Data Mining

  • COLT – Computational Learning Theory


Perceptron Node – Threshold Logic Unit

[Figure: a single perceptron node; inputs x1, x2, …, xn with weights w1, w2, …, wn feed a threshold unit whose output Z is 1 if the weighted sum of the inputs reaches the threshold, and 0 otherwise.]


Learning Algorithm

Training set (inputs x1, x2 and target T):

    x1   x2   T
    .8   .3   1
    .4   .1   0

[Figure: a perceptron with inputs x1 and x2, initial weights .4 and –.2, and output Z.]


First Training Instance

Present (x1, x2) = (.8, .3) with target T = 1:

Net = .8*.4 + .3*(-.2) = .26, so Z = 1

Z matches T, so the delta rule leaves the weights unchanged.


Second Training Instance

Present (x1, x2) = (.4, .1) with target T = 0:

Net = .4*.4 + .1*(-.2) = .14, so Z = 1

Z does not match T, so each weight is adjusted by the delta rule:

    Δwi = C (T – Z) xi


Delta Rule Learning

Δwij = C (Tj – Zj) xi

  • Create a network with n input and m output nodes

  • Each iteration through the training set is an epoch

  • Continue training until error is less than some epsilon

  • Perceptron Convergence Theorem: Guaranteed to find a solution in finite time if a solution exists

  • As can be seen from the node activation function, the decision surface is an n-dimensional hyperplane (see the training sketch below)
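A minimal Python sketch of the training loop these bullets describe. The threshold is learned as a bias weight (a common convention, not shown in the figures), and the learning rate C = 0.1 is an arbitrary choice:

    # Delta-rule training for one perceptron node (threshold at 0, as in the examples above).
    def train_perceptron(training_set, initial_weights, bias=0.0, C=0.1, max_epochs=100):
        """training_set: list of (input_vector, target) pairs with targets 0 or 1."""
        w = list(initial_weights)
        for epoch in range(max_epochs):
            errors = 0
            for x, T in training_set:                     # one pass through the data = one epoch
                net = sum(wi * xi for wi, xi in zip(w, x)) + bias
                Z = 1 if net >= 0 else 0                  # threshold logic unit output
                if Z != T:
                    errors += 1
                for i, xi in enumerate(x):                # delta rule: Dwi = C (T - Z) xi
                    w[i] += C * (T - Z) * xi
                bias += C * (T - Z)                       # threshold learned as a bias weight
            if errors == 0:                               # every training instance classified correctly
                break
        return w, bias

    # The two instances from the figures above: (.8, .3) -> 1 and (.4, .1) -> 0
    weights, bias = train_perceptron([([0.8, 0.3], 1), ([0.4, 0.1], 0)], [0.4, -0.2])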


Linear Separability


Linear Separability and Generalization

When is data noise vs. a legitimate exception?


Limited Functionality of Hyperplane


Gradient Descent Learning

[Figure: the error landscape; total sum squared (TSS) error plotted against the weight values.]


Deriving a Gradient Descent Learning Algorithm

  • Goal: decrease the overall error (or other objective function) each time a weight is changed

  • Total Sum Squared error: TSS = Σi (Ti – Zi)²

  • Seek a weight-changing algorithm such that ∂E/∂wij is negative

  • If such a formula can be found, then we have a gradient descent learning algorithm (a short derivation is sketched after this list)

  • Perceptron/Delta rule is a gradient descent learning algorithm

  • Linearly-separable problems have no local minima
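For a single linear output node, the standard derivation (not reproduced on the slide) shows why the delta rule is exactly such an algorithm:

    E = \sum_j (T_j - Z_j)^2, \qquad Z_j = \sum_i w_{ij} x_i
    \frac{\partial E}{\partial w_{ij}} = -2 (T_j - Z_j) x_i
    \Delta w_{ij} = C (T_j - Z_j) x_i \;\propto\; -\frac{\partial E}{\partial w_{ij}}

Each delta-rule update therefore moves the weights in the direction that decreases E.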


Multi-layer Perceptron

  • Can compute arbitrary mappings

  • Assumes a non-linear activation function

  • Training Algorithms less obvious

  • Backpropagation learning algorithm not exploited until 1980’s

  • First of many powerful multi-layer learning algorithms


Responsibility Problem

[Figure: a multi-layer network whose output is 1 when 0 was wanted; which of the many weights should take responsibility for the error?]


Multi-Layer Generalization


Backpropagation

  • Multi-layer supervised learner

  • Gradient Descent weight updates

  • Sigmoid activation function (smoothed threshold logic)

  • Backpropagation requires a differentiable activation function


Multi-layer Perceptron Topology

Input Layer → Hidden Layer(s) → Output Layer


Backpropagation Learning Algorithm

  • Until Convergence (low error or other criteria) do

    • Present a training pattern

    • Calculate the error of the output nodes (based on T - Z)

    • Calculate the error of the hidden nodes (based on the error of the output nodes which is propagated back to the hidden nodes)

    • Continue propagating error back until the input layer is reached

    • Update all weights based on the standard delta rule with the appropriate error term δ (a code sketch of one such update follows)

      Δwij = C δj Zi
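A compact Python sketch of one such update for a network with a single hidden layer of sigmoid nodes. The structure (no bias weights, one hidden layer) and the learning rate are simplifications for illustration, not the tutorial's code:

    # One backpropagation update for a 1-hidden-layer sigmoid network (illustrative).
    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def backprop_update(x, T, W_hidden, W_output, C=0.1):
        """x: input vector, T: target vector; weight matrices are lists of per-node weight lists."""
        # Forward pass
        hidden = [sigmoid(sum(w * xi for w, xi in zip(w_row, x))) for w_row in W_hidden]
        Z = [sigmoid(sum(w * h for w, h in zip(w_row, hidden))) for w_row in W_output]
        # Output-node error terms: delta_j = (T_j - Z_j) f'(net_j), with f'(net) = f(net)(1 - f(net))
        delta_out = [(t - z) * z * (1 - z) for t, z in zip(T, Z)]
        # Hidden-node error terms: propagate the output deltas back through the weights
        delta_hid = [h * (1 - h) * sum(d * W_output[k][j] for k, d in enumerate(delta_out))
                     for j, h in enumerate(hidden)]
        # Weight updates: Dw_ij = C * delta_j * Z_i, where Z_i is the activation feeding the weight
        for k, d in enumerate(delta_out):
            for j, h in enumerate(hidden):
                W_output[k][j] += C * d * h
        for j, d in enumerate(delta_hid):
            for i, xi in enumerate(x):
                W_hidden[j][i] += C * d * xi
        return Z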


Activation Function and its Derivative

  • Node activation function f(net) is typically the sigmoid

  • Derivative of the activation function is a critical part of the algorithm

[Figure: left, the sigmoid f(net), rising from 0 toward 1 and passing through .5 at net = 0; right, its derivative, which peaks at .25 at net = 0 and approaches 0 for large |net|.]


Backpropagation Learning Equations

    Δwij = C δj Zi
    δj = (Tj – Zj) f′(netj)            for output nodes j
    δj = f′(netj) Σk δk wjk            for hidden nodes j, where k ranges over the nodes fed by j


Backpropagation Summary

  • Excellent Empirical results

  • Scaling – The pleasant surprise

    • Local minima are very rare as problem and network complexity increase

  • Most common neural network approach

  • User-defined parameters make it more difficult to use

    • Number of hidden nodes, layers, learning rate, etc.

  • Many variants

    • Adaptive Parameters, Ontogenic (growing and pruning) learning algorithms

    • Higher order gradient descent (Newton, Conjugate Gradient, etc.)

    • Recurrent networks


Inductive Bias

  • The approach used to decide how to generalize novel cases

  • Occam’s Razor – The simplest hypothesis which fits the data is usually the best – Still many remaining options

    A B C -> Z

    A B’ C -> Z

    A B C’ -> Z

    A B’ C’ -> Z

    A’ B’ C’ -> Z’

  • Now you receive the new input A’ B C. What is your output?


Overfitting

Noise vs. Exceptions revisited


The Overfit Problem

  • Newer, more powerful models can have very complex decision surfaces and can converge well on most training sets by learning noisy and irrelevant aspects of the training set in order to minimize error (memorization in the limit)

  • This makes them susceptible to overfitting if not carefully handled

[Figure: TSS error versus training epochs; the training-set curve keeps decreasing while the validation/test-set curve eventually turns upward as the network overfits.]


Avoiding Overfit

  • Inductive Bias – Simplest accurate model

  • More Training Data (vs. overtraining - One epoch limit)

  • Validation set (requires a separate test set; an early-stopping sketch follows this list)

  • Backpropagation – Tends to build from simple model (0 weights) to just large enough weights (Validation Set)

  • Stopping criteria with any constructive model (Accuracy increase vs Statistical significance) – Noise vs. Exceptions

  • Specific Techniques

    • Weight Decay, Pruning, Jitter, Regularization

  • Ensembles
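One common way to use the validation set mentioned above is early stopping: keep the weights from the epoch with the lowest validation error. A sketch, assuming a model object with train_one_epoch, error_on, get_weights, and set_weights methods (hypothetical interface):

    # Early stopping on a held-out validation set (one common use of the validation set above).
    def train_with_early_stopping(model, train_set, val_set, patience=10, max_epochs=1000):
        best_error, best_weights, epochs_since_best = float("inf"), None, 0
        for epoch in range(max_epochs):
            model.train_one_epoch(train_set)              # one pass of (e.g.) backpropagation
            val_error = model.error_on(val_set)           # TSS or classification error
            if val_error < best_error:
                best_error, best_weights = val_error, model.get_weights()
                epochs_since_best = 0
            else:
                epochs_since_best += 1
            if epochs_since_best >= patience:             # validation error stopped improving
                break
        model.set_weights(best_weights)                   # keep the lowest-validation-error model
        return model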


Ensembles

  • Many different Ensemble approaches

    • Stacking, Gating/Mixture of Experts, Bagging, Boosting, Wagging, Mimicking, Combinations

  • Multiple diverse models are trained on the same problem and their outputs are combined

  • The specific overfit of each learning model is averaged out

  • If models are diverse (uncorrelated errors) then even if the individual models are weak generalizers, the ensemble can be very accurate

[Figure: models M1, M2, M3, …, Mn each receive the input; their outputs feed a combining technique that produces the ensemble output.]
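A minimal sketch of one combining technique, majority voting over the models' predictions (illustrative; predict is an assumed interface):

    # Simple combining technique: majority vote over the models' outputs.
    from collections import Counter

    def ensemble_predict(models, x):
        """models: any list of trained learners with a predict method (assumed interface)."""
        votes = Counter(m.predict(x) for m in models)
        return votes.most_common(1)[0][0]     # the class predicted by the most models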


Application Issues

  • Choose relevant features

  • Normalize features

  • Can learn to ignore irrelevant features, but will have to fight the curse of dimensionality

  • The more data (training examples), the better

  • Slower training is acceptable for complex and production applications if it improves accuracy (“the week phenomenon”)

  • Execution is normally fast regardless of training time


Decision Trees - ID3/C4.5

  • Top down induction of decision trees

  • Highly used and successful

  • Attribute features are discrete nominal (mutually exclusive) – real-valued features are discretized

  • Searching for the smallest tree is too complex (NP-hard)

  • C4.5 uses the common symbolic ML philosophy of a greedy iterative approach


Decision Tree Learning

  • Mapping by Hyper-Rectangles

[Figure: a two-dimensional feature space with axes A1 and A2 partitioned into axis-parallel hyper-rectangles.]


ID3 Learning Approach

  • C is the current set of examples

  • A test on attribute A partitions C into {C1, C2, ..., Cw} where w is the number of values of A

[Figure: the set C is tested on the attribute Color; the Red, Green, and Purple branches lead to the partitions C1, C2, and C3.]


Decision Tree Learning Algorithm

  • Start with the Training Set as C and test how each attribute partitions C

  • Choose the best A for the root

  • The goodness measure is based on how well attribute A divides C into different output classes – A perfect attribute would divide C into partitions that contain only one output class each – A poor attribute (irrelevant) would leave each partition with the same ratio of classes as in C

  • 20 questions analogy – good questions quickly minimize the possibilities

  • Continue recursively until sets unambiguously classified or a stopping criteria is reached


ID3 Example and Discussion

  • 14 Examples. Uses Information Gain. Attributes which best discriminate between classes are chosen

  • If the same class ratios are found in the partitioned sets, then the gain is 0

    Temperature   P  N        Humidity   P  N
    Hot           2  2        High       3  4
    Mild          4  2        Normal     6  1
    Cool          3  1
    Gain: .029                Gain: .151
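The gains above can be reproduced from the per-value class counts; a small sketch (only the counts come from the table, the rest is the standard information-gain computation):

    # Reproduce the gains above from the per-value class counts (P, N).
    from math import log2

    def entropy(p, n):
        total = p + n
        return sum(-c / total * log2(c / total) for c in (p, n) if c)

    def gain(partitions):                    # partitions: one (P, N) pair per attribute value
        P, N = sum(p for p, _ in partitions), sum(n for _, n in partitions)
        remainder = sum((p + n) / (P + N) * entropy(p, n) for p, n in partitions)
        return entropy(P, N) - remainder

    print(gain([(2, 2), (4, 2), (3, 1)]))    # Temperature: ~0.029
    print(gain([(3, 4), (6, 1)]))            # Humidity: ~0.152 (the slide rounds to .151)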


ID3 - Conclusions

  • Good empirical results

  • Application robustness and accuracy comparable to neural networks, with faster learning (though NNs handle continuous features, both input and output, more naturally)

  • The most used and best known of current symbolic systems - widely used to help create rules for expert systems


Nearest Neighbor Learners

  • Broad spectrum

    • Basic k-NN, Instance Based Learning, Case Based Reasoning, Analogical Reasoning

  • Simply store all or some representative subset of the examples in the training set

  • Generalize on the fly rather than from a pre-acquired hypothesis - faster learning, slower execution, information retained, memory intensive


Nearest Neighbor Algorithms
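The slide itself is a figure; as a concrete point of reference, a minimal k-nearest-neighbor classifier (Euclidean distance, majority vote) can be sketched as:

    # Minimal k-nearest-neighbor classifier (Euclidean distance, majority vote).
    from collections import Counter
    from math import dist      # Python 3.8+

    def knn_predict(stored_examples, x, k=3):
        """stored_examples: list of (feature_vector, label) pairs, e.g. the Iris rows above."""
        neighbors = sorted(stored_examples, key=lambda ex: dist(ex[0], x))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]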


Nearest Neighbor Variations

  • How many examples to store

  • How do the stored examples vote (distance weighted, etc.)

  • Can we choose a smaller set of near-optimal examples (prototypes/exemplars)?

    • Storage reduction

    • Faster execution

    • Noise robustness

  • Distance metrics – non-Euclidean

  • Irrelevant features – feature weighting


Evolutionary Computation/Algorithms
Genetic Algorithms

  • Simulate “natural” evolution of structures via selection and reproduction, based on performance (fitness)

  • Type of heuristic search - discovery, not inductive in isolation

  • Genetic operators - recombination (crossover) and mutation are most common

      1 1 0 2 3 1 0 2 2 1 (Fitness = 10)

      2 2 0 1 1 3 1 1 0 0 (Fitness = 12)

      2 2 0 1 3 1 0 2 2 1 (Fitness = calculated or f(parents))


Evolutionary Algorithms

  • Start with an initialized population P(t) - random, domain knowledge, etc.

  • Population is usually made up of possible parameter settings for a complex problem

  • Typically have a fixed population size (like beam search)

  • Selection

    • Parent_Selection P(t) - promising parents are used to create new children

    • Survive P(t) - pruning of unpromising candidates

  • Evaluate P(t) - calculate the fitness of population members; ranges from simple metrics to complex simulations


Evolutionary Algorithm

    Procedure EA
      t = 0;
      Initialize Population P(t);
      Evaluate P(t);
      Until Done {  /* sufficiently "good" individuals discovered */
        t = t + 1;
        Parent_Selection P(t);
        Recombine P(t);
        Mutate P(t);
        Evaluate P(t);
        Survive P(t);
      }
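A compact Python rendering of the loop above (illustrative: the chromosome alphabet, rates, and population size are placeholders, and the fitness function is supplied by the application):

    # Illustrative rendering of the EA loop above; fitness maps a chromosome to a number.
    import random

    def evolve(fitness, length, pop_size=20, generations=100, mutation_rate=0.05):
        pop = [[random.randint(0, 3) for _ in range(length)] for _ in range(pop_size)]  # Initialize P(t)
        for t in range(generations):
            scored = sorted(pop, key=fitness, reverse=True)       # Evaluate P(t)
            parents = scored[:pop_size // 2]                      # Parent_Selection P(t)
            children = []
            while len(children) < pop_size - len(parents):
                p1, p2 = random.sample(parents, 2)
                cut = random.randrange(1, length)                 # Recombine: one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [g if random.random() > mutation_rate     # Mutate
                         else random.randint(0, 3) for g in child]
                children.append(child)
            pop = parents + children                              # Survive P(t)
        return max(pop, key=fitness)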


EA Example

  • Goal: discover a new automotive engine to maximize performance, reliability, and mileage while minimizing emissions

  • Features: CID (cubic inch displacement), fuel system, # of valves, # of cylinders, presence of turbo-charging

  • Assume a test unit which tests possible engines and returns an integer measure of goodness

  • Start with a population of random engines


Genetic Operators

  • Crossover variations - multi-point, uniform probability, averaging, etc.

  • Mutation - Random changes in features, adaptive, different for each feature, etc.

  • Others - many schemes mimicking natural genetics: dominance, selective mating, inversion, reordering, speciation, knowledge-based, etc.

  • Reproduction - terminology - selection based on fitness - keep best around - supported in the algorithms

  • Critical to maintain balance of diversity and quality in the population


Evolutionary Algorithms

  • There exist mathematical proofs that evolutionary techniques are efficient search strategies

  • There are a number of different evolutionary strategies

    • Genetic Algorithms

    • Evolutionary Programming

    • Evolution Strategies

    • Genetic Programming

  • Strategies differ in representations, selection, operators, evaluation, etc.

  • Most were discovered independently; EP and ES initially focused on function optimization

  • Strategies continue to “evolve”


Genetic Algorithm Comments

  • Much current work and extensions

  • Numerous application attempts. Can plug into many algorithms requiring search. Has built-in heuristic. Could augment with domain heuristics

  • “Lazy Man’s Solution” to any tough parameter search


Rule Induction

  • Creates a set of symbolic rules to solve a classification problem

  • Sequential Covering Algorithms (a skeleton follows this list)

  • Until no good and significant rules can be created:

    • Create all first-order rules Ax -> Classy (a single attribute test implying one output class)

    • Score each rule based on goodness (accuracy) and significance using the current training set

    • Iteratively (greedily) expand the best rules to n+1 attributes, score the new rules, and prune weak rules to keep the total candidate list at a fixed size (beam search)

    • Pick the one best rule and remove all instances from the training set that the rule covers
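A skeleton of this loop in Python. The rule representation and the helpers (initial_rules, specialize, score, covers) are assumptions supplied by the application, so this only shows the control flow:

    # Skeleton of sequential covering with a beam search over rule specializations.
    def learn_one_rule(examples, initial_rules, specialize, score, beam_width, max_depth):
        """initial_rules: non-empty list of first-order candidate rules (assumed)."""
        beam = sorted(initial_rules, key=lambda r: score(r, examples), reverse=True)[:beam_width]
        best = beam[0]
        for _ in range(max_depth):                         # grow rules to n+1 attribute tests
            beam = [r2 for r in beam for r2 in specialize(r)]
            if not beam:
                break
            beam = sorted(beam, key=lambda r: score(r, examples), reverse=True)[:beam_width]
            if score(beam[0], examples) > score(best, examples):
                best = beam[0]
        return best

    def sequential_covering(examples, initial_rules, specialize, score, covers,
                            beam_width=5, max_depth=5, min_score=0.0):
        rules, remaining = [], list(examples)
        while remaining:
            rule = learn_one_rule(remaining, initial_rules, specialize, score, beam_width, max_depth)
            if score(rule, remaining) <= min_score:        # no good and significant rule left
                break
            rules.append(rule)
            remaining = [ex for ex in remaining if not covers(rule, ex)]
        return rules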


Rule Induction Variants

  • Ordered rule lists (decision lists) - naturally support multiple output classes

    • A=Green and B=Tall -> Class 1

    • A=Red and C=Fast -> Class 2

    • Else Class 1

  • Placing new rules at the beginning or end of the list

  • Unordered rule lists for each output class (must handle multiple matches)

  • Rule induction can handle noise by no longer creating new rules when the gain is negligible or not statistically significant


Conclusion

  • Many new algorithms and approaches being proposed

  • Application areas rapidly increasing

    • Amount of available data and information growing

    • User desire for more adaptive and user-specific computer interaction

    • This need for specific and adaptable user interaction will make machine learning a more important tool in user interface research and applications
