Learning with Neural Networks

Artificial Intelligence

CMSC 25000

February 19, 2002

Agenda
  • Neural Networks:
    • Biological analogy
  • Review: single-layer perceptrons
      • Perceptron: Pros & Cons
  • Neural Networks: Multilayer perceptrons
      • Neural net training: Backpropagation
      • Strengths & Limitations
  • Conclusions
Neurons: The Concept

[Diagram: neuron with dendrites, cell body, nucleus, and axon]

  • Neurons: receive inputs from other neurons (via synapses)
  • When input exceeds threshold, “fires”
  • Sends output along axon to other neurons
  • Brain: 10^11 neurons, 10^16 synapses

Perceptron Structure

  • Single neuron-like element
    • Binary inputs & output
    • Fires when the weighted sum of inputs exceeds the threshold

[Diagram: output y computed from inputs x0 = -1, x1, x2, x3, ..., xn through weights w0, w1, w2, w3, ..., wn]

  • Until the perceptron gives the correct output for all training examples (sketched below):
    • If the perceptron is correct, do nothing
    • If the perceptron is wrong:
      • If it incorrectly says “yes”, subtract the input vector from the weight vector
      • Otherwise, add the input vector to the weight vector
  • The fixed input x0 = -1 with weight w0 compensates for the threshold
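A minimal sketch of this update rule in Python (the helper name train_perceptron and the epoch cap are my own; a step threshold at 0 is assumed, with the x0 = -1 bias input already included in each input vector):

```python
def train_perceptron(samples, w, max_epochs=100):
    """samples: list of (x, target) pairs; x includes the x0 = -1 bias input,
    targets are 0/1; w is the initial weight vector (w[0] is the threshold weight)."""
    for _ in range(max_epochs):
        all_correct = True
        for x, target in samples:
            fired = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if fired == target:
                continue                                   # correct: do nothing
            all_correct = False
            if fired == 1:                                 # incorrectly said "yes"
                w = [wi - xi for wi, xi in zip(w, x)]      # subtract input vector
            else:                                          # incorrectly said "no"
                w = [wi + xi for wi, xi in zip(w, x)]      # add input vector
        if all_correct:                                    # correct on all samples
            break
    return w
```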

Perceptron Learning
  • Perceptrons learn linear decision boundaries
    • E.g., a line in the (x1, x2) plane separating the + examples from the 0 examples
  • Guaranteed to converge, if the data are linearly separable
  • Many simple functions are NOT learnable
    • E.g., XOR: + at (0, 1) and (1, 0), 0 at (0, 0) and (1, 1); no single line separates them

Neural Nets
  • Multi-layer perceptrons
    • Inputs: real-valued
    • Intermediate “hidden” nodes
    • Output(s): one (or more) discrete-valued

[Diagram: feedforward network; inputs X1, X2, X3, X4 feed hidden nodes, which feed outputs Y1 and Y2]

Neural Nets
  • Pro: More general than perceptrons
    • Not restricted to linear discriminants
    • Multiple outputs: one classification each
  • Con: No simple, guaranteed training procedure
    • Use greedy, hill-climbing procedure to train
    • “Gradient descent”, “Backpropagation”
Solving the XOR Problem

Network topology: 2 hidden nodes (o1, o2), 1 output (y)

[Diagram: inputs x1, x2 and constant -1 bias inputs feed hidden nodes o1, o2 through weights w11, w21, w01 and w12, w22, w02; o1, o2 and a -1 bias feed the output y through w13, w23, w03]

Desired behavior:

x1 x2 o1 o2 y
 0  0  0  0  0
 1  0  0  1  1
 0  1  0  1  1
 1  1  1  1  0

Weights:

w11 = w12 = 1
w21 = w22 = 1
w01 = 3/2; w02 = 1/2; w03 = 1/2
w13 = -1; w23 = 1

(these weights are checked in the sketch below)
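As a check, a minimal Python sketch (with a hard step threshold; the helper name is mine) that runs the desired-behavior table through these weights:

```python
def step(z):
    return 1 if z > 0 else 0

w11 = w12 = 1
w21 = w22 = 1
w01, w02, w03 = 3/2, 1/2, 1/2
w13, w23 = -1, 1

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    o1 = step(w11 * x1 + w21 * x2 - w01)   # behaves like AND(x1, x2)
    o2 = step(w12 * x1 + w22 * x2 - w02)   # behaves like OR(x1, x2)
    y = step(w13 * o1 + w23 * o2 - w03)    # OR but not AND, i.e. XOR
    print(x1, x2, o1, o2, y)               # reproduces the table above
```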

Backpropagation
  • Greedy, Hill-climbing procedure
    • Weights are parameters to change
    • Original hill-climb changes one parameter/step
      • Slow
    • If smooth function, change all parameters/step
      • Gradient descent
        • Backpropagation: Computes current output, works backward to correct error
Producing a Smooth Function
  • Key problem:
    • Pure step threshold is discontinuous
      • Not differentiable
  • Solution:
    • Sigmoid (squashed ‘s’ function): the logistic function (shown below)
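For reference, the logistic function and its derivative (used later for the gradient), as a minimal Python sketch:

```python
import math

def sigmoid(z):
    # Smooth, differentiable "squashed s" replacement for the step threshold
    return 1.0 / (1.0 + math.exp(-z))

def dsigmoid(z):
    # s'(z) = s(z) * (1 - s(z)); this form reappears in backpropagation
    s = sigmoid(z)
    return s * (1 - s)
```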
Neural Net Training
  • Goal:
    • Determine how to change weights to get correct output
      • Large change in weight to produce large reduction in error
  • Approach:
      • Compute actual output: o
      • Compare to desired output: d
      • Determine effect of each weight w on error = d-o
      • Adjust weights
Neural Net Example

[Diagram: a 2-2-1 network; inputs x1, x2 and constant -1 bias inputs feed hidden units with weighted sums z1, z2 and outputs y1 = s(z1), y2 = s(z2) through weights w11, w21, w01 and w12, w22, w02; the hidden outputs and a -1 bias feed the output unit z3, y3 = s(z3) through weights w13, w23, w03]

xi : ith sample input vector

w : weight vector

yi* : desired output for ith sample

Sum-of-squares error over the training samples:
E = Σi (yi* - y(xi, w))^2,  where y(xi, w) is the network output on the ith sample

Full expression of the output in terms of the inputs and weights:
y3 = s( w13·s(w11·x1 + w21·x2 - w01) + w23·s(w12·x1 + w22·x2 - w02) - w03 )

From MIT 6.034 notes, Lozano-Perez

Gradient Descent
  • Error: Sum of squares error of inputs with current weights
  • Compute rate of change of error wrt each weight
    • Which weights have greatest effect on error?
    • Effectively, partial derivatives of error wrt weights
      • In turn, depend on other weights => chain rule
Gradient Descent

E = G(w): the error as a function of the weights
  • Find the rate of change of the error: dG/dw
  • Follow the steepest rate of change
  • Change the weights so that the error is minimized

[Plot: error curve G(w) over the weight w, with points w0 and w1 marked; the curve has local minima]
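A minimal numeric sketch of the idea, using a made-up one-weight error function G (the rate parameter r appears a few slides later):

```python
def G(w):                  # toy error as a function of a single weight
    return (w - 2.0) ** 2

def dG(w):                 # rate of change of the error
    return 2.0 * (w - 2.0)

w, r = 5.0, 0.1            # initial weight, rate parameter
for _ in range(100):
    w = w - r * dG(w)      # step against the gradient (steepest descent)
# w ends up near the minimum at w = 2 (a convex toy case: no local-minima trouble)
```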

Gradient of Error

[Diagram: the same 2-2-1 network as in the Neural Net Example slide]

E = Σi (yi* - y(xi, w))^2, as before

Note: derivative of the sigmoid:
ds(z1)/dz1 = s(z1)(1 - s(z1))

From MIT 6.034 AI lecture notes, Lozano-Perez 2000
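The gradient expressions themselves did not survive the transcript. Using the definitions above (per-sample error (y* - y3)^2, y3 = s(z3), z3 = w13·y1 + w23·y2 - w03, y1 = s(z1), z1 = w11·x1 + w21·x2 - w01), the chain rule gives, for one sample:

dE/dw13 = -2 (y* - y3) · y3(1 - y3) · y1
dE/dw11 = -2 (y* - y3) · y3(1 - y3) · w13 · y1(1 - y1) · x1

so the gradient for a weight deeper in the network reuses the factors already computed for the weights above it.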

From Effect to Update
  • Gradient computation:
    • How each weight contributes to performance
  • To train:
    • Need to determine how to CHANGE weight based on contribution to performance
    • Need to determine how MUCH change to make per iteration
      • Rate parameter ‘r’
        • Large enough to learn quickly
        • Small enough to reach, but not overshoot, the target values
Backpropagation Procedure

[Diagram: node i feeds node j, which feeds node k]

  • Pick rate parameter ‘r’
  • Until performance is good enough:
    • Do a forward computation to calculate the output
    • Compute β at the output node
    • Compute β at all other nodes, working backward from the output
    • Compute the change for all weights
      (one common form of these formulas is sketched below)
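The formulas on this slide did not survive the transcript; one standard formulation, which these notes appear to follow (with oj the output of node j, dj the desired output at an output node, and r the rate parameter), is roughly:

βj = dj - oj                                (output node)
βj = Σk wj→k · ok(1 - ok) · βk              (any other node j, summing over the nodes k it feeds)
Δwi→j = r · oi · oj(1 - oj) · βj            (change to the weight on the link from node i to node j)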
Backprop Example

[Diagram: the same 2-2-1 network as in the Neural Net Example slide]

Forward prop: compute the zi and yi given the inputs xk and the weights w

From MIT 6.034 notes, Lozano-Perez
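A minimal Python sketch of the forward pass for this network, plus one weight update in the β form sketched above (the weight dictionary, function names, and rate value are my own; the factor of 2 from the squared error is folded into r, as is common):

```python
import math

def s(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w):
    # Forward prop: compute the zi and yi from the inputs and weights
    z1 = w["w11"] * x1 + w["w21"] * x2 - w["w01"]
    z2 = w["w12"] * x1 + w["w22"] * x2 - w["w02"]
    y1, y2 = s(z1), s(z2)
    z3 = w["w13"] * y1 + w["w23"] * y2 - w["w03"]
    return y1, y2, s(z3)

def backprop_step(x1, x2, target, w, r=0.5):
    y1, y2, y3 = forward(x1, x2, w)
    b3 = target - y3                              # beta at the output node
    b1 = w["w13"] * y3 * (1 - y3) * b3            # beta at hidden node 1
    b2 = w["w23"] * y3 * (1 - y3) * b3            # beta at hidden node 2
    # Each change is r * (input feeding the weight) * o_j(1 - o_j) * beta_j;
    # the bias "inputs" are the constant -1.
    w["w13"] += r * y1 * y3 * (1 - y3) * b3
    w["w23"] += r * y2 * y3 * (1 - y3) * b3
    w["w03"] += r * -1 * y3 * (1 - y3) * b3
    w["w11"] += r * x1 * y1 * (1 - y1) * b1
    w["w21"] += r * x2 * y1 * (1 - y1) * b1
    w["w01"] += r * -1 * y1 * (1 - y1) * b1
    w["w12"] += r * x1 * y2 * (1 - y2) * b2
    w["w22"] += r * x2 * y2 * (1 - y2) * b2
    w["w02"] += r * -1 * y2 * (1 - y2) * b2
    return w
```

Repeated over the training samples, these updates push y3 toward the targets, e.g. toward the XOR table shown earlier.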

Backpropagation Observations
  • Procedure is (relatively) efficient
    • All computations are local
      • Use inputs and outputs of current node
  • What is “good enough”?
    • Rarely reach target (0 or 1) outputs
      • Typically, train until within 0.1 of target
Neural Net Summary
  • Training:
    • Backpropagation procedure
      • Gradient descent strategy (usual problems)
  • Prediction:
    • Compute outputs based on input vector & weights
  • Pros: Very general, Fast prediction
  • Cons: Training can be VERY slow (1000’s of epochs), Overfitting
Training Strategies
  • Online training:
    • Update weights after each sample
  • Offline (batch) training:
    • Compute error over all samples
      • Then update weights (both loops are sketched below)
  • Online training “noisy”
    • Sensitive to individual instances
    • However, may escape local minima
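A rough sketch of the structural difference, assuming a hypothetical grad(w, samples) that returns the error gradient for a set of samples:

```python
def train_online(samples, w, grad, r):
    # Online: update the weights after every individual sample ("noisy")
    for sample in samples:
        w = [wi - r * gi for wi, gi in zip(w, grad(w, [sample]))]
    return w

def train_batch(samples, w, grad, r):
    # Offline (batch): compute the error gradient over ALL samples, then update once
    g = grad(w, samples)
    return [wi - r * gi for wi, gi in zip(w, g)]
```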
Training Strategy
  • To avoid overfitting:
    • Split data into: training, validation, & test sets
      • Also, avoid excess weights (fewer weights than samples)
  • Initialize with small random weights
    • Small changes have noticeable effect
  • Use offline training
    • Until error on the validation set reaches its minimum
  • Then evaluate on the test set
    • No more weight changes
Classification
  • Neural networks are best suited to classification tasks
    • Single output -> Binary classifier
    • Multiple outputs -> Multiway classification
      • Applied successfully to learning pronunciation
    • Sigmoid pushes to binary classification
      • Not good for regression
Neural Net Conclusions
  • Simulation based on neurons in brain
  • Perceptrons (single neuron)
    • Guaranteed to find linear discriminant
      • IF one exists -> problem: XOR has no linear discriminant
  • Neural nets (Multi-layer perceptrons)
    • Very general
    • Backpropagation training procedure
      • Gradient descent - local min, overfitting issues