


Learning with Neural Networks

Artificial Intelligence

CMSC 25000

February 19, 2002



Agenda

  • Neural Networks:

    • Biological analogy

  • Review: single-layer perceptrons

    • Perceptron: Pros & Cons

  • Neural Networks: Multilayer perceptrons

    • Neural net training: Backpropagation

    • Strengths & Limitations

  • Conclusions



    Neurons: The Concept

    [Diagram: neuron anatomy — dendrites, cell body, nucleus, axon]

    Neurons: Receive inputs from other neurons (via synapses)

    When input exceeds threshold, “fires”

    Sends output along axon to other neurons

    Brain: 10^11 neurons, 10^16 synapses



    Perceptron Structure

    • Single neuron-like element
      • Binary inputs & output
      • Fires when the weighted sum of inputs exceeds the threshold

    [Diagram: perceptron with inputs x0 = -1, x1, x2, x3, …, xn, weights w0 … wn, and output y]
    • Until the perceptron gives the correct output for all training examples (a sketch of this rule follows below):
      • If the perceptron is correct, do nothing
      • If the perceptron is wrong:
        • If it incorrectly says “yes”, subtract the input vector from the weight vector
        • Otherwise, add the input vector to the weight vector

    The fixed input x0 = -1 with weight w0 compensates for the threshold.
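    A minimal sketch of this update rule in Python (the function name, the NumPy encoding, and the epoch cap are my own illustration, not taken from the slides):

        import numpy as np

        def train_perceptron(samples, labels, max_epochs=100):
            """Perceptron learning rule: add/subtract the input vector when wrong."""
            # Prepend the fixed input x0 = -1 so that w[0] plays the role of the threshold.
            X = np.hstack([-np.ones((len(samples), 1)), np.asarray(samples, dtype=float)])
            w = np.zeros(X.shape[1])
            for _ in range(max_epochs):
                mistakes = 0
                for x, target in zip(X, labels):
                    y = 1 if w @ x > 0 else 0
                    if y == target:
                        continue          # correct output: do nothing
                    elif y == 1:          # incorrectly said "yes"
                        w -= x            # subtract the input vector from the weight vector
                    else:                 # incorrectly said "no"
                        w += x            # add the input vector to the weight vector
                    mistakes += 1
                if mistakes == 0:         # correct on all samples: converged
                    break
            return w

    If the data are linearly separable, the loop stops once every sample is classified correctly; otherwise the epoch cap ends it.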



    Perceptron Learning

    • Perceptrons learn linear decision boundaries
      • E.g. the + / 0 pattern below, but not XOR
    • Guaranteed to converge, if the data are linearly separable
    • Many simple functions NOT learnable (e.g. XOR)

    [Plots: a linearly separable + / 0 pattern in the (x1, x2) plane, contrasted with the XOR pattern, where no single line separates the + points from the 0 points]



    Neural Nets

    • Multi-layer perceptrons

      • Inputs: real-valued

      • Intermediate “hidden” nodes

      • Output(s): one (or more) discrete-valued

    [Diagram: feed-forward network — inputs X1–X4, two layers of hidden nodes, outputs Y1 and Y2]



    Neural Nets

    • Pro: More general than perceptrons

      • Not restricted to linear discriminants

      • Multiple outputs: one classification each

    • Con: No simple, guaranteed training procedure

      • Use greedy, hill-climbing procedure to train

      • “Gradient descent”, “Backpropagation”



    Solving the XOR Problem

    Network topology: 2 hidden nodes (o1, o2), 1 output (y)

    [Diagram: inputs x1 and x2 feed hidden nodes o1 and o2 through weights w11, w21, w12, w22; o1 and o2 feed output y through weights w13, w23; each node also has a bias input of -1 with weight w01, w02, or w03]

    Desired behavior (checked in the sketch below):

      x1  x2  o1  o2  y
       0   0   0   0  0
       1   0   0   1  1
       0   1   0   1  1
       1   1   1   1  0

    Weights:

      w11 = w12 = 1
      w21 = w22 = 1
      w01 = 3/2; w02 = 1/2; w03 = 1/2
      w13 = -1; w23 = 1
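    A small sketch (assuming step-threshold units, with the threshold carried by the -1 bias input as on the perceptron slide) that checks these weights reproduce the desired table:

        def step(s):
            # Threshold unit: fire (1) when the weighted sum exceeds 0
            return 1 if s > 0 else 0

        # Weights from the slide
        w11 = w12 = 1
        w21 = w22 = 1
        w01, w02, w03 = 3/2, 1/2, 1/2
        w13, w23 = -1, 1

        for x1 in (0, 1):
            for x2 in (0, 1):
                o1 = step(x1 * w11 + x2 * w21 - w01)   # behaves like AND
                o2 = step(x1 * w12 + x2 * w22 - w02)   # behaves like OR
                y = step(o1 * w13 + o2 * w23 - w03)    # OR and not AND = XOR
                print(x1, x2, o1, o2, y)

    The printed rows match the desired-behavior table above.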



    Backpropagation

    • Greedy, Hill-climbing procedure

      • Weights are parameters to change

      • Original hill-climb changes one parameter/step

        • Slow

      • If smooth function, change all parameters/step

        • Gradient descent

          • Backpropagation: Computes current output, works backward to correct error



    Producing a Smooth Function

    • Key problem:

      • Pure step threshold is discontinuous

        • Not differentiable

    • Solution:

      • Sigmoid (squashed ‘s’ function): Logistic fn
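    The logistic function referred to here is s(z) = 1 / (1 + e^(-z)); a one-line Python sketch:

        import math

        def sigmoid(z):
            # Smooth, differentiable squashing of z into (0, 1)
            return 1.0 / (1.0 + math.exp(-z))

    Unlike the step threshold, this is differentiable everywhere, which is what gradient descent needs.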



    Neural Net Training

    • Goal:

      • Determine how to change weights to get correct output

        • Large change in weight to produce large reduction in error

    • Approach:

      • Compute actual output: o

      • Compare to desired output: d

      • Determine effect of each weight w on error = d-o

      • Adjust weights



    Neural Net Example

    [Diagram: the example network — inputs x1, x2; hidden-node sums z1, z2 with outputs y1, y2; output-node sum z3 with output y3; weights w11, w21, w12, w22 into the hidden nodes, w13, w23 into the output node, and bias weights w01, w02, w03 on -1 inputs]

    • xi: ith sample input vector
    • w: weight vector
    • yi*: desired output for the ith sample
    • Sum of squares error over the training samples:
        E = Σi (yi* − y3(xi, w))²
    • Full expression of the output in terms of the inputs and weights (see the sketch below):
        y3 = s( w13·s(w11·x1 + w21·x2 − w01) + w23·s(w12·x1 + w22·x2 − w02) − w03 )

    From MIT 6.034 notes, Lozano-Pérez
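    A sketch of the forward computation and the sum-of-squares error for this example network (the dictionary of named weights and the helper names are my own choice):

        import math

        def s(z):
            return 1.0 / (1.0 + math.exp(-z))          # logistic sigmoid

        def forward(x1, x2, w):
            # Forward pass through the 2-input, 2-hidden-node, 1-output network
            z1 = w["w11"] * x1 + w["w21"] * x2 - w["w01"]
            z2 = w["w12"] * x1 + w["w22"] * x2 - w["w02"]
            y1, y2 = s(z1), s(z2)
            z3 = w["w13"] * y1 + w["w23"] * y2 - w["w03"]
            return s(z3)                                # y3

        def sum_squares_error(samples, targets, w):
            # E = sum over training samples of (yi* - y3(xi, w))^2
            return sum((t - forward(x1, x2, w)) ** 2
                       for (x1, x2), t in zip(samples, targets))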



    Gradient Descent

    • Error: Sum of squares error of inputs with current weights

    • Compute rate of change of error wrt each weight

      • Which weights have greatest effect on error?

      • Effectively, partial derivatives of error wrt weights

        • In turn, depend on other weights => chain rule



    Gradient Descent

    • Error as a function of the weights: E = G(w)
    • Find the rate of change of the error, dG/dw
    • Follow the steepest rate of change
    • Change the weights so that the error is minimized (a generic loop is sketched below)

    [Plot: error curve G(w) over a weight w, with steps from w0 toward w1 following the slope dG/dw; note the local minima]
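    A generic gradient-descent loop, sketched in Python (the gradient function, rate, and step count are placeholder names for illustration):

        def gradient_descent(w, grad, r=0.1, steps=1000):
            # w: list of weights; grad(w): dE/dw for each weight; r: rate parameter
            for _ in range(steps):
                g = grad(w)
                w = [wi - r * gi for wi, gi in zip(w, g)]   # step downhill on E = G(w)
            return w

    Like any hill-climbing method, this can stop in a local minimum of G(w).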



    Gradient of Error

    [Diagram: the same example network as above]

    • Each partial derivative ∂E/∂w is expanded by the chain rule through y3, z3, y1, y2, z1, z2

    • Note: derivative of the sigmoid (checked numerically below):
        ds(z1)/dz1 = s(z1)(1 − s(z1))

    From MIT 6.034 AI lecture notes, Lozano-Pérez, 2000
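    A quick numerical check of that derivative identity (finite differences; entirely illustrative):

        import math

        def s(z):
            return 1.0 / (1.0 + math.exp(-z))

        z1, h = 0.7, 1e-6
        numeric = (s(z1 + h) - s(z1 - h)) / (2 * h)    # finite-difference estimate of ds/dz1
        analytic = s(z1) * (1 - s(z1))                 # s(z1)(1 - s(z1))
        print(numeric, analytic)                       # the two values agree closely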



    From Effect to Update

    • Gradient computation:

      • How each weight contributes to performance

    • To train:

      • Need to determine how to CHANGE weight based on contribution to performance

      • Need to determine how MUCH change to make per iteration

        • Rate parameter ‘r’

          • Large enough to learn quickly

          • Small enough to reach, but not overshoot, the target values



    Backpropagation Procedure

    [Diagram: nodes in successive layers, indexed i → j → k]

    • Pick rate parameter ‘r’

    • Until performance is good enough:

      • Do a forward computation to calculate the output

      • Compute β in the output node from the difference between the desired and actual output

      • Compute β in all other nodes by propagating the output β backward through the weights

      • Compute the change for all weights from r, the node activations, and β (a sketch follows below)
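    A sketch of one backpropagation pass for the example network above, using the β formulation just described; the exact update formulas are my reconstruction of the standard procedure, not copied from the slides:

        import math

        def s(z):
            return 1.0 / (1.0 + math.exp(-z))

        def backprop_step(x1, x2, target, w, r=0.5):
            # One forward + backward pass; w is a dict of named weights, returned updated
            # Forward computation
            y1 = s(w["w11"] * x1 + w["w21"] * x2 - w["w01"])
            y2 = s(w["w12"] * x1 + w["w22"] * x2 - w["w02"])
            y3 = s(w["w13"] * y1 + w["w23"] * y2 - w["w03"])

            # Beta at the output node: desired minus actual output
            b3 = target - y3
            # Beta at the hidden nodes: output beta propagated back through the weights
            b1 = w["w13"] * y3 * (1 - y3) * b3
            b2 = w["w23"] * y3 * (1 - y3) * b3

            # Weight change = r * (input to the weight) * (sigmoid slope at the node) * beta
            w["w13"] += r * y1 * y3 * (1 - y3) * b3
            w["w23"] += r * y2 * y3 * (1 - y3) * b3
            w["w03"] += r * (-1) * y3 * (1 - y3) * b3
            w["w11"] += r * x1 * y1 * (1 - y1) * b1
            w["w21"] += r * x2 * y1 * (1 - y1) * b1
            w["w01"] += r * (-1) * y1 * (1 - y1) * b1
            w["w12"] += r * x1 * y2 * (1 - y2) * b2
            w["w22"] += r * x2 * y2 * (1 - y2) * b2
            w["w02"] += r * (-1) * y2 * (1 - y2) * b2
            return w

    Repeating this step over the training samples until the outputs are close enough to their targets is the training loop described on the following slides.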



    Backprop Example

    [Diagram: the same two-input, two-hidden-node, one-output example network]

    Forward prop: Compute zi and yi given xk, wl

    From MIT 6.034 notes, Lozano-Pérez



    Backpropagation Observations

    • Procedure is (relatively) efficient

      • All computations are local

        • Use inputs and outputs of current node

    • What is “good enough”?

      • Rarely reach target (0 or 1) outputs

        • Typically, train until within 0.1 of target



    Neural Net Summary

    • Training:

      • Backpropagation procedure

        • Gradient descent strategy (usual problems)

    • Prediction:

      • Compute outputs based on input vector & weights

    • Pros: Very general, Fast prediction

    • Cons: Training can be VERY slow (1000’s of epochs), Overfitting



    Training Strategies

    • Online training:

      • Update weights after each sample

    • Offline (batch training):

      • Compute error over all samples

        • Then update weights

    • Online training “noisy”

      • Sensitive to individual instances

      • However, may escape local minima
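    A sketch contrasting the two strategies; grad_one is a hypothetical helper returning the per-sample error gradient for each named weight (it is not defined in the slides):

        def train_online(samples, targets, w, grad_one, r, epochs):
            # Online: adjust the weights immediately after each individual sample ("noisy")
            for _ in range(epochs):
                for x, t in zip(samples, targets):
                    g = grad_one(x, t, w)                     # hypothetical per-sample gradient
                    w = {k: w[k] - r * g[k] for k in w}
            return w

        def train_batch(samples, targets, w, grad_one, r, epochs):
            # Offline/batch: sum the error gradient over all samples, then update once
            for _ in range(epochs):
                total = {k: 0.0 for k in w}
                for x, t in zip(samples, targets):
                    g = grad_one(x, t, w)
                    total = {k: total[k] + g[k] for k in w}
                w = {k: w[k] - r * total[k] for k in w}       # one update per pass over the data
            return w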



    Training Strategy

    • To avoid overfitting:

      • Split data into: training, validation, & test

        • Also, avoid excess weights (keep fewer weights than training samples)

    • Initialize with small random weights

      • Small changes have noticeable effect

    • Use offline training

      • Until validation set minimum

    • Evaluate on test set

      • No more weight changes
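    A sketch of that recipe (train_batch and grad_one as sketched above; error is an assumed helper that computes the sum-of-squares error on a data set):

        import copy
        import random

        def train_with_early_stopping(train, valid, w, grad_one, error, r, max_epochs=10000):
            # Initialize with small random weights so small changes have a noticeable effect
            w = {k: random.uniform(-0.1, 0.1) for k in w}
            best_w, best_err = copy.deepcopy(w), float("inf")
            for _ in range(max_epochs):
                w = train_batch(*train, w, grad_one, r, epochs=1)   # one offline (batch) update
                val_err = error(*valid, w)
                if val_err < best_err:                              # still improving on validation data
                    best_w, best_err = copy.deepcopy(w), val_err
                else:
                    break                                           # validation minimum reached: stop
            return best_w   # evaluate once on the held-out test set, with no more weight changes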



    Classification

    • Neural networks are best suited to classification tasks

      • Single output -> Binary classifier

      • Multiple outputs -> Multiway classification

        • Applied successfully to learning pronunciation

      • Sigmoid pushes outputs toward binary decisions

        • Not well suited to regression



    Neural Net Conclusions

    • Simulation based on neurons in brain

    • Perceptrons (single neuron)

      • Guaranteed to find linear discriminant

        • IF one exists (hence the problem with XOR)

    • Neural nets (Multi-layer perceptrons)

      • Very general

      • Backpropagation training procedure

        • Gradient descent - local min, overfitting issues

