CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lecture 31: Feedforward N/W; sigmoid neuron

28th March, 2011

Limitations of perceptron
  • Non-linear separability is all-pervading
  • A single perceptron does not have enough computing power
  • E.g., XOR cannot be computed by a perceptron
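The XOR limitation can be seen empirically. Below is a minimal sketch (my own illustration, not code from the lecture) of the perceptron training algorithm: it converges on the linearly separable AND function but never reaches zero errors on XOR, no matter how long it runs.

```python
# Perceptron training algorithm (PTA) sketch: converges on the linearly
# separable AND function, but never reaches zero errors on XOR.
# Function names and the epoch cap are illustrative choices.

def train_perceptron(samples, epochs=1000, eta=1.0):
    w0, w1, w2 = 0.0, 0.0, 0.0              # w0 plays the role of -theta
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            out = 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0
            if out != target:                # update only on a mistake
                w0 += eta * (target - out)
                w1 += eta * (target - out) * x1
                w2 += eta * (target - out) * x2
                mistakes += 1
        if mistakes == 0:                    # all patterns classified correctly
            return (w0, w1, w2), True
    return (w0, w1, w2), False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND))   # second element True: a separating line is found
print(train_perceptron(XOR))   # second element False: no single perceptron works
```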
Solutions
  • Tolerate error (Ex: pocket algorithm used by connectionist expert systems).
    • Try to get the best possible hyperplane using only perceptrons
  • Use higher-dimension surfaces
    • Ex: degree-2 surfaces such as the parabola (see the sketch after this list)
  • Use a layered network
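One way to see the higher-dimension idea concretely: adding the degree-2 product term x1·x2 as an extra input makes XOR linearly separable, so a single threshold unit suffices. The weights below are my own choice for illustration, not values from the slides.

```python
# XOR is not linearly separable in (x1, x2), but it becomes separable once the
# degree-2 feature x1*x2 is added as a third input (weights chosen for illustration).

def threshold_unit(x1, x2):
    x3 = x1 * x2                            # degree-2 feature
    net = 1.0 * x1 + 1.0 * x2 - 2.0 * x3    # weights (1, 1, -2)
    return 1 if net >= 0.5 else 0           # threshold 0.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), threshold_unit(x1, x2))   # reproduces the XOR truth table
```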
Pocket Algorithm
  • The algorithm evolved in 1985; it essentially uses the PTA (perceptron training algorithm)
  • Basic idea:
    • Always preserve the best weights obtained so far in the “pocket”
    • Change the pocketed weights whenever better weights are found (i.e., the changed weights result in reduced error)
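A minimal sketch of the idea in Python (my own rendering, not code from the lecture): run PTA updates on randomly chosen examples, and keep in the "pocket" the weight vector with the lowest training error seen so far.

```python
# Pocket algorithm sketch: ordinary PTA updates plus a "pocket" that remembers
# the best-performing weights seen so far (useful when the data is not separable).
import random

def count_errors(w, samples):
    w0, w1, w2 = w
    return sum(1 for (x1, x2), t in samples
               if (1 if w0 + w1 * x1 + w2 * x2 > 0 else 0) != t)

def pocket_algorithm(samples, iterations=2000, eta=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    pocket, pocket_errors = list(w), count_errors(w, samples)
    for _ in range(iterations):
        (x1, x2), t = rng.choice(samples)            # pick a training example
        out = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
        if out != t:                                  # ordinary PTA update
            w[0] += eta * (t - out)
            w[1] += eta * (t - out) * x1
            w[2] += eta * (t - out) * x2
        e = count_errors(w, samples)
        if e < pocket_errors:                         # better weights: pocket them
            pocket, pocket_errors = list(w), e
    return pocket, pocket_errors

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(pocket_algorithm(XOR))   # pocketed weights and their error count on XOR
```

Since XOR is not linearly separable, the pocketed weights cannot classify all four patterns; the best a single perceptron can do is get three of the four right.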
XOR using 2 layers
  • A non-LS function expressed as a linearly separable function of individual linearly separable functions.
  • E.g., x1 XOR x2 = (x1 AND NOT x2) OR (NOT x1 AND x2), where each component function is linearly separable.
Example - XOR
[Figure: calculation of XOR by a two-layer network. The output threshold neuron (θ = 0.5, w1 = 1, w2 = 1) combines two hidden units computing x1 AND (NOT x2) and (NOT x1) AND x2. A second panel shows the calculation of one of these terms by a single threshold neuron over the inputs x1 and x2, with θ = 1 and weights −1 and 1.5.]

Example - XOR
[Figure: the complete two-layer XOR network. The output threshold neuron (θ = 0.5) receives the two hidden outputs, x1 AND (NOT x2) and (NOT x1) AND x2, through weights w1 = 1 and w2 = 1; each hidden unit is connected to the inputs x1 and x2 with weights 1.5 and −1 for one unit and −1 and 1.5 for the other.]
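The weights shown in the figures can be checked directly. A minimal sketch (assuming, as the earlier slide indicates, a threshold of 1 for each hidden unit and the weight pairing 1.5/−1 for x1 AND (NOT x2) and −1/1.5 for (NOT x1) AND x2):

```python
# Two-layer threshold network for XOR using the weights from the slides.
# The hidden thresholds (1) and the weight-to-input pairing are read off the
# figures; treat them as assumptions if the diagrams are interpreted differently.

def step(net, theta):
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)    # computes x1 AND (NOT x2)
    h2 = step(-1.0 * x1 + 1.5 * x2, 1.0)   # computes (NOT x1) AND x2
    return step(1.0 * h1 + 1.0 * h2, 0.5)  # OR of the two hidden outputs

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), xor_net(x1, x2))   # prints 0, 1, 1, 0
```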

Some Terminology
  • A multilayer feedforward neural network has
    • Input layer
    • Output layer
    • Hidden layer (assists computation)

Output units and hidden units are called computation units.

Training of the MLP
  • Multilayer Perceptron (MLP)
  • Question: how to find weights for the hidden layers when no target output is available?
  • Credit assignment problem – to be solved by “Gradient Descent”

Note: the whole structure shown in the earlier slide is reducible to a single neuron with the given behavior.

Claim: A neuron with linear I-O behavior can’t compute XOR.

Proof: Write the output as a linear function of the net input, o = m(w1x1 + w2x2) + c, and consider all possible cases [assuming 0.1 and 0.9 as the lower and upper thresholds for the 0 and 1 classes]:

For (0,0), Zero class: m(w1·0 + w2·0) + c ≤ 0.1, i.e. c ≤ 0.1

For (0,1), One class: m·w2 + c ≥ 0.9

For (1,0), One class: m·w1 + c ≥ 0.9

For (1,1), Zero class: m(w1 + w2) + c ≤ 0.1

Adding the two middle inequalities gives m(w1 + w2) + 2c ≥ 1.8, while adding the first and last gives m(w1 + w2) + 2c ≤ 0.2. These equations are inconsistent. Hence XOR can’t be computed.

Observations:

  • A linear neuron can’t compute XOR.
  • A multilayer FFN with linear neurons is collapsible to a single linear neuron, hence no additional power is gained from the hidden layer.
  • Non-linearity is essential for power.
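The inconsistency can also be checked mechanically. The sketch below (my own check, not part of the lecture) asks a linear-programming solver whether any values of a = m·w1, b = m·w2 and c satisfy the four inequalities; the solver reports the system infeasible.

```python
# Feasibility check for the four XOR constraints on a linear neuron.
# Variables x = (a, b, c) with a = m*w1, b = m*w2.
from scipy.optimize import linprog

A_ub = [
    [0, 0, 1],     #  c         <= 0.1   pattern (0,0), class 0
    [0, -1, -1],   # -b - c     <= -0.9  pattern (0,1), class 1
    [-1, 0, -1],   # -a - c     <= -0.9  pattern (1,0), class 1
    [1, 1, 1],     #  a + b + c <= 0.1   pattern (1,1), class 0
]
b_ub = [0.1, -0.9, -0.9, 0.1]

res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3)   # variables are unbounded
print(res.success, res.status)             # False, 2  (status 2 = infeasible)
```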
Gradient Descent Technique
  • Let E be the error at the output layer:
    E = ½ Σj Σi (ti − oi)², with the sums running over j = 1…p and i = 1…n
  • ti = target output; oi = observed output
  • i is the index going over the n neurons in the outermost layer
  • j is the index going over the p patterns (1 to p)
  • Ex: for XOR, p = 4 and n = 1
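As a concrete instance of this error for XOR (p = 4, n = 1), the observed outputs below are invented just to show the computation:

```python
# E = 1/2 * sum over patterns (and output neurons) of (t - o)^2.
# For XOR there are p = 4 patterns and n = 1 output neuron.

targets  = [0.0, 1.0, 1.0, 0.0]     # XOR targets for (0,0), (0,1), (1,0), (1,1)
observed = [0.1, 0.8, 0.7, 0.2]     # hypothetical network outputs

E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, observed))
print(E)   # 0.5 * (0.01 + 0.04 + 0.09 + 0.04) = 0.09
```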
Weights in a FF NN

[Figure: two neurons, n and m, connected by the weight wmn.]

  • wmn is the weight of the connection from the nth neuron to the mth neuron
  • E versus the weights is a complex surface in the space defined by the weights wij
  • −∂E/∂wmn gives the direction in which a movement of the operating point in the wmn co-ordinate space will result in maximum decrease in error
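A quick numerical illustration of that statement (my own sketch, with a single sigmoid neuron on the AND data and an arbitrarily chosen step size): estimating ∂E/∂w by finite differences and moving the weights against the gradient reduces E.

```python
# Moving the operating point along -dE/dw decreases the error E.
import math

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # AND

def E(w):
    w0, w1, w2 = w
    total = 0.0
    for (x1, x2), t in data:
        o = 1.0 / (1.0 + math.exp(-(w0 + w1 * x1 + w2 * x2)))
        total += 0.5 * (t - o) ** 2
    return total

def numerical_gradient(w, h=1e-6):
    grad = []
    for k in range(len(w)):
        wp, wm = list(w), list(w)
        wp[k] += h
        wm[k] -= h
        grad.append((E(wp) - E(wm)) / (2 * h))   # central difference
    return grad

w = [0.0, 0.0, 0.0]
g = numerical_gradient(w)
eta = 0.5
w_new = [wi - eta * gi for wi, gi in zip(w, g)]  # step against the gradient
print(E(w), E(w_new))                            # the second value is smaller
```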

Sigmoid neurons
  • Gradient Descent needs a derivative computation
    • not possible with the perceptron, due to the discontinuous step function used!
    • ⇒ sigmoid neurons, with easy-to-compute derivatives, are used!
  • Computing power comes from the non-linearity of the sigmoid function.
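Why the derivative is "easy to compute" (standard sigmoid facts, shown here as a sketch rather than lecture code): with y = 1 / (1 + e^(−x)), the derivative is simply dy/dx = y(1 − y), so it can be obtained from the neuron's output alone.

```python
# Sigmoid and its derivative: sigma'(x) = sigma(x) * (1 - sigma(x)).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    y = sigmoid(x)
    return y * (1.0 - y)           # needs only the output y

# Confirm the formula against a central finite difference.
for x in (-2.0, 0.0, 1.5):
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(x, sigmoid_derivative(x), numeric)   # the two values agree
```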
Training algorithm
  • Initialize weights to random values.
  • For input x = <xn, xn-1, …, x0>, modify the weights as follows (target output = t, observed output = o, learning rate = η):

    Δwi = −η ∂E/∂wi, where E = ½(t − o)²

    For a sigmoid neuron this works out to Δwi = η (t − o) o (1 − o) xi.

  • Iterate until E < ε (a small error threshold)
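A minimal sketch of this training loop for a single sigmoid neuron (the learning rate, stopping threshold and the AND training data are illustrative choices, not values from the lecture):

```python
# Delta-rule training of one sigmoid neuron: dw_i = eta * (t - o) * o * (1 - o) * x_i.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, eta=0.5, eps=0.01, max_epochs=100000, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]   # w[0] is the bias weight (x0 = 1)
    for _ in range(max_epochs):
        E = 0.0
        for (x1, x2), t in samples:
            x = [1.0, x1, x2]
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(3):                        # delta-rule update
                w[i] += eta * (t - o) * o * (1 - o) * x[i]
            E += 0.5 * (t - o) ** 2
        if E < eps:                                   # iterate until E < threshold
            break
    return w, E

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train(AND))   # learned weights and the final (small) error
```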
Observations

Does the training technique support our intuition?

  • The larger the xi, the larger is Δwi
    • Error burden is borne by the weight values corresponding to large input values
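With the update rule above, Δwi is directly proportional to xi, so for a fixed error the weights attached to the larger inputs change the most. A tiny numeric check (all values invented for illustration):

```python
# dw_i = eta * (t - o) * o * (1 - o) * x_i  scales linearly with x_i.
eta, t, o = 0.5, 1.0, 0.6           # illustrative values
common = eta * (t - o) * o * (1 - o)

for x_i in (0.1, 0.5, 0.9):
    print(x_i, common * x_i)         # larger x_i -> proportionally larger dw_i
```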