
CS344: Introduction to Artificial Intelligence (associated lab: CS386)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 34: Backpropagation; need for multiple layers and non-linearity. 5th April, 2011.


Presentation Transcript


  1. CS344: Introduction to Artificial Intelligence (associated lab: CS386) • Pushpak Bhattacharyya, CSE Dept., IIT Bombay • Lecture 34: Backpropagation; need for multiple layers and non-linearity • 5th April, 2011

  2. Backpropagation algorithm • Fully connected feed-forward network • Pure FF network (no jumping of connections over layers) • [Figure: layered network with an input layer (n i/p neurons), hidden layers, and an output layer (m o/p neurons); $w_{ji}$ is the weight on the connection from neuron i to neuron j]

  3. Gradient Descent Equations
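
The slide's equations are not reproduced in this transcript. As a hedged sketch of the usual setup, assuming a sum-of-squares error over the m output neurons, sigmoid neurons, and a learning rate $\eta$ (the symbols $E$, $t_j$, $o_j$, $net_j$ and $\eta$ are assumptions here, not taken from the slide), gradient descent moves each weight against the error gradient:

```latex
E = \frac{1}{2}\sum_{j=1}^{m} (t_j - o_j)^2,
\qquad
\Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}},
\qquad
o_j = \frac{1}{1 + e^{-net_j}},
\quad
net_j = \sum_i w_{ji}\,o_i,
\quad
\frac{\partial o_j}{\partial net_j} = o_j\,(1 - o_j)
```

The last identity is what puts the factor $o_j(1 - o_j)$ into the weight change rules discussed on the later slides.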

  4. Backpropagation – for outermost layer

  5. Backpropagation for hidden layers • $\delta_k$ is propagated backwards to find the value of $\delta_j$ • [Figure: the same layered network, with neuron k in the output layer, neuron j in a hidden layer, and neuron i in the layer below it]

  6. Backpropagation – for hidden layers

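  7. General Backpropagation Rule • General weight updating rule: $\Delta w_{ji} = \eta\,\delta_j\,o_i$ • where $\delta_j = (t_j - o_j)\,o_j\,(1 - o_j)$ for the outermost layer and $\delta_j = o_j\,(1 - o_j)\sum_k w_{kj}\,\delta_k$ for hidden layers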
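
The general rule can be sketched in code and checked against a numerical gradient. This is a minimal sketch, not the lecture's own implementation: it assumes sigmoid neurons, a sum-of-squares error, a single hidden layer, and no bias terms, and the function names, the learning rate `eta`, and the random initial weights are all hypothetical.

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, W_h, W_o):
    """Forward pass: input -> one hidden layer -> output layer (no bias terms)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W_o]
    return h, o

def error(x, t, W_h, W_o):
    """Assumed error: E = 1/2 * sum_k (t_k - o_k)^2."""
    _, o = forward(x, W_h, W_o)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

def weight_changes(x, t, W_h, W_o, eta):
    h, o = forward(x, W_h, W_o)
    # Outermost layer: delta_k = (t_k - o_k) * o_k * (1 - o_k)
    d_o = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
    # Hidden layer: delta_j = o_j * (1 - o_j) * sum_k w_kj * delta_k
    d_h = [hj * (1 - hj) * sum(W_o[k][j] * d_o[k] for k in range(len(d_o)))
           for j, hj in enumerate(h)]
    # General rule: delta_w_ji = eta * delta_j * o_i
    dW_o = [[eta * dk * hj for hj in h] for dk in d_o]
    dW_h = [[eta * dj * xi for xi in x] for dj in d_h]
    return dW_h, dW_o

def numeric_change(x, t, W_h, W_o, eta, layer, j, i, eps=1e-6):
    """-eta * dE/dw for a single weight, estimated by central differences."""
    def perturbed(sign):
        Wh = [row[:] for row in W_h]
        Wo = [row[:] for row in W_o]
        (Wh if layer == "hidden" else Wo)[j][i] += sign * eps
        return error(x, t, Wh, Wo)
    return -eta * (perturbed(+1) - perturbed(-1)) / (2 * eps)

random.seed(0)  # hypothetical initial weights, fixed for repeatability
x, t, eta = [1.0, 0.5], [1.0], 0.5
W_h = [[random.uniform(-1, 1) for _ in x] for _ in range(2)]
W_o = [[random.uniform(-1, 1) for _ in range(2)]]
dW_h, dW_o = weight_changes(x, t, W_h, W_o, eta)
print(dW_h[0][0], numeric_change(x, t, W_h, W_o, eta, "hidden", 0, 0))
```

The two printed numbers should agree closely, which is a quick way to catch a sign or index error in the hidden-layer term.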

  8. Observations on weight change rules • Does the training technique support our intuition? • The larger the input $x_i$, the larger the change $\Delta w_i$ • The error burden is borne by the weights corresponding to large input values

  9. Observations contd. • $\Delta w_i$ is proportional to the departure from the target, $(t - o)$ • Saturation behaviour when $o$ is close to 0 or 1 • If $o < t$, $\Delta w_i > 0$, and if $o > t$, $\Delta w_i < 0$, which is consistent with Hebb's law

  10. Hebb's law • [Figure: neuron $n_i$ connected to neuron $n_j$ through weight $w_{ji}$] • If $n_j$ and $n_i$ are both in the excitatory state (+1), then the change in weight must be such that it enhances the excitation • The change is proportional to both levels of excitation: $\Delta w_{ji} \propto e(n_j)\,e(n_i)$ • If $n_i$ and $n_j$ are in a mutual state of inhibition (one is +1 and the other is -1), then the change in weight is such that the inhibition is enhanced (the change in weight is negative)
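
A minimal sketch of the proportionality stated on this slide. The constant of proportionality `eta` and the function name are hypothetical; the slide only says the change is proportional to the product of the two excitation levels.

```python
def hebb_delta(e_j, e_i, eta=0.1):
    """Weight change proportional to the product of the two excitation levels."""
    return eta * e_j * e_i

both_excited = hebb_delta(+1, +1)        # same state: weight is increased
mutual_inhibition = hebb_delta(+1, -1)   # opposite states: weight is decreased
print(both_excited, mutual_inhibition)
```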

  11. Saturation behavior • The algorithm is iterative and incremental • If the weight values or the number of inputs is very large, the net input will be large in magnitude and the output will lie in the saturation region • The weight values hardly change in the saturation region
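
To make the saturation effect concrete, the sketch below evaluates the weight change of a single sigmoid output neuron for increasing net input. It assumes the weight change has the form $\eta\,(t - o)\,o\,(1 - o)\,x$; the function name `delta_w` and the default `eta` are hypothetical.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def delta_w(x, t, net, eta=1.0):
    """Assumed output-layer weight change: eta * (t - o) * o * (1 - o) * x."""
    o = sigmoid(net)
    return eta * (t - o) * o * (1 - o) * x

# The target is 0, so the error (t - o) grows as net grows, yet the factor
# o * (1 - o) shrinks much faster and the weight change collapses towards zero.
for net in (0.0, 2.0, 5.0, 10.0):
    print(net, delta_w(x=1.0, t=0.0, net=net))
```

The change is negative throughout because the output exceeds the target, and its magnitude drops sharply once the neuron saturates, even though the output is then at its most wrong.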

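  12. How does it work? • Input propagation forward and error propagation backward (e.g. XOR) • [Figure: XOR network. The output neuron has $\theta = 0.5$, $w_1 = 1$, $w_2 = 1$ and is fed by two hidden neurons computing $x_1\bar{x}_2$ and $\bar{x}_1 x_2$; the hidden neurons take inputs $x_1$, $x_2$ with weights (1.5, -1) and (-1, 1.5) respectively]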
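
The forward pass through the XOR network in the figure can be sketched with threshold neurons. The output threshold 0.5 and the weights 1, 1, 1.5 and -1 come from the slide; the hidden-neuron threshold of 1 and the assignment of 1.5 and -1 to particular inputs are assumptions, since the transcript does not preserve them.

```python
def step(net, theta):
    """Threshold neuron: outputs 1 when the net input reaches theta."""
    return 1 if net >= theta else 0

def xor_net(x1, x2, hidden_theta=1.0):  # hidden_theta is an assumed value
    h1 = step(1.5 * x1 - 1.0 * x2, hidden_theta)    # x1 AND NOT x2
    h2 = step(-1.0 * x1 + 1.5 * x2, hidden_theta)   # NOT x1 AND x2
    return step(1.0 * h1 + 1.0 * h2, 0.5)           # OR of the hidden outputs

table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

Each hidden neuron is a linear separator; XOR emerges only because the output neuron combines two of them.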

  13. If Sigmoid Neurons Are Used, Do We Need MLP? • Does a sigmoid neuron have the power to separate non-linearly separable data? • Can a single sigmoid neuron solve the X-OR problem?

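  14. Sigmoid output with thresholds • $O = 1 / (1 + e^{-net})$ • The output is read as 1 if $O > y_u$ and as 0 if $O < y_l$ • Typically $y_l \ll 0.5$ and $y_u \gg 0.5$ • [Figure: sigmoid curve of O against net, rising to 1, with the levels $y_l$ and $y_u$ marked]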

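  15. Inequalities • [Figure: a single sigmoid neuron with inputs $x_1$, $x_2$ (weights $w_1$, $w_2$) and a bias input $x_0 = -1$ (weight $w_0$)] • $O = 1 / (1 + e^{-net})$, where $net = w_1 x_1 + w_2 x_2 - w_0$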

  16. For input (0, 0), O = 0, i.e. $O < y_l$: $1 / (1 + e^{-w_1 x_1 - w_2 x_2 + w_0}) < y_l$, i.e. $1 / (1 + e^{w_0}) < y_l$ (1)

  17. For input (0, 1), O = 1, i.e. $O > y_u$: $1 / (1 + e^{-w_1 x_1 - w_2 x_2 + w_0}) > y_u$, i.e. $1 / (1 + e^{-w_2 + w_0}) > y_u$ (2)

  18. For input (1, 0), O = 1, i.e. $1 / (1 + e^{-w_1 + w_0}) > y_u$ (3) • For input (1, 1), O = 0, i.e. $1 / (1 + e^{-w_1 - w_2 + w_0}) < y_l$ (4)

  19. Rearranging, (1) $1 / (1 + e^{w_0}) < y_l$ gives $1 + e^{w_0} > 1 / y_l$, i.e. $w_0 > \ln((1 - y_l) / y_l)$ (5)

  20. (2) $1 / (1 + e^{-w_2 + w_0}) > y_u$ gives $1 + e^{-w_2 + w_0} < 1 / y_u$, i.e. $e^{-w_2 + w_0} < (1 - y_u) / y_u$, i.e. $-w_2 + w_0 < \ln((1 - y_u) / y_u)$, i.e. $w_2 - w_0 > \ln(y_u / (1 - y_u))$ (6)

  21. (3) gives $w_1 - w_0 > \ln(y_u / (1 - y_u))$ (7) • (4) gives $-w_1 - w_2 + w_0 > \ln((1 - y_l) / y_l)$ (8)

  22. (5) + (6) + (7) + (8) gives $0 > 2\ln((1 - y_l) / y_l) + 2\ln(y_u / (1 - y_u))$, i.e. $0 > \ln[((1 - y_l) / y_l) \cdot (y_u / (1 - y_u))]$, i.e. $((1 - y_l) / y_l) \cdot (y_u / (1 - y_u)) < 1$

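  23. (i) $[(1 - y_l) / (1 - y_u)] \cdot [y_u / y_l] < 1$ • (ii) $y_u \gg 0.5$ • (iii) $y_l \ll 0.5$ • From (i), (ii) and (iii): contradiction, because $y_u > y_l$ makes both factors in (i) greater than 1. Hence a single sigmoid neuron cannot compute X-OR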
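
The contradiction can be illustrated numerically. This is only a sketch: the thresholds 0.1 and 0.9 are example values (the same ones the linear-neuron slides assume), the function names are hypothetical, and the grid search is a sanity check over a finite set of weights, not a proof.

```python
import itertools
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def computes_xor(w0, w1, w2, yl=0.1, yu=0.9):
    """True if one sigmoid neuron with these weights satisfies (1) to (4)."""
    def out(x1, x2):
        return sigmoid(w1 * x1 + w2 * x2 - w0)
    return (out(0, 0) < yl and out(0, 1) > yu
            and out(1, 0) > yu and out(1, 1) < yl)

def condition_lhs(yl, yu):
    """Left-hand side of the final inequality, which would have to be < 1."""
    return ((1 - yl) / yl) * (yu / (1 - yu))

print(condition_lhs(0.1, 0.9))  # far above 1, so the inequality fails

# Exhaustive check over weights from -10 to 10 in steps of 0.5.
grid = [k / 2 for k in range(-20, 21)]
solutions = [w for w in itertools.product(grid, repeat=3) if computes_xor(*w)]
print(len(solutions))
```

No weight triple on the grid satisfies all four inequalities, in line with the derivation.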

  24. Can Linear Neurons Work? • [Figure: network with inputs $x_1$, $x_2$ feeding hidden neurons $h_1$, $h_2$]

  25. Note: The whole structure shown in the earlier slide is reducible to a single neuron with the given behavior • Claim: A neuron with linear I-O behavior can't compute X-OR • Proof: Considering all possible cases [assuming 0.1 and 0.9 as the lower and upper thresholds] • For (0,0), Zero class: the output must be below 0.1 • For (0,1), One class: the output must be above 0.9

  26. For (1,0), One class: the output must be above 0.9 • For (1,1), Zero class: the output must be below 0.1 • These inequalities are inconsistent. Hence X-OR can't be computed • Observations: • A linear neuron can't compute X-OR • A multilayer FFN with linear neurons is collapsible to a single linear neuron, hence there is no additional power due to the hidden layer • Non-linearity is essential for power
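
The collapse of stacked linear layers can be sketched directly. The weights and biases below are hypothetical values chosen for illustration, and the helper names are not from the lecture. The code also checks the identity behind the inconsistency: for any linear (affine) neuron f, f(0,1) + f(1,0) = f(0,0) + f(1,1), whereas X-OR with thresholds 0.1 and 0.9 would need the left side above 1.8 and the right side below 0.2.

```python
def linear_layer(W, b, x):
    """Layer of linear neurons: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def collapse(W2, b2, W1, b1):
    """Return (W, b) of the single linear layer equal to layer 2 applied after layer 1."""
    W = [[sum(W2[k][j] * W1[j][i] for j in range(len(W1)))
          for i in range(len(W1[0]))] for k in range(len(W2))]
    b = [sum(W2[k][j] * b1[j] for j in range(len(b1))) + b2[k]
         for k in range(len(W2))]
    return W, b

# Hypothetical 2-2-1 network of linear neurons.
W1, b1 = [[1.0, -2.0], [0.5, 3.0]], [0.1, -0.2]
W2, b2 = [[2.0, -1.0]], [0.3]
W, b = collapse(W2, b2, W1, b1)

def two_layer(x):
    return linear_layer(W2, b2, linear_layer(W1, b1, x))[0]

def one_layer(x):
    return linear_layer(W, b, x)[0]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, two_layer(x), one_layer(x))

# Zero for every linear neuron; X-OR would need this to exceed 1.6.
gap = one_layer([0, 1]) + one_layer([1, 0]) - one_layer([0, 0]) - one_layer([1, 1])
print(gap)
```

Adding a hidden layer changes the values of W and b but not the form of the function, which is why the extra layer adds no power without a non-linearity.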
