
Presentation Transcript



CS621 : Artificial Intelligence

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lecture 22

Perceptron Training Algorithm and its Proof of Convergence


Perceptron Training Algorithm (PTA)

[Figure: two perceptron units with threshold θ, one using the test "≤" and one the strict test "<"; each has weights w1, w2, w3, …, wn on inputs x1, x2, x3, …, xn]

Preprocessing:

  1. The computation law is modified to

     y = 1 if ∑ wi xi > θ

     y = 0 if ∑ wi xi < θ



PTA – preprocessing cont…

[Figure: the threshold θ is absorbed as an extra weight w0 = θ on a new input x0 = -1, so the unit has weights w0, w1, w2, w3, …, wn over inputs x0 = -1, x1, x2, x3, …, xn and is compared against 0]

2. Absorb θ as a weight

3. Negate all the zero-class examples
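
Why steps 2 and 3 work (a remark added here; it is implicit in the slides): with w0 = θ and x0 = -1, the original test ∑ wi xi > θ over i = 1…n becomes ∑ wi xi > 0 over i = 0…n, so the threshold turns into an ordinary weight; and for a 0-class example the requirement w·x < 0 is the same as w·(-x) > 0, so after negating those examples every training vector must satisfy the single test w·x > 0.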




Example to demonstrate preprocessing

  • OR perceptron

    1-class: <1,1>, <1,0>, <0,1>

    0-class: <0,0>

    Augmented x vectors:

    1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>

    0-class: <-1,0,0>

    Negate 0-class: <1,0,0>



Example to demonstrate preprocessing cont..

Now the vectors are

        x0   x1   x2
  X1    -1    0    1
  X2    -1    1    0
  X3    -1    1    1
  X4     1    0    0
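
A minimal Python sketch of this preprocessing (the function name preprocess and the tuple encoding are illustrative choices, not from the slides); on the OR data it reproduces the four vectors above, just listed in the order the classes were given:

    def preprocess(one_class, zero_class):
        # Step 2: absorb the threshold by augmenting every input with x0 = -1.
        augmented_one = [(-1, *x) for x in one_class]
        augmented_zero = [(-1, *x) for x in zero_class]
        # Step 3: negate the zero-class examples so that every vector must
        # satisfy the same test w . x > 0.
        negated_zero = [tuple(-xi for xi in x) for x in augmented_zero]
        return augmented_one + negated_zero

    # OR function data from the slide above.
    vectors = preprocess(one_class=[(1, 1), (1, 0), (0, 1)], zero_class=[(0, 0)])
    print(vectors)   # [(-1, 1, 1), (-1, 1, 0), (-1, 0, 1), (1, 0, 0)]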



Perceptron Training Algorithm

  1. Start with a random value of w

     ex: <0,0,0,…>

  2. Test for w·xi > 0

     If the test succeeds for every i = 1, 2, …, n, then return w

  3. Modify w: wnext = wprev + xfail, and go back to step 2 (a minimal sketch of the loop follows below)
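
A minimal Python sketch of these three steps, assuming the vectors have already been preprocessed (augmented and negated) as above; the names dot and pta and the max_updates cap are choices made here, not part of the lecture:

    def dot(u, v):
        # Inner product of two equal-length vectors.
        return sum(ui * vi for ui, vi in zip(u, v))

    def pta(vectors, w0=None, max_updates=10000):
        # Step 1: start from an initial weight vector (all zeros by default).
        w = list(w0) if w0 is not None else [0] * len(vectors[0])
        updates = 0
        while updates <= max_updates:
            # Step 2: test every vector; if all pass the test w . x > 0, return w.
            failed = next((x for x in vectors if dot(w, x) <= 0), None)
            if failed is None:
                return w, updates
            # Step 3: some vector failed; add it to w and test again.
            w = [wi + xi for wi, xi in zip(w, failed)]
            updates += 1
        return w, updates   # cap reached: the data may not be linearly separable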



Tracing PTA on OR-example

w = <0,0,0>     w·x1 fails
w = <-1,0,1>    w·x4 fails
w = <0,0,1>     w·x2 fails
w = <-1,1,1>    w·x1 fails
w = <0,1,2>     w·x4 fails
w = <1,1,2>     w·x2 fails
w = <0,2,2>     w·x4 fails
w = <1,2,2>     success
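
Running the pta sketch from the previous slide on the preprocessed OR vectors reproduces this behaviour. The intermediate weights depend on which failing vector is picked at each step, so they can differ from the trace above, but scanning the vectors in the order X1…X4 ends at the same separating weight vector:

    or_vectors = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]
    w, n_updates = pta(or_vectors)
    print(w, n_updates)   # [1, 2, 2] after 9 updates with this scan order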



Proof of Convergence of PTA

  • Perceptron Training Algorithm (PTA)

  • Statement:

    Whatever the initial choice of weights and whatever the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.



Proof of Convergence of PTA

  • Suppose wn is the weight vector at the nth step of the algorithm.

  • At the beginning, the weight vector is w0

  • Go from wi to wi+1 when a vector Xj fails the test wi·Xj > 0 and update wi as

    wi+1 = wi + Xj

  • Since the Xjs form a linearly separable function,

    ∃ w* s.t. w*·Xj > 0 ∀ j



Proof of Convergence of PTA

  • Consider the expression

    G(wn) = (wn · w*) / |wn|

    where wn = weight at the nth iteration

  • G(wn) = (|wn| · |w*| · cos θ) / |wn|

    where θ = angle between wn and w*

  • G(wn) = |w*| · cos θ

  • G(wn) ≤ |w*|   (as -1 ≤ cos θ ≤ 1)



Behavior of Numerator of G

wn · w* = (wn-1 + Xn-1fail) · w*

        = wn-1 · w* + Xn-1fail · w*

        = (wn-2 + Xn-2fail) · w* + Xn-1fail · w* …

        = w0 · w* + (X0fail + X1fail + … + Xn-1fail) · w*

    w*·Xifail is always positive: note carefully

  • Suppose |Xj| ≥ δ, where δ is the minimum magnitude.

  • Numerator of G ≥ |w0 · w*| + n·δ·|w*|

  • So, the numerator of G grows with n.



Behavior of Denominator of G

  • |wn| = √(wn · wn)

         = √((wn-1 + Xn-1fail)²)

         = √((wn-1)² + 2·wn-1·Xn-1fail + (Xn-1fail)²)

         ≤ √((wn-1)² + (Xn-1fail)²)   (as wn-1·Xn-1fail ≤ 0)

         ≤ √((w0)² + (X0fail)² + (X1fail)² + … + (Xn-1fail)²)

  • |Xj| ≤ Δ (max magnitude)

  • So, Denominator ≤ √((w0)² + n·Δ²)



Some Observations

  • Numerator of G grows as n

  • Denominator of G grows as √n

    => Numerator grows faster than denominator

  • If PTA does not terminate, G(wn) values will become unbounded.
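
As a small numerical illustration (added here, not on the slides): re-run the PTA updates on the preprocessed OR vectors while tracking wn·w* and |wn|, taking w* = <1,2,2>, which does satisfy w*·Xj > 0 for all four vectors. The numerator climbs with every update while G(wn) never exceeds |w*| = 3:

    import math

    vectors = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]   # preprocessed OR data
    w_star = (1, 2, 2)                                          # satisfies w* . Xj > 0 for all j

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    w, step = [0, 0, 0], 0
    while True:
        failed = next((x for x in vectors if dot(w, x) <= 0), None)
        if failed is None:
            break                                 # every vector passes: PTA has converged
        w = [wi + xi for wi, xi in zip(w, failed)]
        step += 1
        num = dot(w, w_star)                      # numerator of G: grows with the step count
        denom = math.sqrt(dot(w, w))              # |wn|: grows at most like sqrt(step)
        print(step, num, round(denom, 3), round(num / denom, 3))
        assert num / denom <= math.sqrt(dot(w_star, w_star)) + 1e-9   # G(wn) <= |w*|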



Some Observations contd.

  • But, as |G(wn)| ≤ |w*| which is finite, this is impossible!

  • Hence, PTA has to converge.

  • Proof is due to Marvin Minsky.
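
For concreteness, the two bounds can be combined into an explicit cap on the number of weight updates (a consequence worked out here, not stated on the slides, assuming w0 = 0 and using the δ and Δ bounds above):

    G(wn) ≥ (n · δ · |w*|) / (√n · Δ) = √n · δ · |w*| / Δ

Since G(wn) ≤ |w*|, this forces √n ≤ Δ/δ, i.e. PTA can make at most (Δ/δ)² weight updates before every vector passes the test.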



Convergence of PTA

  • Whatever the initial choice of weights and whatever the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.



Assignment

  • Implement the perceptron training algorithm

  • Run it on the 16 Boolean Functions of 2 inputs

  • Observe the behaviour

    • Take different initial values of the parameters

    • Note the number of iterations before convergence

    • Plot graphs for the functions which converge
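
A starting-point sketch for this assignment (the truth-table encoding, zero initial weights, and the update cap are choices made here, not requirements from the slide). Each of the 16 functions is one 4-bit truth table over the inputs <0,0>, <0,1>, <1,0>, <1,1>; the two functions that are not linearly separable (XOR and XNOR) should hit the update cap, while the rest should converge:

    from itertools import product

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def pta(vectors, w0=(0, 0, 0), max_updates=2000):
        w = list(w0)
        updates = 0
        while updates <= max_updates:
            failed = next((x for x in vectors if dot(w, x) <= 0), None)
            if failed is None:
                return w, updates, True            # converged: every vector passes w . x > 0
            w = [wi + xi for wi, xi in zip(w, failed)]
            updates += 1
        return w, updates, False                    # cap hit: likely not linearly separable

    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for outputs in product([0, 1], repeat=4):       # the 16 Boolean functions of 2 inputs
        # Preprocess: augment with x0 = -1, negate the 0-class examples.
        vectors = [(-1, *x) if y == 1 else tuple(-v for v in (-1, *x))
                   for x, y in zip(inputs, outputs)]
        w, n, ok = pta(vectors)
        status = f"converges in {n} updates -> {w}" if ok else "does not converge"
        print(outputs, status)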

