
CS621/CS449 Artificial Intelligence Lecture Notes



  1. CS621/CS449 Artificial Intelligence Lecture Notes, Set 4: 13/08/2004, 17/08/2004, 18/08/2004, 20/08/2004

  2. Outline • Proof of Convergence of PTA • Some Observations • Study of Linear Separability • Test for Linear Separability • Asummability

  3. Proof of Convergence of PTA • Perceptron Training Algorithm (PTA) • Statement: Whatever the initial choice of weights and whatever the vector chosen for testing, the PTA converges, provided the vectors come from a linearly separable function.

  4. Proof of Convergence of PTA • Suppose w_n is the weight vector at the nth step of the algorithm. • At the beginning, the weight vector is w_0. • Go from w_i to w_{i+1} when a vector X_j fails the test w_i · X_j > 0, and update w_i as w_{i+1} = w_i + X_j. • Since the X_j come from a linearly separable function, ∃ w* s.t. w* · X_j > 0 ∀ j.
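A minimal Python sketch of the PTA as stated on this slide (the function name pta and the max_steps cap are illustrative, not from the notes; the vectors are assumed to be already augmented and negated, so convergence means w · X_j > 0 for every j):

    import numpy as np

    def pta(X, w0=None, max_steps=10000):
        """PTA on augmented, negated vectors: keep adding any vector
        that fails the test w . X_j > 0 until none fails."""
        X = np.asarray(X, dtype=float)
        w = np.zeros(X.shape[1]) if w0 is None else np.asarray(w0, dtype=float)
        for _ in range(max_steps):
            failed = [x for x in X if np.dot(w, x) <= 0]
            if not failed:
                return w                # w . X_j > 0 for all j: converged
            w = w + failed[0]           # the update w_{i+1} = w_i + X_j
        raise RuntimeError("no convergence; vectors may not be linearly separable")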

  5. Proof of Convergence of PTA • Consider the expression G(w_n) = (w_n · w*) / |w_n|, where w_n = weight at the nth iteration. • G(w_n) = (|w_n| · |w*| · cos θ) / |w_n|, where θ = angle between w_n and w*. • G(w_n) = |w*| · cos θ • G(w_n) ≤ |w*| (as −1 ≤ cos θ ≤ 1)

  6. Behavior of the Numerator of G • w_n · w* = (w_{n-1} + X_{n-1}^fail) · w* = w_{n-1} · w* + X_{n-1}^fail · w* = (w_{n-2} + X_{n-2}^fail) · w* + X_{n-1}^fail · w* = … = w_0 · w* + (X_0^fail + X_1^fail + … + X_{n-1}^fail) · w* • Let δ = min_j (X_j · w*); δ > 0, since there are finitely many X_j and each satisfies X_j · w* > 0. • Numerator of G ≥ w_0 · w* + n δ • So the numerator of G grows linearly with n.

  7. Behavior of the Denominator of G • |w_n|² = w_n · w_n = (w_{n-1} + X_{n-1}^fail)² = |w_{n-1}|² + 2 w_{n-1} · X_{n-1}^fail + |X_{n-1}^fail|² ≤ |w_{n-1}|² + |X_{n-1}^fail|² (as w_{n-1} · X_{n-1}^fail ≤ 0 for a failed vector) ≤ |w_0|² + |X_0^fail|² + |X_1^fail|² + … + |X_{n-1}^fail|² • Let μ = max_j |X_j| (the maximum magnitude). • So |w_n|² ≤ |w_0|² + n μ², i.e. the denominator |w_n| grows at most as √n.

  8. Some Observations • The numerator of G grows as n. • The denominator of G grows as √n. ⇒ the numerator grows faster than the denominator. • If the PTA did not terminate, the G(w_n) values would become unbounded. • But |G(w_n)| ≤ |w*|, which is finite: impossible! • Hence the PTA has to converge. • The proof is due to Marvin Minsky.
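A quick numerical illustration of slides 5–8, as a sketch under assumptions not in the notes: the AND function, and a hand-picked separating vector w* = (1.5, 1, 1) in augmented (θ, w_1, w_2) form. G(w_n) moves upward as training proceeds but never exceeds |w*| ≈ 2.06, and the run stops after finitely many updates:

    import numpy as np

    # Augmented, negated vectors for AND: 11 is the 1-class (gets -1 prepended);
    # 00, 01, 10 are the 0-class (prepended with -1, then negated).
    X = np.array([[-1.0,  1,  1],
                  [ 1.0,  0,  0],
                  [ 1.0,  0, -1],
                  [ 1.0, -1,  0]])
    w_star = np.array([1.5, 1.0, 1.0])   # assumed separating vector (theta, w1, w2)

    w = np.zeros(3)
    for step in range(1, 30):
        failed = [x for x in X if np.dot(w, x) <= 0]
        if not failed:
            print("converged at step", step)
            break
        w = w + failed[0]
        G = np.dot(w, w_star) / np.linalg.norm(w)
        print(step, "G =", round(G, 3), " bound |w*| =", round(np.linalg.norm(w_star), 3))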

  9. Study of Linear Separability • [Figure: a perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, threshold θ, and output y] • W · X_j = 0 defines a hyperplane in (n+1) dimensions ⇒ the W vector and the X_j vectors are perpendicular to each other.

  10. Linear Separability • [Figure: positive points X_1, X_2, …, X_k and negative points X_{k+1}, X_{k+2}, …, X_m on opposite sides of a separating hyperplane] • Positive set: w · X_j > 0, ∀ j ≤ k • Negative set: w · X_j < 0, ∀ j > k

  11. Linear Separability • w · X_j = 0 ⇒ w is normal to the hyperplane which separates the +ve points from the −ve points. • In this computing paradigm, computation means “placing hyperplanes”. • Functions computable by the perceptron are called: • “threshold functions”, because ∑ w_i x_i is compared with θ (the threshold); • “linearly separable”, because linear surfaces are set up to separate the +ve and −ve points.

  12. Linear Separability • Decision problems may have +ve and −ve points that need to be separated. • The separating hyperplane should be found through learning, so that it generalizes to unseen points. • PAC learning, SVMs, etc. provide such learning frameworks.

  13. Test for Linear Separability (LS) • Theorem: A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC). • The absence of a PLC is both a necessary and a sufficient condition for LS. • X_1, X_2, …, X_m: the vectors of the function. • Y_1, Y_2, …, Y_m: the augmented, negated set. • Prepend −1 to every vector X_i (to absorb the threshold θ); in addition, negate the vectors of the 0-class. The resulting vectors are the Y_i.

  14. Example (1) – XNOR • The set {Y_i} has a PLC if ∑ P_i Y_i = 0, 1 ≤ i ≤ m, where each P_i is a non-negative scalar and at least one P_i > 0. • Example: 2-bit even parity (the XNOR function).

  15. Example (1) – XNOR • P_1 [−1 0 0]^T + P_2 [1 0 −1]^T + P_3 [1 −1 0]^T + P_4 [−1 1 1]^T = [0 0 0]^T • (Here [−1 0 0]^T and [−1 1 1]^T are the augmented 1-class vectors 00 and 11; [1 0 −1]^T and [1 −1 0]^T are the augmented, negated 0-class vectors 01 and 10.) • Setting all P_i = 1 gives the result. • For the parity function, a PLC exists ⇒ not linearly separable.

  16. Example (2) – Majority function • 3-bit majority function. • Suppose a PLC exists. The equations obtained (one per component of ∑ P_i Y_i = 0) are: • P_1 + P_2 + P_3 − P_4 + P_5 − P_6 − P_7 − P_8 = 0 • −P_5 + P_6 + P_7 + P_8 = 0 • −P_3 + P_4 + P_7 + P_8 = 0 • −P_2 + P_4 + P_6 + P_8 = 0 • On solving, all P_i are forced to 0. • 3-bit majority function ⇒ no PLC ⇒ LS.
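The PLC test of slides 13–16 is a linear-programming feasibility question, so it can be mechanized. A sketch in Python (the helper names augment_negate and has_plc are mine; SciPy's linprog does the feasibility check, and normalizing ∑ P_i = 1 rules out the trivial all-zero solution); it reproduces both examples, reporting a PLC for XNOR and none for 3-bit majority:

    import numpy as np
    from scipy.optimize import linprog

    def augment_negate(vectors, labels):
        """Prepend -1 to every vector (absorbing theta); negate the 0-class."""
        Y = [np.concatenate(([-1.0], v)) for v in vectors]
        return np.array([y if lab == 1 else -y for y, lab in zip(Y, labels)])

    def has_plc(Y):
        """PLC exists iff some P >= 0, P != 0 has sum_i P_i Y_i = 0; the
        normalization sum(P) = 1 excludes the trivial all-zero solution."""
        m, d = Y.shape
        A_eq = np.vstack([Y.T, np.ones((1, m))])
        b_eq = np.concatenate([np.zeros(d), [1.0]])
        res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * m, method="highs")
        return res.success

    inputs2 = [(a, b) for a in (0, 1) for b in (0, 1)]
    xnor = [1 if a == b else 0 for a, b in inputs2]
    print(has_plc(augment_negate(inputs2, xnor)))       # True: PLC exists, not LS

    inputs3 = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    majority = [1 if a + b + c >= 2 else 0 for a, b, c in inputs3]
    print(has_plc(augment_negate(inputs3, majority)))   # False: no PLC, LS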

  17. Example (3) – n-bit comparator • [Figure: a perceptron with inputs x_0, …, x_{n-1} and y_n, …, y_{2n-1}, weights w_0, …, w_{2n-1}, threshold θ, and output z] • z = 1 if dec(y) > dec(x); z = 0 otherwise. • Is a perceptron realization possible? Yes: choose the w vector such that w_i = −2^i ∀ i, 0 ≤ i ≤ n−1, and w_j = 2^{j−n} ∀ j, n ≤ j ≤ 2n−1, with θ = 0. • The net input is then dec(y) − dec(x), which exceeds θ = 0 exactly when dec(y) > dec(x).
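The weight assignment on this slide can be verified exhaustively for small n. A sketch (assuming the input ordering x_0, …, x_{n−1}, y_n, …, y_{2n−1}, least-significant bit first, to match the weight indexing):

    import numpy as np

    n = 2   # works for any small n
    w = np.array([-(2 ** i) for i in range(n)] + [2 ** i for i in range(n)])
    theta = 0

    for x in range(2 ** n):
        for y in range(2 ** n):
            bits = [(x >> i) & 1 for i in range(n)] + [(y >> i) & 1 for i in range(n)]
            z = 1 if np.dot(w, bits) > theta else 0   # net input = dec(y) - dec(x)
            assert z == (1 if y > x else 0)
    print("comparator perceptron verified on all", 4 ** n, "input pairs")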

  18. Example (3) – n-bit comparator • 2-bit comparator: input vectors (y_0, y_1, x_0, x_1) with output z. • +ve class: 0100, 1000, and so on. • −ve class: 0000, 0001, and so on. • After augmentation and negation, 16 non-negative numbers P_i (one per input vector) are to be found such that not all of them are zero. • Suppose a PLC exists: ∑ P_i Y_i = 0 such that P_i ≥ 0 ∀ i, i = 1 … 16, and ¬(∀ i, P_i = 0). • Solving the equations, we eventually end up with all P_i = 0 ⇒ no PLC ⇒ the 2-bit comparator is LS.

  19. Observations • If the vectors are from a linearly separable function, then during the course of training there can never be a repetition of weight vectors: • the quantity w_n · w* strictly increases with every update (each failed vector adds X_j · w* > 0), so a weight vector cannot recur. • Next: Asummability.

  20. Asummability • Another form of the PLC test. • Helps in quick testing of LS. • Definition of k-summability: given the +ve and −ve classes, a function is called k-summable if we can find k vectors p_1, …, p_k from the +ve class and k vectors n_1, …, n_k from the −ve class (repetitions allowed) such that p_1 + p_2 + … + p_k = n_1 + n_2 + … + n_k.

  21. Asummability • A function is called asummable if it is not k-summable for any k. • Asummability is equivalent to LS. • Example: XOR • +ve class: 01, 10; sum = 11 • −ve class: 00, 11; sum = 11 • 01 + 10 = 00 + 11 ⇒ XOR is 2-summable ⇒ not LS.

  22. Asummability – Examples • AND is not k-summable for any k ⇒ asummable ⇒ LS • +ve class: 11 • −ve class: 00, 01, 10 • A single vector in one class and all the others in the other ⇒ linearly separable. • Another example: the majority function.
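For small Boolean functions, k-summability can be brute-forced directly from the definition on slide 20. A sketch (the helper name k_summable is mine; repetitions of vectors are allowed, as the definition permits):

    from itertools import combinations_with_replacement

    def k_summable(pos, neg, k):
        """True iff some k +ve vectors and k -ve vectors (repetitions
        allowed) have the same component-wise sum."""
        def sums(cls):
            return {tuple(map(sum, zip(*combo)))
                    for combo in combinations_with_replacement(cls, k)}
        return bool(sums(pos) & sums(neg))

    xor_pos, xor_neg = [(0, 1), (1, 0)], [(0, 0), (1, 1)]
    and_pos, and_neg = [(1, 1)], [(0, 0), (0, 1), (1, 0)]

    print(k_summable(xor_pos, xor_neg, 2))   # True: 01 + 10 = 00 + 11
    # AND: k copies of 11 sum to (k, k), but k -ve vectors have at most
    # k ones in total, so the sums can never match (checked up to k = 6).
    print(any(k_summable(and_pos, and_neg, k) for k in range(1, 7)))   # False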
