Vapnik-Chervonenkis Dimension

Presentation Transcript


  1. Vapnik-Chervonenkis Dimension Part I: Definition and Lower bound

  2. PAC Learning model • There exists a distribution D over domain X • Examples: ⟨x, c(x)⟩ • use c for the target function (rather than c_t) • Goal: • with high probability (1 - δ) • find h in H such that • error(h, c) < ε • ε arbitrarily small.
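In symbols, the goal on this slide reads as follows (standard PAC notation; the sample size m and the i.i.d. sampling from D are the usual assumptions, not stated explicitly here):

\[
\mathrm{error}(h,c) \;=\; \Pr_{x\sim D}\bigl[\,h(x)\neq c(x)\,\bigr],
\qquad
\Pr_{S\sim D^{m}}\bigl[\,\mathrm{error}(h_S,c)\le \varepsilon\,\bigr] \;\ge\; 1-\delta,
\]

where h_S ∈ H is the hypothesis the learner outputs on sample S.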

  3. VC: Motivation • Handle infinite classes. • VC-dim “replaces” the finite class size. • Previous lecture (on PAC): specific examples (rectangle, interval). • Goal: develop a general methodology.

  4. Definitions: Projection • Given a concept c over X, associate it with a set (all positive examples). • Projection (sets): for a concept class C and a subset S ⊆ X, P_C(S) = { c ∩ S | c ∈ C }. • Projection (vectors): for a concept class C and S = {x_1, … , x_m}, P_C(S) = { ⟨c(x_1), … , c(x_m)⟩ | c ∈ C }.
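To make the vector form concrete, a minimal sketch with a made-up four-concept class over a tiny domain (the class and the sample S are hypothetical, only for illustration):

```python
# Toy finite concept class: each concept given as the set of its positive examples.
C = [set(), {"a"}, {"a", "b"}, {"b", "c"}]

def projection(C, S):
    """P_C(S) in vector form: the label vectors that C induces on the sample S."""
    return {tuple(1 if x in concept else 0 for x in S) for concept in C}

S = ["a", "b"]
print(projection(C, S))   # {(0, 0), (1, 0), (1, 1), (0, 1)}: here S happens to be shattered
```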

  5. Definition: VC-dim • Clearly |P_C(S)| ≤ 2^m (for |S| = m). • C shatters S if |P_C(S)| = 2^m. • VC dimension of a class C: the size d of the largest set S that C shatters. • Can be infinite. • For a finite class C: VC-dim(C) ≤ log₂ |C|.
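A brute-force check of these definitions, usable only for tiny finite classes (a minimal sketch; the helper names and the example class are mine):

```python
from itertools import combinations
from math import log2

def labels(C, S):
    """Label vectors a finite class C (list of sets of positives) induces on S."""
    return {tuple(1 if x in c else 0 for x in S) for c in C}

def shatters(C, S):
    return len(labels(C, S)) == 2 ** len(S)

def vc_dim(C, X):
    """Size of the largest subset of the finite domain X that C shatters."""
    d = 0
    for k in range(1, len(X) + 1):
        if any(shatters(C, S) for S in combinations(X, k)):
            d = k
    return d

# Sanity check of VC-dim(C) <= log2 |C| on a made-up class.
X = ["a", "b", "c", "d"]
C = [set(), {"a"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c", "d"}]
print(vc_dim(C, X), "<=", log2(len(C)))   # 2 <= 2.32...
```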

  6. Example 1: Interval • C_1 = { c_z | z ∈ [0,1] } • c_z(x) = 1 ⟺ x ≤ z (figure: the region left of the threshold z is labeled 1, the region to its right 0).
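A short completion, using the definition of shattering above (and assuming the convention c_z(x) = 1 ⟺ x ≤ z as written): for any x ∈ (0,1], c_1(x) = 1 and c_0(x) = 0, so the single point {x} is shattered. For two points x_1 < x_2,

\[
c_z(x_2)=1 \;\Rightarrow\; x_2\le z \;\Rightarrow\; x_1\le z \;\Rightarrow\; c_z(x_1)=1,
\]

so the labeling (0, 1) on (x_1, x_2) is never realized; hence VC-dim(C_1) = 1.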

  7. Example 2: Line (half-plane) • C_2 = { c_w | w = (a, b, c) } • c_w(x, y) = 1 ⟺ ax + by ≥ c.
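A crude numerical check (the three points and the coefficient grid are arbitrary choices of mine) that three non-collinear points are shattered by this class, giving VC-dim(C_2) ≥ 3:

```python
from itertools import product

# Three non-collinear points and a coarse grid of coefficients (a, b, c).
pts = [(0, 0), (1, 0), (0, 1)]
grid = [v / 4 for v in range(-8, 9)]

achieved = set()
for a, b, c in product(grid, repeat=3):
    achieved.add(tuple(1 if a * x + b * y >= c else 0 for x, y in pts))

print(len(achieved))   # 8 = 2^3: every labeling is realized, so the 3 points are shattered
```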

  8. Example 3: Axis-parallel rectangle

  9. Example 4: Finite union of intervals

  10. Example 5: Parity • n Boolean input variables • T ⊆ {1, …, n} • f_T(x) = ⊕_{i∈T} x_i • Lower bound: n unit vectors • Upper bound: • number of concepts • linear dependency
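A small brute-force sketch of the lower-bound bullet for n = 3 (the code and variable names are mine, only the claim on the slide is checked):

```python
from itertools import product

n = 3
concepts = list(product([0, 1], repeat=n))       # each subset T as an indicator vector

def f(T, x):
    """Parity f_T(x): XOR of the coordinates x_i with i in T."""
    return sum(t * xi for t, xi in zip(T, x)) % 2

# Lower bound: the n unit vectors are shattered by the parity class.
units = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
labelings = {tuple(f(T, u) for u in units) for T in concepts}
print(len(labelings) == 2 ** n)                  # True: all 2^n labelings appear

# Upper bound: |C| = 2^n, so no n+1 points can be shattered
# (that would require 2^(n+1) > 2^n distinct label vectors).
```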

  11. Example 6: OR • n Boolean input variables • P and N subsets of {1, …, n} • f_{P,N}(x) = (∨_{i∈P} x_i) ∨ (∨_{i∈N} ¬x_i) • Lower bound: n unit vectors • Upper bound: • trivial: 2n • use ELIM (get n+1) • show the second vector removes 2 (get n)
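A sketch checking just the lower-bound bullet for n = 3 (the upper-bound ELIM argument is not reproduced here):

```python
from itertools import product

n = 3
units = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]

def f(P, N, x):
    """Disjunction f_{P,N}(x): OR of the literals x_i (i in P) and not-x_i (i in N)."""
    return int(any(x[i] for i in P) or any(1 - x[i] for i in N))

# For a target labeling b of the unit vectors, take P = {i : b_i = 1} and N = {}:
# then f(P, {}, e_i) = 1 exactly when i is in P, i.e. when b_i = 1.
ok = all(
    tuple(f({i for i, bi in enumerate(b) if bi}, set(), u) for u in units) == b
    for b in product([0, 1], repeat=n)
)
print(ok)   # True -> the n unit vectors are shattered, so VC-dim >= n
```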

  12. Example 7: Convex polygons

  13. Example 7: Convex polygons (continued)

  14. Example 8: Hyper-plane • C_8 = { c_{w,c} | w ∈ ℝ^d } • c_{w,c}(x) = 1 ⟺ ⟨w, x⟩ ≥ c • VC-dim(C_8) = d + 1 • Lower bound: the d unit vectors and the zero vector • Upper bound!
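A sketch of the lower-bound construction: for every labeling of the zero vector and the d unit vectors, an explicit (w, c) that realizes it (the particular weights ±1 and thresholds ±½ are my choice):

```python
from itertools import product

d = 4
points = [tuple(0 for _ in range(d))] + \
         [tuple(1 if j == i else 0 for j in range(d)) for i in range(d)]  # 0, e_1, ..., e_d

def halfspace(w, c, x):
    """c_{w,c}(x) = 1 iff <w, x> >= c."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= c)

ok = True
for b in product([0, 1], repeat=d + 1):          # b[0]: label of 0, b[i+1]: label of e_{i+1}
    w = [1 if b[i + 1] else -1 for i in range(d)]
    c = -0.5 if b[0] else 0.5                    # puts the origin on the requested side
    ok = ok and tuple(halfspace(w, c, p) for p in points) == b
print(ok)   # True -> these d+1 points are shattered, so VC-dim(C_8) >= d+1
```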

  15. Radon Theorem • Definitions: • convex set • convex hull: conv(S) • Theorem: • Let T be a set of d + 2 points in ℝ^d. • There exists a subset S of T such that conv(S) ∩ conv(T \ S) ≠ ∅. • Proof!
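The standard affine-dependence proof, filling in the “Proof!” placeholder (it may differ in detail from the lecture's version): write T = {x_1, …, x_{d+2}} ⊆ ℝ^d. The system

\[
\sum_{i=1}^{d+2} \lambda_i x_i = 0, \qquad \sum_{i=1}^{d+2} \lambda_i = 0
\]

has a nontrivial solution, since it is d + 1 linear equations in d + 2 unknowns. Take S = { x_i : λ_i > 0 } and Λ = Σ_{λ_i>0} λ_i > 0. Because the λ_i sum to zero,

\[
\sum_{\lambda_i>0} \frac{\lambda_i}{\Lambda}\, x_i \;=\; \sum_{\lambda_i\le 0} \frac{-\lambda_i}{\Lambda}\, x_i ,
\]

and the two sides are convex combinations of points of S and of T \ S respectively, so conv(S) ∩ conv(T \ S) ≠ ∅.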

  16. Hyper-plane: Finishing the proof • Assume a set T of d + 2 points can be shattered. • Use Radon Theorem to find S such that conv(S) ∩ conv(T \ S) ≠ ∅. • Assign points in S label 1 and points not in S label 0. • There is a separating hyper-plane. • How will it label a point of conv(S) ∩ conv(T \ S)?
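The answer to the closing question, spelled out (my completion): if c_{w,c} labels all of S with 1, then S ⊆ { x : ⟨w, x⟩ ≥ c }, which is a convex set, so it contains conv(S); likewise, labeling T \ S with 0 forces conv(T \ S) into the complementary half-space { x : ⟨w, x⟩ < c }. A point p ∈ conv(S) ∩ conv(T \ S) would then have to be labeled both 1 and 0, which is impossible, so no d + 2 points are shattered and VC-dim(C_8) ≤ d + 1.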

  17. Lower bounds: Setting • Static learning algorithm: • asks for a sample S of size m(ε, δ) • based on S, selects a hypothesis

  18. Lower bounds: Setting • Theorem: if VC-dim(C) = ∞ then C is not learnable. • Proof: • Let m = m(0.1, 0.1). • Find 2m points which are shattered (set T). • Let D be the uniform distribution on T. • Set c_t(x_i) = 1 with probability ½. • Expected error ¼. • Finish proof!
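One way to finish (my completion, using the 0.1 accuracy and confidence fixed on the slide): the sample contains at most m distinct points, so at least m of the 2m shattered points are never seen. On each unseen point the target label is an independent fair coin, so any hypothesis errs on it with probability ½, and each point carries weight 1/(2m) under D, giving

\[
\mathbb{E}[\mathrm{error}(h, c_t)] \;\ge\; \frac{m}{2m}\cdot\frac{1}{2} \;=\; \frac{1}{4}.
\]

A (0.1, 0.1)-learner, however, would satisfy E[error] ≤ 0.1 + 0.1·1 = 0.2 < ¼, a contradiction; so no finite m(0.1, 0.1) suffices.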

  19. Lower Bound: Feasible • Theorem: if VC-dim(C) = d + 1 then m(ε, δ) = Ω(d/ε). • Proof: • Let T = {z_0, z_1, …, z_d} be a set of d + 1 points which is shattered. • D samples: • z_0 with prob. 1 - 8ε • each z_i, i ≥ 1, with prob. 8ε/d

  20. Continued • Set c_t(z_0) = 1, and c_t(z_i) = 1 with probability ½. • Expected error 2ε. • Bound the confidence for accuracy ε.
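Where the 2ε comes from (my reconstruction; it assumes the sample is small, say m ≤ d/(16ε), so that in expectation at least d/2 of the points z_1, …, z_d never appear in it): each unseen z_i has a label that is still a fair coin, so any hypothesis errs on it with probability ½, and it carries weight 8ε/d under D, giving

\[
\mathbb{E}[\mathrm{error}] \;\ge\; \frac{d}{2}\cdot\frac{8\varepsilon}{d}\cdot\frac{1}{2} \;=\; 2\varepsilon .
\]

Bounding the probability of achieving error ≤ ε (the confidence step on the slide) then yields m(ε, δ) = Ω(d/ε).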

  21. Lower Bound: Non-Feasible • Theorem: for two hypotheses, m(ε, δ) = Ω((log 1/δ)/ε²). • Proof: • Let H = {h_0, h_1}, where h_b(x) = b. • Two distributions: • D_0: prob. of ⟨x, 1⟩ is ½ - γ and of ⟨y, 0⟩ is ½ + γ • D_1: prob. of ⟨x, 1⟩ is ½ + γ and of ⟨y, 0⟩ is ½ - γ
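A rough sketch of the argument these distributions set up (my summary of the standard coin-distinguishing bound; details may differ from the lecture): under D_0 the better hypothesis is h_0, with error ½ - γ, while under D_1 it is h_1; picking the wrong one costs an extra 2γ, so accuracy ε forces γ = Θ(ε). Deciding with confidence 1 - δ whether the labels are biased by +γ or -γ requires

\[
m \;=\; \Omega\!\left(\frac{\log(1/\delta)}{\gamma^{2}}\right)
\]

samples, which gives m(ε, δ) = Ω((log 1/δ)/ε²).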
