
VC theory, Support vectors and Hedged prediction technology



  1. VC theory, Support vectors and Hedged prediction technology

  2. Overfitting in classification • Assume a family C of classifiers of points in feature space F. A family of classifiers is a map from C × F to {0,1} (negative and positive class). • For each subset X of F and each c in C, c restricted to X defines a partitioning of X into two classes. • C shatters X if every partitioning of X is accomplished by some c in C. • If some point set X of size d is shattered by C, then the VC-dimension is at least d. • If no point set of d+1 elements can be shattered by C, then the VC-dimension is at most d.
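
  In symbols (a compact restatement of the definitions above; the notation is chosen here, not taken from the slides):

    % C shatters X  <=>  every labelling of X is realised by some classifier in C.
    \[
      C \text{ shatters } X = \{x_1,\dots,x_m\} \subseteq F
      \iff
      \{\, (c(x_1),\dots,c(x_m)) : c \in C \,\} = \{0,1\}^m ,
    \]
    \[
      \operatorname{VCdim}(C) = \max \{\, |X| : X \subseteq F,\ C \text{ shatters } X \,\}.
    \]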

  3. VC-dimension of hyperplanes • Hyperplanes on the line (single points, splitting the line into two half-lines) shatter any two points, but no three. • The set of lines in the plane shatters any three non-collinear points, but no four points. • Any d+2 points in E^d can be partitioned into two blocks whose convex hulls intersect (Radon's theorem), so no such set can be shattered by hyperplanes. • The VC-dimension of hyperplanes in E^d is thus d+1.
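
  As an illustrative sketch (not part of the slides), the planar case can be checked by brute force: try to realise every labelling of a point set with a line. The point sets and the use of a large-C linear SVM as a separability test are choices made here, not taken from the slides.

    # Illustrative sketch: lines in the plane shatter 3 non-collinear points but
    # fail on the 4-point XOR configuration. Separability of each labelling is
    # tested with a large-C linear SVM, which for these tiny sets is a reliable
    # proxy for exact linear separability.
    from itertools import product

    import numpy as np
    from sklearn.svm import SVC


    def shattered(points):
        """Return True if every +/-1 labelling of `points` is realised by some line."""
        X = np.asarray(points, dtype=float)
        for labels in product([-1, 1], repeat=len(X)):
            y = np.array(labels)
            if len(set(labels)) == 1:        # one-class labellings are trivially realisable
                continue
            clf = SVC(kernel="linear", C=1e9).fit(X, y)
            if (clf.predict(X) != y).any():  # some labelling no line can produce
                return False
        return True


    three_points = [(0, 0), (1, 0), (0, 1)]         # non-collinear
    four_points = [(0, 0), (1, 1), (1, 0), (0, 1)]  # XOR configuration

    print(shattered(three_points))  # True  -> VC-dimension of lines is at least 3
    print(shattered(four_points))   # False -> consistent with VC-dimension 3 in the plane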

  4. Why VC-dimension? • Elegant and pedagogical, but not very useful in practice. • Bounds the future error of a classifier (PAC learning). • Assume an exchangeable distribution of the examples (xi, yi). • For the first N points, the training error of c is the observed error rate of c on those points. • The goodness of selecting from C the classifier with best performance on the training set depends on the VC-dimension h:
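
  A standard form of this bound (presumably what the slide displays): with probability at least 1 − η, simultaneously for all c in C,

    % Standard Vapnik VC bound; R(c) is the expected future error, R_emp(c) the
    % training error on the N examples, h the VC-dimension of C.
    \[
      R(c) \;\le\; R_{\mathrm{emp}}(c)
      + \sqrt{ \frac{ h\left( \ln\frac{2N}{h} + 1 \right) - \ln\frac{\eta}{4} }{ N } } .
    \]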

  5. Why VC-dimension?

  6. Classify with hyperplanes Frank Rosenblatt (1928 – 1971). Pioneering work in classifying by hyperplanes in high-dimensional spaces. Criticized by Minsky and Papert, since real classes are not normally linearly separable. ANN research was taken up again in the 1980s, with non-linear mappings to get improved separation. Predecessor to SVM/kernel methods.

  7. Find parallel hyperplanes • Separate examples by wide-margin hyperplanes (classification). • Enclose examples between hyperplanes (regression). • If necessary, non-linearly map examples to a high-dimensional space where they are better separated.
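
  Formally, "wide margin" corresponds to the standard hard-margin formulation (a sketch, assumed rather than quoted from the slides): for examples (xi, yi) with yi in {−1, +1},

    % Standard hard-margin SVM primal; the margin between the two parallel planes is 2/||w||.
    \[
      \min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
      \quad \text{subject to} \quad
      y_i \,(w \cdot x_i + b) \ge 1 , \qquad i = 1,\dots,N .
    \]
    % The two "blue" planes of the next slides are w.x + b = +1 and w.x + b = -1.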

  8. Find parallel hyperplanes Classification. Red: true separating plane. Blue: wide-margin separation in the sample. Classify by the plane between the blue planes.

  9. Find parallel hyperplanes Regression. Red: true central plane. Blue: narrowest margin enclosing the sample. For a new xk, predict yk so that (xk, yk) lies on the mid-plane (dotted).

  10. From vector to scalar product
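
  The step the title alludes to is the standard passage to the dual problem, in which the training vectors enter only through scalar products (a sketch, not the slide's own derivation):

    % Dual of the hard-margin problem above: only scalar products x_i . x_j appear.
    \[
      \max_{\alpha}\ \sum_{i=1}^{N} \alpha_i
      - \tfrac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j \, y_i y_j \,(x_i \cdot x_j)
      \quad \text{subject to} \quad
      \alpha_i \ge 0 , \quad \sum_{i=1}^{N} \alpha_i y_i = 0 ,
    \]
    \[
      w = \sum_{i=1}^{N} \alpha_i y_i x_i ,
      \qquad
      \text{classify } x \text{ by } \operatorname{sign}\Bigl( \sum_i \alpha_i y_i \,(x_i \cdot x) + b \Bigr).
    \]
    % New points also enter only through scalar products, so these products can later be
    % replaced by kernel evaluations.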

  11. Soft Margins

  12. Soft Margins Quadratic programming goes through also with soft margins. Specification of the softness constant C is part of most packages. However, no prior rule for setting C is established, and experimentation is necessary for each application. The choice is between narrowing the margin, allowing more outliers, and using a more liberal kernel (to be described).
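
  A standard form of the soft-margin problem referred to here (assumed; slack variables ξi and the softness constant C are the usual notation):

    % Standard soft-margin primal; C trades margin width against tolerated outliers.
    \[
      \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
      \quad \text{subject to} \quad
      y_i \,(w \cdot x_i + b) \ge 1 - \xi_i , \qquad \xi_i \ge 0 .
    \]
    % In the dual, the only change is the box constraint 0 <= alpha_i <= C.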

  13. SVM packages • Inputs: xi, yi, plus KERNEL and SOFTNESS information. • The only output is the vector of dual coefficients α; the non-zero coefficients indicate the support vectors. • The hyperplane is obtained from the αi and the support vectors, as sketched below.
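
  As a concrete illustration, a minimal sketch with scikit-learn (one SVM package among many; the slides do not name a specific one, and the toy data below is invented for the example):

    # Minimal sketch with scikit-learn. Inputs: examples X, labels y, kernel and softness C.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0], [4.0, 2.5], [5.0, 4.0]])
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel="linear", C=1.0)   # kernel and softness information
    clf.fit(X, y)                       # inputs x_i, y_i

    # The essential output: dual coefficients; the non-zero ones mark the support vectors.
    print("support vector indices:", clf.support_)
    print("dual coefficients (alpha_i * y_i):", clf.dual_coef_)

    # For a linear kernel the hyperplane w.x + b = 0 can be reassembled as
    # w = sum_i alpha_i y_i x_i over the support vectors:
    w = clf.dual_coef_ @ clf.support_vectors_
    print("w =", w.ravel(), " b =", clf.intercept_)
    print("matches clf.coef_:", np.allclose(w, clf.coef_))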

  14. Kernel Trick

  15. Kernel Trick
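
  The content of these two image slides, restated in the standard way (not the slides' own wording): every scalar product in the dual is replaced by a kernel evaluation,

    \[
      x_i \cdot x_j \;\longrightarrow\; K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) ,
      \qquad
      \text{classify } x \text{ by } \operatorname{sign}\Bigl( \sum_i \alpha_i y_i \, K(x_i, x) + b \Bigr),
    \]
    % so the feature map phi into a (possibly very high-dimensional) space never has to be
    % computed explicitly; only kernel values are needed.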

  16. Kernel Trick Example: 2D space (x1, x2). Map to the 5D space of scaled monomials (c1*x1, c2*x2, c3*x1^2, c4*x1*x2, c5*x2^2), plus a constant coordinate. K(x,y) = (x·y+1)^2 = 2*x1*y1 + 2*x2*y2 + x1^2*y1^2 + x2^2*y2^2 + 2*x1*x2*y1*y2 + 1 = φ(x)·φ(y), where φ(x) = φ((x1,x2)) = (√2*x1, √2*x2, x1^2, √2*x1*x2, x2^2, 1). Hyperplanes in this feature space are mapped back to conic sections in R^2!
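
  A quick numerical check of this identity (an illustrative sketch; phi below is the map written out above):

    # Numerical check (illustrative) that the degree-2 polynomial kernel equals a
    # scalar product in the explicit monomial feature space described above.
    import numpy as np


    def phi(v):
        """Explicit feature map for K(x, y) = (x.y + 1)^2 in 2D, as written above."""
        x1, x2 = v
        s = np.sqrt(2.0)
        return np.array([s * x1, s * x2, x1**2, s * x1 * x2, x2**2, 1.0])


    rng = np.random.default_rng(0)
    x, y = rng.normal(size=2), rng.normal(size=2)

    kernel_value = (x @ y + 1.0) ** 2      # kernel computed in the 2D input space
    explicit_value = phi(x) @ phi(y)       # scalar product in the feature space

    print(kernel_value, explicit_value)    # the two numbers agree
    assert np.isclose(kernel_value, explicit_value)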

  17. Kernel Trick Gaussian kernel: K(x,y) = exp(-||x-y||^2 / (2σ^2)).
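
  An illustrative sketch of this kernel in code (the parameterisation gamma = 1/(2σ^2) is the common software convention and is an assumption here, as is the use of scikit-learn):

    # Illustrative sketch of the Gaussian (RBF) kernel; gamma = 1 / (2 * sigma^2) is the
    # usual software parameterisation (an assumption, not stated on the slide).
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    sigma = 1.5
    gamma = 1.0 / (2.0 * sigma**2)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5, 2))

    # Kernel matrix computed directly from the formula above ...
    manual = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2.0 * sigma**2))
    # ... and via the library routine; the two agree.
    library = rbf_kernel(X, X, gamma=gamma)

    print(np.allclose(manual, library))    # True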
