
Support Vector Machines


Presentation Transcript


  1. Support Vector Machines Speaker: 虞台文

  2. Content • Introduction • The VC Dimension & Structural Risk Minimization • Linear SVM – The Separable Case • Linear SVM – The Non-Separable Case • Lagrange Multipliers

  3. Support Vector Machines Introduction

  4. Learning Machines • A machine to learn the mapping x_i ↦ y_i • Defined as a family of functions x ↦ f(x, α) • Learning is done by adjusting the parameter α.

  5. Generalization vs. Learning • How does a machine learn? • By adjusting its parameters so as to partition the pattern (feature) space for classification. • How to adjust? Minimize the empirical risk (the traditional approach). • What has the machine learned? • Has it memorized the patterns it sees, or memorized the rules it finds for the different classes? • What does the machine actually learn if it minimizes the empirical risk only?

  6. Risks Expected Risk (test error) Empirical Risk (training error)
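
  In the standard formulation (Vapnik; Burges's tutorial, which these slides appear to follow), the two risks are defined as
  \[
  R(\alpha) = \int \tfrac{1}{2}\,\lvert y - f(x,\alpha)\rvert \, dP(x,y),
  \qquad
  R_{\mathrm{emp}}(\alpha) = \frac{1}{2l}\sum_{i=1}^{l} \lvert y_i - f(x_i,\alpha)\rvert ,
  \]
  where P(x, y) is the unknown distribution of the data and l is the number of training samples.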

  7. More on Empirical Risk • How can we make the empirical risk arbitrarily small? • Give the machine a very large memorization capacity. • Does a machine with small empirical risk also have small expected risk? • How do we keep the machine from merely memorizing the training patterns instead of generalizing? • How do we deal with the memorization capacity of a machine? • What should the new criterion be?

  8. Structural Risk Minimization • Goal: learn both the right 'structure' and the right 'rules' for classification. • Right structure: e.g., the right number and the right forms of components or parameters participate in the learning machine. • Right rules: the empirical risk will also be reduced if the right rules are learned.

  9. New Criterion • Total Risk = Empirical Risk + Risk due to the structure of the learning machine

  10. Support Vector Machines The VC Dimension & Structural Risk Minimization

  11. The VC Dimension (VC: Vapnik–Chervonenkis) • Consider a set of functions f(x, α) ∈ {−1, +1}. • A given set of l points can be labeled in 2^l ways. • If for every such labeling a member of the set {f(α)} can be found which assigns the labels correctly, then the set of points is shattered by that set of functions. • The VC dimension of {f(α)} is the maximum number of training points that can be shattered by {f(α)}.

  12. The VC Dimension for Oriented Lines in R² • VC dimension = 3: three non-collinear points can be shattered by oriented lines, and no arrangement of four points can be (see the check below).
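
  A quick check (not from the slides; scikit-learn and the specific point coordinates are assumptions) that three non-collinear points in R² are shattered by oriented lines, by trying every one of the 2³ = 8 labelings with a (nearly) hard-margin linear SVM:

      from itertools import product
      import numpy as np
      from sklearn.svm import SVC

      X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # three non-collinear points

      for labels in product([-1, 1], repeat=3):
          y = np.array(labels)
          if len(set(labels)) == 1:
              # A one-class labeling is trivially realized by a line placed past all points.
              print(labels, "separable (trivially)")
              continue
          clf = SVC(kernel="linear", C=1e9).fit(X, y)       # large C approximates a hard margin
          print(labels, "separable" if clf.score(X, y) == 1.0 else "NOT separable")

  Every labeling is realized, so the VC dimension is at least 3; since the XOR-style labeling of four points cannot be realized by any oriented line, it is exactly 3.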

  13. More on VC Dimension • In general, the VC dimension of a set of oriented hyperplanes in R^n is n + 1. • VC dimension is a measure of memorization capability. • VC dimension is not directly related to the number of parameters: Vapnik (1995) gives an example with one parameter and infinite VC dimension.

  14. Bound on Expected Risk • Expected Risk ≤ Empirical Risk + VC Confidence (see the bound below).
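
  With probability 1 − η, the standard Vapnik bound on the expected risk is
  \[
  R(\alpha) \le R_{\mathrm{emp}}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}} ,
  \]
  where h is the VC dimension and l the number of training samples; the square-root term is the VC confidence.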

  15. Bound on Expected Risk • Consider small η (e.g., η ≤ 0.5). • The second (square-root) term of the bound is the VC confidence.

  16. Bound on Expected Risk • Consider small η (e.g., η ≤ 0.5). • Structural risk minimization minimizes the whole bound; traditional approaches minimize the empirical risk only.

  17. VC Confidence • Amongst machines with zero empirical risk, choose the one with the smallest VC dimension. • How to evaluate the VC dimension? • For η = 0.05 and l = 10,000, the VC confidence can be evaluated as a function of h (see below).
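
  As a quick illustration (not from the slides; numpy is assumed), the VC confidence term of the bound above can be evaluated for the slide's setting η = 0.05 and l = 10,000 as a function of the VC dimension h:

      import numpy as np

      def vc_confidence(h, l=10_000, eta=0.05):
          # sqrt( (h * (ln(2l/h) + 1) - ln(eta/4)) / l )
          return np.sqrt((h * (np.log(2 * l / h) + 1) - np.log(eta / 4)) / l)

      for h in (10, 100, 1000, 5000):
          print(h, round(float(vc_confidence(h)), 3))

  The confidence term grows with h, which is why, among machines with zero empirical risk, the one with the smallest VC dimension is preferred.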

  18. Structural Risk Minimization • Nested subsets of functions with different VC dimensions (h1, h2, h3, h4).

  19. Support Vector Machines The Linear SVM – The Separable Case

  20. The Linear Separability • Two cases: linearly separable, and not linearly separable.

  21. The Linear Separability • Linearly separable: some hyperplane w·x + b = 0 separates the two classes, with w·x + b = +1 and w·x + b = −1 marking the margin on either side (w is the normal vector).

  22. Margin Width • The margin of width d lies between the hyperplanes w·x + b = +1 and w·x + b = −1, on either side of w·x + b = 0. • How about maximizing the margin? • What is the relation between the margin width and the VC dimension? (See the derivation below.)
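
  A short derivation of the margin width: for any point x₋ on w·x + b = −1, its distance to the hyperplane w·x + b = +1 is
  \[
  d = \frac{\lvert (w\cdot x_- + b) - 1 \rvert}{\lVert w \rVert} = \frac{2}{\lVert w \rVert} ,
  \]
  so maximizing the margin is equivalent to minimizing ‖w‖; in Vapnik's argument, a larger margin also gives a smaller bound on the effective VC dimension of the corresponding gap-tolerant classifiers.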

  23. Maximum Margin Classifier • The training points that lie on the margin hyperplanes are the 'supporters' (support vectors). • How about maximizing the margin? • What is the relation between the margin width and the VC dimension?

  24. Building SVM • Minimize (1/2)‖w‖² • Subject to y_i(w·x_i + b) ≥ 1 for all i • Solving this constrained problem requires knowledge of Lagrange multipliers.

  25. The Method of Lagrange • Minimize (1/2)‖w‖² subject to y_i(w·x_i + b) ≥ 1. • The Lagrangian (written out below) is minimized w.r.t. w and b, while maximized w.r.t. α.
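
  Written out, the standard hard-margin (primal) Lagrangian is
  \[
  L_P(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right],
  \qquad \alpha_i \ge 0 .
  \]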

  26. The Method of Lagrange • What value should α_i take if its constraint is feasible with nonzero slack? How about if the slack is zero? • By complementary slackness, α_i = 0 whenever y_i(w·x_i + b) − 1 > 0; only active constraints (zero slack) can have α_i > 0. • The Lagrangian is minimized w.r.t. w and b, while maximized w.r.t. α.

  27. The Method of Lagrange Minimize Subject to The Lagrangian:

  28. Duality Minimize Subject to Maximize Subject to

  29. Duality Maximize Subject to

  30. Maximize Duality

  31. Maximize Duality

  32. Duality • The Primal: minimize … subject to … • The Dual: maximize … subject to … (both written out below).
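
  For reference, the standard primal/dual pair for the separable case is
  \[
  \text{Primal:}\;\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2
  \ \ \text{s.t.}\ \ y_i (w \cdot x_i + b) \ge 1 ,
  \]
  \[
  \text{Dual:}\;\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
  \ \ \text{s.t.}\ \ \alpha_i \ge 0,\ \ \sum_i \alpha_i y_i = 0 .
  \]
  The dual depends on the data only through the inner products x_i·x_j.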

  33. The Solution • Find α* by quadratic programming: maximize the dual objective subject to its constraints (α_i ≥ 0, Σ_i α_i y_i = 0).
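
  A minimal numerical sketch (not from the slides; the toy data, variable names, and the use of scipy's SLSQP solver are assumptions) of finding α* by maximizing the dual:

      import numpy as np
      from scipy.optimize import minimize

      # Toy separable data, chosen only for illustration.
      X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
      y = np.array([1.0, 1.0, -1.0, -1.0])
      l = len(y)

      K = (y[:, None] * X) @ (y[:, None] * X).T        # K[i, j] = y_i y_j (x_i . x_j)

      def neg_dual(alpha):                             # minimize the negative dual objective
          return 0.5 * alpha @ K @ alpha - alpha.sum()

      cons = [{"type": "eq", "fun": lambda a: a @ y}]  # sum_i alpha_i y_i = 0
      bounds = [(0, None)] * l                         # alpha_i >= 0

      res = minimize(neg_dual, np.zeros(l), method="SLSQP",
                     bounds=bounds, constraints=cons)
      alpha = res.x
      print("alpha* =", np.round(alpha, 4))

  A dedicated convex QP solver would be the usual choice in practice; a general-purpose solver is used here only to keep the sketch short.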

  34. The Solution • Call x_i a support vector if α_i > 0. • Find w* and b* from α* via the Karush–Kuhn–Tucker conditions on the Lagrangian: w* = Σ_i α_i* y_i x_i, and b* = y_s − w*·x_s for any support vector x_s.

  35. The Karush-Kuhn-Tucker Conditions
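
  The standard KKT conditions for the separable-case primal are
  \[
  \frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i ,
  \qquad
  \frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0 ,
  \]
  \[
  y_i (w \cdot x_i + b) - 1 \ge 0 ,
  \qquad \alpha_i \ge 0 ,
  \qquad \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0 .
  \]
  The last (complementary slackness) condition is what makes only the support vectors carry nonzero α_i.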

  36. Classification

  37. Classification Using Supporters • f(x) = sgn( Σ_{i∈SV} α_i* y_i (x_i·x) + b* ) • α_i* y_i: the weight of the ith support vector. • x_i·x: the similarity measure between the input and the ith support vector. • b*: the bias.
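
  Continuing the numerical sketch from slide 33 (the variables alpha, X, y are assumed from there), w*, b* and the supporter-based classifier can be computed as:

      w = (alpha * y) @ X                    # w* = sum_i alpha_i y_i x_i
      sv = alpha > 1e-6                      # supporters: alpha_i > 0
      b = np.mean(y[sv] - X[sv] @ w)         # b* averaged over the support vectors

      def classify(x_new):
          # f(x) = sgn( sum_{i in SV} alpha_i y_i (x_i . x) + b )
          return np.sign(np.sum(alpha[sv] * y[sv] * (X[sv] @ x_new)) + b)

      print(classify(np.array([2.0, 2.5])), classify(np.array([0.5, 0.0])))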

  38. Demonstration

  39. Support Vector Machines The Linear SVM – The Non-Separable Case

  40. The Non-Separable Case • Slack variables ξ_i ≥ 0 measure how far a point falls on the wrong side of its margin hyperplane (w·x + b = +1 or w·x + b = −1, with w·x + b = 0 the separating hyperplane). • We require that y_i(w·x_i + b) ≥ 1 − ξ_i.

  41. Mathematical Formulation • Minimize (1/2)‖w‖² + C(Σ_i ξ_i)^k • Subject to y_i(w·x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0 • For simplicity, we consider k = 1.

  42. The Lagrangian • Minimize (1/2)‖w‖² + C Σ_i ξ_i subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0; the corresponding Lagrangian is written out below.
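
  Written out, the standard soft-margin Lagrangian (with multipliers α_i for the margin constraints and μ_i for ξ_i ≥ 0) is
  \[
  L_P = \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
  - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right]
  - \sum_i \mu_i \xi_i .
  \]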

  43. Duality Minimize Subject to Maximize Subject to

  44. Duality Maximize Subject to

  45. Maximize this Duality

  46. Maximize this Duality

  47. Duality • The Primal: minimize … subject to … • The Dual: maximize … subject to … (both written out below).
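
  For reference, the standard primal/dual pair for the non-separable case is
  \[
  \text{Primal:}\;\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
  \ \ \text{s.t.}\ \ y_i (w \cdot x_i + b) \ge 1 - \xi_i ,\ \ \xi_i \ge 0 ,
  \]
  \[
  \text{Dual:}\;\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
  \ \ \text{s.t.}\ \ 0 \le \alpha_i \le C ,\ \ \sum_i \alpha_i y_i = 0 .
  \]
  The only change from the separable case is the box constraint α_i ≤ C; the slack variables ξ_i and their multipliers μ_i drop out of the dual.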

  48. The Karush-Kuhn-Tucker Conditions
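
  The standard KKT conditions for the soft-margin primal add the slack terms:
  \[
  w = \sum_i \alpha_i y_i x_i , \qquad \sum_i \alpha_i y_i = 0 , \qquad C - \alpha_i - \mu_i = 0 ,
  \]
  \[
  \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] = 0 , \qquad \mu_i \xi_i = 0 ,
  \qquad \alpha_i ,\ \mu_i ,\ \xi_i \ge 0 .
  \]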

  49. The Solution • Find α* by quadratic programming: maximize the dual objective subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.

  50. The Solution • Call x_i a support vector if 0 < α_i < C. • Find w* and b* from α* as before, using the Lagrangian and the KKT conditions.
