
Machine Learning Seminar: Support Vector Regression


Presentation Transcript


  1. Machine Learning Seminar: Support Vector Regression Presented by: Heng Ji 10/08/03

  2. Outline • Regression Background • Linear ε-Insensitive Loss Algorithm • Primal Formulation • Dual Formulation • Kernel Formulation • Quadratic ε-Insensitive Loss Algorithm • Kernel Ridge Regression & Gaussian Processes

  3. Regression = find a function that fits the observations. Observations are (x, y) pairs: (1949,100) (1950,117) ... (1996,1462) (1997,1469) (1998,1467) (1999,1474)

  4. Linear fit... Not so good...

  5. Better linear fit... Take logarithm of y and fit a straight line

  6. Transform back to original So so...
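
A minimal sketch of the fits in slides 4-6, assuming a synthetic exponential series in place of the yearly data on slide 3 (only a few of those points are listed in the transcript): fit a line to y directly, then fit a line to log(y) and transform back.

```python
import numpy as np

# Synthetic (x, y) observations with roughly exponential growth,
# standing in for the yearly series on slide 3.
rng = np.random.default_rng(0)
x = np.arange(1949, 2000)
y = 100.0 * np.exp(0.06 * (x - 1949)) * (1 + 0.05 * rng.standard_normal(x.size))

# Slide 4: a direct linear fit y ~ a + b*x
a1, b1 = np.polyfit(x, y, 1)[::-1]        # polyfit returns [slope, intercept]
y_lin = a1 + b1 * x

# Slides 5-6: fit a straight line to log(y), then transform back
c, d = np.polyfit(x, np.log(y), 1)[::-1]
y_exp = np.exp(c + d * x)                 # back-transform: y = exp(c) * exp(d*x)

print("direct linear fit RMSE:", np.sqrt(np.mean((y - y_lin) ** 2)))
print("log-linear fit RMSE   :", np.sqrt(np.mean((y - y_exp) ** 2)))
```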

  7. So what is regression about? Construct a model of a process, using examples of the process. Input: x (possibly a vector). Output: f(x) (generated by the process). Examples: pairs of input and output {(x, y)}. Our model: the function f(x) is our estimate of the true function g(x).

  8. Assumption about the process The "fixed regressor model": y(n) = g[x(n)] + e(n), where x(n) is the observed input, y(n) the observed output, g[x(n)] the true underlying function, and e(n) an i.i.d. noise process with zero mean. Data set: D = {(x(n), y(n)), n = 1, ..., N}
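
A tiny sketch of generating data under the fixed regressor model; the particular g used here is an arbitrary illustrative choice, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # an arbitrary "true underlying function", purely for illustration
    return np.sin(2 * x)

x = rng.uniform(0, 3, 20)           # observed inputs x(n)
e = rng.normal(0.0, 0.3, x.size)    # i.i.d. noise process e(n) with zero mean
y = g(x) + e                        # observed outputs y(n) = g[x(n)] + e(n)

data_set = list(zip(x, y))          # D = {(x(n), y(n)) : n = 1, ..., N}
```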

  9. Example: 0 ≤ e ≤ 2

  10. Model Sets (examples) True function: g(x) = 0.5 + x + x² + 6x³. Model families: F1 = {a + bx} (linear); F2 = {a + bx + cx²} (quadratic); F3 = {a + bx + cx² + dx³} (cubic).
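
A sketch of fitting the three families to noisy samples of this g(x); numpy's polyfit does the least-squares fit within each family (sample sizes and noise level are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    return 0.5 + x + x**2 + 6 * x**3           # true function from slide 10

x = rng.uniform(-1, 1, 30)
y = g(x) + rng.normal(0.0, 0.5, x.size)        # noisy observations of g

# F1 (linear), F2 (quadratic), F3 (cubic): least-squares fit in each model family
for degree, name in [(1, "F1 linear"), (2, "F2 quadratic"), (3, "F3 cubic")]:
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"{name}: training MSE = {mse:.3f}")
```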

  11. Idealized regression Find an appropriate model family F and find f(x) ∈ F with minimum "distance" to g(x) (the "error"). [Figure: the true function g(x), the Model Set F (our hypothesis set), and the best approximation f_opt(x) ∈ F, separated by the error.]

  12. How to measure "distance"? • Q: What is the distance (difference) between functions f and g?

  13. Margin Slack Variable For example (x_i, y_i) and function f, with θ the target accuracy in testing and γ the difference between the target accuracy and the margin in training, the margin slack variable is ξ_i = max(0, |y_i − f(x_i)| − (θ − γ)).

  14. ε-Insensitive Loss Function • Let ε = θ − γ, so the margin slack variable becomes ξ = max(0, |y − f(x)| − ε) • Linear ε-insensitive loss: L_ε(y, f(x)) = max(0, |y − f(x)| − ε) • Quadratic ε-insensitive loss: L_ε²(y, f(x)) = max(0, |y − f(x)| − ε)²
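
Both losses as a short sketch (the helper names linear_eps_loss and quadratic_eps_loss are just for illustration):

```python
import numpy as np

def linear_eps_loss(y, f_x, eps):
    """Linear eps-insensitive loss: max(0, |y - f(x)| - eps)."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)

def quadratic_eps_loss(y, f_x, eps):
    """Quadratic eps-insensitive loss: max(0, |y - f(x)| - eps) ** 2."""
    return np.maximum(0.0, np.abs(y - f_x) - eps) ** 2

# Residuals inside the eps-band cost nothing; outside it, the cost grows
# linearly (first loss) or quadratically (second loss).
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 4.0])
print(linear_eps_loss(y_true, y_pred, eps=0.1))     # 0.0, 0.4, 0.9
print(quadratic_eps_loss(y_true, y_pred, eps=0.1))  # 0.0, 0.16, 0.81
```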

  15. Linear ε-Insensitive Loss for a Linear SV Machine [Figure: the loss plotted against y_i − ⟨w, x_i⟩; it is zero inside the ε-tube and grows linearly with the slack ξ outside it.]

  16. Basic Idea of SV Regression • Starting point We have input data X = {(x1, y1), …, (xN, yN)} • Goal We want to find a robust function f(x) that has at most ε deviation from the targets y, while at the same time being as flat as possible. • Idea Simple Regression Problem + Optimization + Kernel Trick

  17. Thus setting f(x) = ⟨w, x⟩ + b: • Primal Regression Problem minimize ½‖w‖² subject to |y_i − ⟨w, x_i⟩ − b| ≤ ε for all i

  18. Linear ε-Insensitive Loss Regression min ½‖w‖² + C Σ_i (ξ_i + ξ_i*) subject to y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i, ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0 • ε decides the width of the insensitive zone • C sets a trade-off between the error and ‖w‖ • ε and C must be tuned simultaneously Regression is more difficult than classification?

  19. Parameters used in SV Regression
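
As an illustration of the ε / C trade-off from slides 18-19, a minimal sketch with scikit-learn's SVR (the data and parameter values are arbitrary, not the ones used in the talk):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

# C: trade-off between flatness (small ||w||) and training error;
# epsilon: width of the insensitive zone. Both must be tuned together.
for C, eps in [(1.0, 0.1), (1.0, 0.5), (100.0, 0.1)]:
    model = SVR(kernel="rbf", C=C, epsilon=eps).fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(f"C={C:6.1f} eps={eps:.1f} -> support vectors: {model.support_.size}, train MSE: {mse:.4f}")
```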

  20. Dual Formulation • The Lagrangian function helps us formulate the dual problem • ε: insensitive-loss parameter; β_i, β_i*: Lagrange multipliers; ξ_i: slack for points above the ε-band; ξ_i*: slack for points below the ε-band • Optimality conditions: ∂L/∂w = 0 ⇒ w = Σ_i (β_i − β_i*) x_i; ∂L/∂b = 0 ⇒ Σ_i (β_i − β_i*) = 0; ∂L/∂ξ_i = 0 and ∂L/∂ξ_i* = 0 ⇒ 0 ≤ β_i, β_i* ≤ C

  21. Dual Formulation (Cont'd) • Dual Problem: maximize −½ Σ_{i,j} (β_i − β_i*)(β_j − β_j*) ⟨x_i, x_j⟩ − ε Σ_i (β_i + β_i*) + Σ_i y_i (β_i − β_i*) subject to Σ_i (β_i − β_i*) = 0 and 0 ≤ β_i, β_i* ≤ C • Solving: w = Σ_i (β_i − β_i*) x_i, so f(x) = Σ_i (β_i − β_i*) ⟨x_i, x⟩ + b

  22. KKT Optimality Conditions and b • KKT complementarity conditions: β_i (ε + ξ_i − y_i + ⟨w, x_i⟩ + b) = 0, β_i* (ε + ξ_i* + y_i − ⟨w, x_i⟩ − b) = 0, (C − β_i) ξ_i = 0, (C − β_i*) ξ_i* = 0 • b can be computed from any point with 0 < β_i < C (where ξ_i = 0): b = y_i − ⟨w, x_i⟩ − ε (and b = y_i − ⟨w, x_i⟩ + ε for 0 < β_i* < C) This means that the Lagrange multipliers will only be non-zero for points on or outside the ε-band. Thus these points are the support vectors.
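
A sketch that checks this KKT picture on a fitted scikit-learn SVR: points strictly inside the ε-tube should not appear among the support vectors (small deviations on the order of the solver tolerance are possible).

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 60).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

eps = 0.2
model = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)

residual = np.abs(y - model.predict(X))
is_sv = np.zeros(len(y), dtype=bool)
is_sv[model.support_] = True

# By KKT, multipliers are non-zero only for points whose residual reaches the
# eps-band, so non-support-vectors should lie strictly inside the tube.
print("epsilon                        :", eps)
print("largest residual among non-SVs :", residual[~is_sv].max())
print("smallest residual among SVs    :", residual[is_sv].min())
```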

  23. The Idea of SVM • Map the data from the input space to a feature space via Φ: input space → feature space • [Figure: a non-linear problem in input space becomes a linear one in feature space]

  24. Kernel Version • Why can we use a kernel? The complexity of a function's representation depends only on the number of SVs, and the complete algorithm can be described in terms of inner products → an implicit mapping to the feature space • Mapping via kernel: ⟨Φ(x), Φ(x′)⟩ = K(x, x′), so f(x) = Σ_i (β_i − β_i*) K(x_i, x) + b
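
A sketch of this kernel expansion, reproducing a fitted scikit-learn SVR's predictions from its support vectors, dual coefficients, and an explicit RBF kernel (data and hyperparameters are arbitrary):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, (50, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

gamma = 0.5
model = SVR(kernel="rbf", gamma=gamma, C=10.0, epsilon=0.1).fit(X, y)

# Kernel expansion: f(x) = sum_i (beta_i - beta_i*) K(x_i, x) + b,
# where the sum runs only over the support vectors.
X_new = np.array([[1.0], [2.5], [4.0]])
K = rbf_kernel(model.support_vectors_, X_new, gamma=gamma)     # shape (n_SV, n_new)
f_manual = (model.dual_coef_ @ K + model.intercept_).ravel()

print(f_manual)
print(model.predict(X_new))   # matches the explicit kernel expansion
```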

  25. Quadratic ε-Insensitive Loss Regression Problem: min ½‖w‖² + C Σ_i (ξ_i² + ξ_i*²) subject to y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i and ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i* Kernel Formulation: the dual keeps the same form as before, with a ridge term proportional to 1/C added to the diagonal of the kernel matrix and no upper bound on β_i, β_i*

  26. Kernel Ridge Regression & Gaussian Processes • ε = 0 → least-squares (ridge) linear regression; the weight-decay factor is controlled by C • min λ‖w‖² + Σ_i ξ_i² (λ ~ 1/C) subject to y_i − ⟨w, x_i⟩ = ξ_i • Kernel Formulation: α = (K + λI)⁻¹ y (I: identity matrix), f(x) = Σ_i α_i K(x_i, x); this prediction is also the mean of a Gaussian-process posterior
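
A sketch of this closed-form dual solution, assuming an RBF kernel and an arbitrary λ:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, (40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

gamma, lam = 0.5, 0.1    # kernel width and ridge (weight-decay) parameter, lam ~ 1/C

# Dual solution of kernel ridge regression: alpha = (K + lam * I)^(-1) y
K = rbf_kernel(X, X, gamma=gamma)
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

# Prediction f(x) = sum_i alpha_i K(x_i, x); it equals the posterior mean of a
# Gaussian process with the same kernel and noise variance lam.
X_new = np.array([[1.0], [3.0]])
print(rbf_kernel(X_new, X, gamma=gamma) @ alpha)
```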

  27. Architecture of SV Regression Machine [Figure: the input x is fed through kernel units K(x_i, x) for the support vectors, whose outputs are weighted by (β_i − β_i*) and summed together with the bias b] similar to regression in a three-layered neural network!?

  28. Conclusion • SVM is a useful alternative to neural networks • Two key concepts of SVM • optimization • kernel trick • Advantages of SV Regression • Represents the solution by a small subset of training points • Ensures the existence of a global minimum • Ensures the optimization of a reliable generalization bound

  29. Discussion 1: Influence of the insensitivity band on regression quality • 17 measured training data points are used • Left: ε = 0.1 → 15 SVs are chosen • Right: ε = 0.5 → the 6 chosen SVs produced a much better regression function
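
The effect can be reproduced in a few lines with scikit-learn's SVR (synthetic data here, not the 17 measured points from the slide): the wider ε-band retains fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 17)).reshape(-1, 1)   # 17 training points, as in the slide
y = np.sin(2 * X).ravel() + rng.normal(0, 0.2, X.shape[0])

for eps in (0.1, 0.5):
    model = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon = {eps}: {model.support_.size} support vectors")
```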

  30. Discussion 2: ε-Insensitive Loss • Enables sparseness in the SVs, but does it guarantee sparseness? • Robust (robust to small changes in the data / model) • Less sensitive to outliers
