
Sparse Kernel Methods

Presentation Transcript


  1. Sparse Kernel Methods Steve Gunn

  2. Overview • Part I : Introduction to Kernel Methods • Part II : Sparse Kernel Methods

  3. Part I • Introduction to • Kernel Methods

  4. Classification • Consider a two-class problem

  5. Optimal Separating Hyperplane

  6. Optimal Separating Hyperplane Separate the data with a hyperplane such that the data is separated without error and the distance between the hyperplane and the closest vector is maximal.

  7. Solution The optimal hyperplane minimises the squared norm of the weight vector, subject to the separation constraints, and is obtained by finding the saddle point of the Lagrange functional (written out below).
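
A reconstruction of the formulas this slide refers to, in standard notation (with ℓ training pairs (x_i, y_i), y_i ∈ {−1, +1}; not copied from the slide):

```latex
\min_{w,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\qquad \text{s.t.} \qquad
y_i \left( \langle w, x_i \rangle + b \right) \ge 1, \quad i = 1, \dots, \ell .
```

The corresponding Lagrange functional, minimised over (w, b) and maximised over the multipliers α_i ≥ 0, is

```latex
L(w, b, \alpha) \;=\; \tfrac{1}{2}\lVert w \rVert^{2}
\;-\; \sum_{i=1}^{\ell} \alpha_i \left[ y_i \left( \langle w, x_i \rangle + b \right) - 1 \right] .
```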

  8. Finding the OSH Quadratic Programming Problem • Size is dependent upon training set size • Unique global minimum
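
The QP in question is, in the same reconstructed notation, the Wolfe dual of the problem above:

```latex
\max_{\alpha}\ \sum_{i=1}^{\ell} \alpha_i
\;-\; \tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell}
\alpha_i \alpha_j\, y_i y_j \langle x_i, x_j \rangle
\qquad \text{s.t.} \qquad
\alpha_i \ge 0, \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0 ,
```

with one variable per training point (hence the size dependence) and the weight vector recovered as w = Σ_i α_i y_i x_i.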

  9. Support Vectors • Information contained in support vectors • Can throw away rest of training data • SVs have non-zero Lagrange multipliers

  10. Generalised Separating Hyperplane

  11. Non-Separable Case • Introduce slack variables ξ_i • Minimise the penalised objective written out below • C is chosen a priori and determines the trade-off between maximising the margin and tolerating errors in the non-separable case.
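
Written out (a standard reconstruction), the penalised problem is

```latex
\min_{w, b, \xi}\ \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{\ell} \xi_i
\qquad \text{s.t.} \qquad
y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
```

In the dual this only adds the box constraint 0 ≤ α_i ≤ C to the QP of slide 8.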

  12. Finding the GSH Quadratic Programming Problem • Size is dependent upon training set size • Unique global minimum

  13. Non-Linear SVM • Map the input space to a high-dimensional feature space • Find the OSH or GSH in the feature space
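
Because the dual QP and the decision rule touch the data only through inner products, the feature map Φ never has to be evaluated explicitly; substituting K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩ gives the standard kernelised classifier (a reconstruction):

```latex
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{\ell} \alpha_i\, y_i\, K(x_i, x) + b \right) .
```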

  14. Kernel Functions • Hilbert–Schmidt theory: the kernel K(x, x') is a symmetric function • It must satisfy Mercer's conditions (written out below)
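
In their usual form (a standard reconstruction, not copied from the slide), Mercer's conditions require

```latex
\iint K(x, x')\, g(x)\, g(x')\, dx\, dx' \;\ge\; 0
\qquad \text{for all } g \text{ with } \int g(x)^{2}\, dx < \infty ,
```

which guarantees an expansion K(x, x') = Σ_m λ_m φ_m(x) φ_m(x') with λ_m ≥ 0, i.e. K behaves as an inner product in some feature space.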

  15. Polynomial Degree 2
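
A worked example of the kind this slide illustrates (assuming two inputs and the homogeneous degree-2 kernel):

```latex
K(x, x') = \langle x, x' \rangle^{2}
= \left( x_1 x_1' + x_2 x_2' \right)^{2}
= \left\langle \Phi(x), \Phi(x') \right\rangle,
\qquad
\Phi(x) = \left( x_1^{2},\ \sqrt{2}\, x_1 x_2,\ x_2^{2} \right),
```

so a separating hyperplane in the three-dimensional feature space is a quadratic decision boundary in the two-dimensional input space.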

  16. Acceptable Kernel Functions • Polynomial • Radial Basis Functions • Multi-Layer Perceptrons (a sketch of all three follows)
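
The slide's formulas were lost in extraction; below is a minimal NumPy sketch of the three standard families (the parameter choices d, sigma, kappa and theta are illustrative assumptions, not the slide's values):

```python
import numpy as np

def polynomial_kernel(X, Z, d=2):
    """K(x, z) = (<x, z> + 1)^d  -- polynomial of degree d."""
    return (X @ Z.T + 1.0) ** d

def rbf_kernel(X, Z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))  -- Gaussian RBF."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-sq / (2.0 * sigma**2))

def mlp_kernel(X, Z, kappa=1.0, theta=-1.0):
    """K(x, z) = tanh(kappa <x, z> + theta)  -- satisfies Mercer's
    conditions only for certain (kappa, theta) settings."""
    return np.tanh(kappa * (X @ Z.T) + theta)

X = np.random.default_rng(0).normal(size=(5, 3))
for kern in (polynomial_kernel, rbf_kernel, mlp_kernel):
    K = kern(X, X)
    # A Gram matrix of an acceptable (Mercer) kernel must be
    # symmetric positive semi-definite.
    eig_min = np.linalg.eigvalsh((K + K.T) / 2.0).min()
    print(f"{kern.__name__}: smallest eigenvalue = {eig_min:.3g}")
```

The eigenvalue check illustrates the acceptability test: the polynomial and RBF Gram matrices are always positive semi-definite, while the tanh kernel can fail for some parameter settings.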

  17. Iris Data Set

  18. Generalisation [Figure: generalisation error decomposed into estimation error and approximation error, traded off against model size.]

  19. Regression Approximate the data with a hyperplane, using a loss function (e.g. the ε-insensitive loss of slide 36) and the SRM principle.

  20. Solution Introduce slack variables and minimise the regularised objective, subject to the constraints (written out below).
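
With the ε-insensitive loss, the minimisation this slide refers to is usually written as follows (a reconstruction):

```latex
\min_{w, b, \xi, \xi^{*}}\ \tfrac{1}{2}\lVert w \rVert^{2}
+ C \sum_{i=1}^{\ell} \left( \xi_i + \xi_i^{*} \right)
\qquad \text{s.t.} \qquad
\begin{cases}
y_i - \langle w, x_i \rangle - b \ \le\ \varepsilon + \xi_i \\
\langle w, x_i \rangle + b - y_i \ \le\ \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ \ge\ 0 .
\end{cases}
```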

  21. Finding the Solution Quadratic Programming Problem • Size is dependent upon training set size • Unique global minimum, where the solution takes the kernel-expansion form given below.
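
In its standard form (reconstructed), the resulting regression function is the kernel expansion

```latex
f(x) = \sum_{i=1}^{\ell} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b ,
```

where α_i, α_i* are the dual variables of the QP; points with α_i = α_i* = 0 drop out of the sum, which is what makes the solution sparse.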

  22. Part I : Summary • Unique Global Minimum • Addresses Curse of Dimensionality • Complexity dependent upon data set size • Information contained in Support Vectors

  23. Part II • Sparse Kernel Methods

  24. Cyclic Nature of Empirical Modelling Design → Induce → Interpret → Validate → (back to Design)

  25. Induction • SVMs have strong theory • Good empirical performance • Solution of the form f(x) = Σ_i α_i K(x_i, x) + b • Interpretation: input selection and transparency

  26. Additive Representation • Additive structure • Transparent • Rejection of redundant inputs • Unique decomposition (see the sketch below)
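
The additive representation being described is the ANOVA decomposition; for n inputs it has the standard form (a reconstruction):

```latex
f(x) = b + \sum_{i} f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j)
+ \cdots + f_{1,2,\dots,n}(x_1, \dots, x_n) ,
```

and a redundant input is rejected simply by every term involving it vanishing.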

  27. Sparse Kernel Regression Previously: a single fixed kernel in the support vector expansion. Now: the kernel itself is replaced by a weighted sum of sub-kernels (slide 29).

  28. The Priors • “Different priors for different parameters” • Smoothness – controls “overfitting” • Sparseness – enables input selection and controls overfitting

  29. Sparse Kernel Model Replace the kernel with a weighted linear sum of kernels (see the sketch below), and minimise the number of non-zero multipliers along with the standard support vector optimisation. The choice of sparseness penalty matters: • ℓ0 (count of non-zeros): optimisation hard, solution sparse • ℓ1: optimisation easier, solution sparse • ℓ2: optimisation easier, solution NOT sparse
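
In symbols (a reconstruction consistent with slide 33), the kernel becomes

```latex
K(x, x') = \sum_{k} c_k\, K_k(x, x'), \qquad c_k \ge 0 ,
```

with sparseness sought through min ‖c‖₀ (the count of non-zero multipliers), relaxed in practice to the ℓ1 penalty min Σ_k c_k.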

  30. Choosing the Sub-Kernels • Avoid additional parameters if possible • Sub-models should be flexible

  31. Spline Kernel

  32. Tensor Product Splines The univariate spline which passes through the origin has a kernel of the form given below, and the multivariate ANOVA kernel is built as a product over the inputs; e.g. for a two-input problem the ANOVA kernel expands into constant, univariate, and bivariate terms.
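
The slide's formulas did not survive extraction; one common form of the first-order spline kernel through the origin, and the ANOVA kernel built from it, are (a reconstruction, so the slide's exact form may differ):

```latex
k(x, y) = xy + xy \min(x, y)
- \frac{x + y}{2} \min(x, y)^{2}
+ \frac{1}{3} \min(x, y)^{3},
\qquad
K_{\mathrm{ANOVA}}(x, x') = \prod_{i=1}^{n} \left( 1 + k(x_i, x_i') \right),
```

which for two inputs expands to

```latex
K(x, x') = 1 + k(x_1, x_1') + k(x_2, x_2') + k(x_1, x_1')\, k(x_2, x_2') .
```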

  33. Sparse ANOVA Kernel Introduce a multiplier for each ANOVA term, K(x, x') = c_0 + Σ_i c_i k(x_i, x_i') + Σ_{i<j} c_{ij} k(x_i, x_i') k(x_j, x_j') + …, and minimise the number of non-zero multipliers along with the standard support vector optimisation.

  34. Optimisation

  35. Quadratic Loss

  36. Epsilon-Insensitive Loss
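
In standard form (a reconstruction), the two losses of slides 35–36 are

```latex
L_{\mathrm{quad}}\bigl( y, f(x) \bigr) = \bigl( y - f(x) \bigr)^{2},
\qquad
L_{\varepsilon}\bigl( y, f(x) \bigr) = \max\bigl( 0,\ \lvert y - f(x) \rvert - \varepsilon \bigr),
```

the ε-insensitive loss charging nothing inside a tube of radius ε around the model.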

  37. Algorithm A 3+ stage technique with auto-selection of parameters: Data → ANOVA Basis Selection → Sparse ANOVA Selection → Parameter Selection → Model. Each stage consists of solving a convex, constrained optimisation problem (QP or LP). • Capacity control parameter: chosen by cross-validation • Sparseness parameter: chosen by validation error

  38. Sparse Basis Solution • Quadratic loss function → Quadratic Program • ε-insensitive loss function → Linear Program (see the sketch below)
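
A sketch of why the loss choice fixes the programme type, assuming (consistently with slide 29) a model ŷ_i that is linear in the multipliers c = c⁺ − c⁻:

```latex
\min_{c^{+},\, c^{-},\, \xi,\, \xi^{*} \,\ge\, 0}\ \
\sum_{k} \left( c_k^{+} + c_k^{-} \right)
+ C \sum_{i} \left( \xi_i + \xi_i^{*} \right)
\qquad \text{s.t.} \qquad
\begin{cases}
y_i - \hat{y}_i \le \varepsilon + \xi_i \\
\hat{y}_i - y_i \le \varepsilon + \xi_i^{*} .
\end{cases}
```

Every constraint and the objective are linear in the variables, hence a Linear Program; replacing the ε-insensitive slacks with squared residuals makes the objective quadratic while the constraints stay linear, hence a Quadratic Program.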

  39. AMPG Problem • Predict automobile MPG (392 samples) • Inputs: number of cylinders, displacement, horsepower, weight, acceleration, year • Output: MPG

  40. [Figure: the additive horsepower term, plotted over panels spanning 50–230 horsepower.] Network transparency through the ANOVA representation.

  41. SUPANOVA AMPG Results (ε = 2.5) Estimated generalisation error, mean / variance:

  Loss Function (Stage I / Stage III)    Stage I        Stage III      Linear Model
  Quadratic / Quadratic                  6.97 / 7.39    7.08 / 6.19    11.4 / 11.0
  ε-Insensitive / ε-Insensitive          0.48 / 0.04    0.49 / 0.03    1.80 / 0.11
  ε-Insensitive / Quadratic              1.10 / 0.07    1.37 / 0.10    –
  ε-Insensitive / Quadratic              7.07 / 6.52    7.13 / 6.04    11.72 / 10.94

  42. AMPG Additive Terms

  43. Summary • SUPANOVA is a global approach • Strong Basis (Kernel Methods) • Can control loss function and sparseness • Can impose limit on maximum variate terms • Generalisation + Transparency

  44. Further Information • http://www.isis.ecs.soton.ac.uk/isystems/kernel/ • SVM Technical Report • MATLAB SVM Toolbox • Sparse Kernel Paper • These Slides
