
  1. Learning Non-Linear Kernel Combinations Subject to General Regularization: Theory and Applications. Rakesh Babu (200402007). Advisors: Prof. C. V. Jawahar, IIIT Hyderabad; Dr. Manik Varma, Microsoft Research

  2. Overview • Introduction to Support Vector Machines (SVM) • Multiple Kernel Learning (MKL) • Problem Statement • Literature Survey • Generalized Multiple Kernel Learning (GMKL) • Applications • Conclusion

  3. SVM Notation • Training points x_i with labels y_i, i = 1, …, M. • [Figure: maximum-margin classifier. The separating hyperplane wᵀφ(x) + b = 0 lies midway between the margin hyperplanes wᵀφ(x) + b = ±1; the margin width is 2 / ‖w‖. Slack variables ξ_i measure margin violations: ξ_i = 0 for points on or outside the margin (support vectors lie on it), ξ_i < 1 for points inside the margin, and ξ_i > 1 for misclassified points.]

  4. SVM Formulation • Primal: Minimise ½ wᵀw + C Σ_i ξ_i • Subject to • y_i [wᵀx_i + b] ≥ 1 − ξ_i • ξ_i ≥ 0 • Dual: Max_α Σ_i α_i − ½ Σ_ij α_i α_j y_i y_j ⟨x_i, x_j⟩ • Subject to • Σ_i α_i y_i = 0 • 0 ≤ α ≤ C • w = Σ_i α_i y_i x_i

  5. SVM Formulation • Primal: Minimise ½ wᵀw + C Σ_i ξ_i • Subject to • y_i [wᵀx_i + b] ≥ 1 − ξ_i • ξ_i ≥ 0 • Dual: Max_α Σ_i α_i − ½ Σ_ij α_i α_j y_i y_j ⟨x_i, x_j⟩ • Subject to • Σ_i α_i y_i = 0 • 0 ≤ α ≤ C • f(x) = wᵀx + b = Σ_i α_i y_i ⟨x_i, x⟩ + b
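
As a concrete illustration of the dual decision function above, the sketch below trains a linear SVM with scikit-learn (an assumed tool, not part of the thesis) on toy data and reconstructs f(x) = Σ_i α_i y_i ⟨x_i, x⟩ + b from the solver's dual coefficients.

```python
# Minimal sketch (toy data, scikit-learn assumed): recover the dual form of the
# SVM decision function, f(x) = sum_i alpha_i y_i <x_i, x> + b.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = np.array([2.0, 2.0])
# dual_coef_ stores alpha_i * y_i for the support vectors
f_dual = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_
print(f_dual, clf.decision_function([x_new]))  # the two values agree
```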

  6. Kernel Trick • Use a mapping φ from the input space to a feature space. • Build the classifier in the feature space.

  7. SVM Formulation • Primal: Minimise ½ wᵀw + C Σ_i ξ_i • Subject to • y_i [wᵀx_i + b] ≥ 1 − ξ_i • ξ_i ≥ 0 • Dual: Max_α Σ_i α_i − ½ Σ_ij α_i α_j y_i y_j ⟨x_i, x_j⟩ • Subject to • Σ_i α_i y_i = 0 • 0 ≤ α ≤ C

  8. SVM after Kernelization • Primal: Minimise ½ wᵀw + C Σ_i ξ_i • Subject to • y_i [wᵀφ(x_i) + b] ≥ 1 − ξ_i • ξ_i ≥ 0 • Dual: Max_α Σ_i α_i − ½ Σ_ij α_i α_j y_i y_j ⟨φ(x_i), φ(x_j)⟩ • Subject to • Σ_i α_i y_i = 0 • 0 ≤ α ≤ C • f(x) = wᵀφ(x) + b = Σ_i α_i y_i ⟨φ(x_i), φ(x)⟩ + b (dot product in feature space)

  9. SVM after Kernelization • Primal: Minimise ½ wᵀw + C Σ_i ξ_i • Subject to • y_i [wᵀφ(x_i) + b] ≥ 1 − ξ_i • ξ_i ≥ 0 • Dual: Max_α Σ_i α_i − ½ Σ_ij α_i α_j y_i y_j k(x_i, x_j) • Subject to • Σ_i α_i y_i = 0 • 0 ≤ α ≤ C • f(x) = wᵀφ(x) + b = Σ_i α_i y_i k(x_i, x) + b (kernel function)

  10. Kernel Function & Kernel Matrix • Dot products in feature space are computed efficiently using a kernel function: φᵀ(x_i) φ(x_j) = k(x_i, x_j). • e.g. RBF: k(x_i, x_j) = exp(−γ ‖x_i − x_j‖²) • Properties of a kernel function: positive definite. • The kernel (Gram) matrix has entries K_ij = k(x_i, x_j). • [Figure: two-class data and the corresponding kernel matrix.]

  11. Some Popular Kernels • Linear: k(x_i, x_j) = x_iᵀ x_j • Polynomial: k(x_i, x_j) = (x_iᵀ x_j + c)^d • Gaussian (RBF): k(x_i, x_j) = exp(−γ ‖x_i − x_j‖²) • Chi-Squared: k(x_i, x_j) = exp(−γ χ²(x_i, x_j))
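
The following numpy sketch gives one plausible implementation of the four kernels listed above (my own illustrative code, not from the thesis); γ, c and d are free parameters, and the χ² kernel assumes non-negative features such as histograms.

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, c=1.0, d=3):
    return (xi @ xj + c) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def chi_squared_kernel(xi, xj, gamma=1.0, eps=1e-12):
    # chi-squared distance; assumes non-negative features (e.g. histograms)
    chi2 = np.sum((xi - xj) ** 2 / (xi + xj + eps))
    return np.exp(-gamma * chi2)
```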

  12. Varying the Kernel Parameter γ • [Figure: RBF decision boundaries on the same data for γ = 0.001, γ = 1 and γ = 1000.]

  13. Learning the Kernel • Valid kernels: • k = α₁k₁ + α₂k₂ (α₁, α₂ ≥ 0) • k = k₁ · k₂ • Learning the kernel function: k(x_i, x_j) = Σ_l d_l k_l(x_i, x_j)

  14. Multiple Kernel Learning • Learning the SVM parameters (the α's) and the kernel parameters (the d's) jointly is the multiple kernel learning problem. • k(x_i, x_j) = Σ_l d_l k_l(x_i, x_j) • [Figure: the combined kernel matrix as a weighted sum of base kernel matrices, K = d₁K₁ + d₂K₂ + d₃K₃.]
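
A minimal sketch of the linear combination above, with RBF base kernels of different bandwidths; the bandwidths and weights are illustrative choices, not values from the thesis. Keeping every d_l ≥ 0 ensures the combination is itself a valid kernel.

```python
import numpy as np

def rbf_gram(X, gamma):
    # full RBF Gram matrix for one base kernel
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def mkl_sum_kernel(X, gammas, d):
    # k = sum_l d_l k_l, a weighted sum of base Gram matrices
    return sum(dl * rbf_gram(X, g) for dl, g in zip(d, gammas))

X = np.random.rand(5, 3)
K = mkl_sum_kernel(X, gammas=[0.1, 1.0, 10.0], d=[0.5, 0.3, 0.2])
```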

  15. Problem Statement • Most multiple kernel learning formulations are restricted to linear combinations of kernels subject to either l1 or l2 regularization. • In this thesis, we address how the kernel can be learnt using non-linear kernel combinations subject to general regularization. • We also investigate several applications of non-linear kernel combinations.

  16. Literature Survey • Kernel Target Alignment • Semi-Definite Programming-MKL (SDP) • Block l1-MKL (M-Y regularization + SMO) • Semi-Infinite Linear Programming-MKL (SILP) • Simple MKL (gradient descent) • Hyper kernels (SDP/SOCP) • Multi-class MKL • Hierarchical MKL • Local MKL • Mixed norm MKL (mirror descent)

  17. Multiple Kernel Learning • MKL learns a linear combination of base kernels: k(x_i, x_j) = Σ_l d_l k_l(x_i, x_j) • [Figure: K = d₁K₁ + d₂K₂ + d₃K₃.]

  18. Generalized MKL • GMKL learns non-linear kernel combinations. • Product: k(x_i, x_j) = Π_l k_l(x_i, x_j) • [Figure: K = K₁ ∘ K₂ ∘ K₃, the element-wise product of the base kernel matrices.]
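
A sketch of the product combination above, realised as an element-wise product of precomputed base Gram matrices; the base kernels themselves are assumed to be given.

```python
import numpy as np

def product_combination(base_grams):
    # k = prod_l k_l: element-wise (Hadamard) product of base Gram matrices,
    # which is again a valid positive-definite kernel
    K = np.ones_like(base_grams[0])
    for Kl in base_grams:
        K *= Kl
    return K
```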

  19. Toy Example: Non-linear Kernel Combination • [Figure: the individual 1D feature spaces φ₁ and φ₂, and the combined kernel feature spaces obtained from the sum and the product of the two base kernels.]

  20. Generalized MKL Primal • Formulation: Min_{w,b,d} ½ wᵀw + Σ_i L(f(x_i), y_i) + r(d) • subject to the constraints on d • where • (x_i, y_i) is the i-th training point • f(x) = wᵀφ_d(x) + b • L is a general loss function • K_d is a kernel function parameterised by d • r is a regulariser on the kernel parameters • This formulation is not convex.

  21. GMKL Primal for Classification • Minimise_d T(d) subject to d ≥ 0 • where • T(d) = Min_{w,b,ξ} ½ wᵀw + C Σ_i ξ_i + r(d) • Subject to • y_i [wᵀφ_d(x_i) + b] ≥ 1 − ξ_i • ξ_i ≥ 0 • To minimise T using gradient descent we need to • prove that ∇_d T exists • calculate ∇_d T efficiently.

  22. Visualizing T on UCI Sonar Data

  23. Dual - Differentiability • W(d) = r(d) + Max_α 1ᵀα − ½ αᵀ Y K_d Y α • Subject to • 1ᵀYα = 0 • 0 ≤ α ≤ C • T(d) = W(d) by the principle of strong duality. • Differentiability with respect to d follows from Danskin's Theorem [Danskin 1947].

  24. Dual - Derivative • Let α*(d) be the optimal value of α, so that • W(d) = r(d) + 1ᵀα* − ½ α*ᵀ Y K_d Y α* • ∂W/∂d_k = ∂r/∂d_k − ½ α*ᵀ Y (∂K_d/∂d_k) Y α* • Since d is fixed, W(d) is the standard SVM dual and α* can be obtained using any SVM solver.

  25. Final Algorithm • Initialise d_0 randomly • Repeat until the convergence criterion is met • Form K_d using the current estimate of d • Use any SVM solver to obtain α* • Update d_{n+1} = max(0, d_n − s_n ∇_d W), where s_n is the step size.
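
Below is a minimal sketch of this projected-gradient loop, using scikit-learn's SVC as the inner solver, the per-feature product kernel K_d(x_i, x_j) = Π_l exp(−d_l (x_il − x_jl)²) from the feature-selection experiments, and an l1 regulariser r(d) = σ Σ_l d_l. The step size, iteration count and constants are illustrative, not the settings used in the thesis.

```python
import numpy as np
from sklearn.svm import SVC

def gram_and_grads(X, d):
    # K_d and its partial derivatives dK/dd_l for the product-of-RBF kernel
    diffs2 = (X[:, None, :] - X[None, :, :]) ** 2          # (n, n, L)
    K = np.exp(-(diffs2 * d).sum(axis=2))                   # (n, n)
    grads = [-diffs2[:, :, l] * K for l in range(len(d))]   # dK/dd_l
    return K, grads

def gmkl(X, y, C=10.0, sigma=1.0, step=1e-2, iters=100):
    n, L = X.shape
    d = np.full(L, 1.0 / L)                                  # initialise d
    for _ in range(iters):
        K, grads = gram_and_grads(X, d)
        svm = SVC(C=C, kernel="precomputed").fit(K, y)       # inner SVM solve -> alpha*
        alpha_y = np.zeros(n)                                # entries alpha_i * y_i
        alpha_y[svm.support_] = svm.dual_coef_.ravel()
        # dW/dd_l = dr/dd_l - 0.5 * (Y alpha*)^T (dK/dd_l) (Y alpha*)
        grad = np.array([sigma - 0.5 * alpha_y @ G @ alpha_y for G in grads])
        d = np.maximum(0.0, d - step * grad)                 # projected gradient step
    return d
```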

  26. Applications • Feature selection • Learning discriminative parts/pixels for object categorization • Recognition of characters taken from natural scenes

  27. MKL & its Applications • In general, applications exploit one of the following views of MKL: • to obtain the optimal weights of the different features used for the task • to interpret the sparsity of the learnt kernel weights • to combine multiple heterogeneous data sources.

  28. Applications: Feature Selection • UCI datasets • k(x_i, x_j) = Π_l exp(−d_l (x_il − x_jl)²)
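
With this kernel, the learnt weights d_l can be read as feature relevances: a d_l driven to (near) zero removes feature l from the kernel entirely. The sketch below, which assumes the gmkl() function sketched earlier and uses synthetic data, illustrates the idea; the threshold is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(80, 10)
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)   # only features 0 and 1 carry the label

d = gmkl(X, y, C=10.0, sigma=1.0, step=1e-2, iters=200)   # from the earlier sketch
selected = np.flatnonzero(d > 1e-3)            # features retained by the learnt kernel
print(selected)
```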

  29. UCI Datasets – Ionosphere • N = 246, M = 34 • Uniform MKL = 89.9 ± 2.5 • Uniform GMKL = 93.6 ± 2.0

  30. UCI Dataset – Parkinson’s • N = 136, M = 22 • Uniform MKL = 87.3 ± 3.9 • Uniform GMKL = 91.0 ± 3.5

  31. UCI Datasets – Musk • N = 333, M = 166 • Uniform MKL = 90.2 ± 3.2 • Uniform GMKL = 93.8 ± 1.9

  32. UCI Datasets – Sonar • N = 145, M = 60 • Uniform MKL = 82.9 ± 3.4 • Uniform GMKL = 84.6 ± 4.1

  33. UCI Datasets – Wpbc • N = 135, M = 34 • Uniform MKL = 72.1 ± 5.4 • Uniform GMKL = 77.0 ± 6.4

  34. Application: Learning Discriminative Pixels/Parts • Problem: Can image categorization be done efficiently? • Idea: Often, the information present in images is redundant. • Solution: Yes, by focusing on only a subset of pixels or regions in an image.

  35. Solution • A kernel is associated with each part.

  36. Pixel Selection for Gender Identification • Database of FERET faces [Moghaddam and Yang, PAMI 2002]. • [Figure: example male and female faces.]

  37. Gender Identification – Features • [Figure: each face is represented by its individual pixel intensities, Pixel 1 through Pixel 252.]

  38. Gender Identification - Results • N = 1053, M = 252 • Uniform MKL = 92.6 ± 0.9 • Uniform GMKL = 94.3 ± 0.1

  39. Caltech 101 • Task: object recognition • No. of classes: 102 • Problem: the images are not perfectly aligned, but they are roughly aligned • Collected by Fei-Fei et al. [PAMI 2006]

  40. Approach • Feature extraction: GIST • [Figure: 64 base kernels, Kernel 1 through Kernel 64, built from the extracted features.]

  41. Faces_Easy and Windsor_chair

  42. Car_Side and Leopards

  43. Minaret and Bikes

  44. Problem • Objective: recognition of English characters taken from natural scenes.

  45. A sample approach for sentence recognition from images: bottom up • Locate characters in images • Recognise characters • Recognise words • Recognise sentences • [Figure: pipeline from Image → character detection → character recognition → characters → words → sentence.]

  46. Challenges • Perspective distortion • Occlusion • Variations in • Contrast • Color • Style • Sizes • Motion blur • Inter-class distance is small while intra-class distance is large. • Large number of classes • None of the existing OCR techniques works well here.

  47. Character Recognition using Bag of Features • [Figure: pipeline of patch detection → feature extraction → class-based vector quantisation → histogram computation (a discrete distribution over visual words) → classification.]
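
A minimal sketch of this bag-of-features pipeline (descriptor extraction itself is omitted; the codebook size and the k-means quantiser are illustrative choices, not necessarily those used in the thesis):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, k=256):
    # train_descriptors: (N, D) local patch descriptors pooled over training images
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(train_descriptors)

def bag_of_features(image_descriptors, codebook):
    # quantise each patch descriptor to its nearest visual word,
    # then return the normalised histogram (a discrete distribution)
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```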

  48. Feature Extraction Methods • Geometric Blur [Berg] • Shape Contexts [Belongie et al.] • SIFT [Lowe] • Patches [Varma & Zisserman 07] • SPIN [Lazebnik et al., Johnson] • MR8 (maximum response of 8 filters) [Varma & Zisserman 05]

  49. Results - SVM and MKL • [Figure: MKL results, with example images from class i and class j.]

  50. Conclusions • We presented a formulation that accepts non-linear kernel combinations. • GMKL results can be significantly better than standard MKL. • We showed several applications where the proposed formulation performs better than state-of-the-art methods.
