Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

John C. Platt

Presented by: Travis Desell


Overview

  • Introduction

    • Motivation

    • General SVMs

    • General SVM training

    • Related Work

  • Sequential Minimal Optimization (SMO)

    • Choosing the smallest optimization problem

    • Solving the smallest optimization problem

  • Benchmarks

  • Conclusion

  • Remarks & Future Work

  • References


Motivation

  • Traditional SVM Training Algorithms

    • Require quadratic programming (QP) package

    • SVM training is slow, especially for large problems

  • Sequential Minimal Optimization (SMO)

    • Requires no QP package

    • Easy to implement

    • Often faster

    • Good scalability properties


General SVMs

u = \sum_i \alpha_i y_i K(x_i, x) - b    (1)

  • u : SVM output

  • α_i : weight on training example i (a Lagrange multiplier)

  • y_i ∈ {-1, +1} : desired output

  • b : threshold

  • x_i : stored training example (vector)

  • x : input (vector)

  • K : kernel function measuring the similarity of x_i to x
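
As a concrete illustration (not from the slides), here is a minimal Python/NumPy sketch of how (1) might be evaluated; the function names and the Gaussian kernel choice are assumptions made for the example:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative bandwidth choice."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(d, d))

def svm_output(x, train_x, alpha, y, b, kernel=rbf_kernel):
    """Equation (1): u = sum_i alpha_i * y_i * K(x_i, x) - b."""
    return sum(a_i * y_i * kernel(x_i, x)
               for a_i, y_i, x_i in zip(alpha, y, train_x)) - b
```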


General SVMs (2)

  • For linear SVMs, K is the dot product, so (1) can be expressed as the dot product of a single weight vector w with x, minus the threshold:

    u = w \cdot x - b    (2)

    w = \sum_i \alpha_i y_i x_i    (3)

  • where w, x, and x_i are vectors
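
A short sketch, under the assumption that the training examples are held in a NumPy array of shape (n, d), of how a trained linear SVM can be folded into a single weight vector per (2)-(3); the names are illustrative:

```python
import numpy as np

def fold_linear_svm(train_x, y, alpha):
    """Equation (3): w = sum_i alpha_i * y_i * x_i."""
    return (np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)) @ np.asarray(train_x, dtype=float)

def linear_svm_output(x, w, b):
    """Equation (2): u = w . x - b."""
    return w @ x - b
```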


General SVM Training

  • Training an SVM means finding the α_i, expressed as minimizing a dual quadratic form:

    \min_\alpha \Psi(\alpha) = \min_\alpha \frac{1}{2} \sum_i \sum_j y_i y_j K(x_i, x_j) \alpha_i \alpha_j - \sum_i \alpha_i    (4)

  • Subject to the box constraints:

    0 \le \alpha_i \le C, \quad \forall i    (5)

  • And the linear equality constraint:

    \sum_i y_i \alpha_i = 0    (6)

  • The α_i are Lagrange multipliers of a primal QP problem: there is a one-to-one correspondence between each α_i and each training example x_i
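
For reference, a small sketch (an assumption, not code from the paper) of how the dual objective (4) and the feasibility conditions (5)-(6) could be checked, given a precomputed kernel matrix K_mat with K_mat[i, j] = K(x_i, x_j):

```python
import numpy as np

def dual_objective(alpha, y, K_mat):
    """Equation (4): 0.5 * sum_ij y_i y_j K(x_i, x_j) alpha_i alpha_j - sum_i alpha_i."""
    alpha, y = np.asarray(alpha, dtype=float), np.asarray(y, dtype=float)
    return 0.5 * alpha @ (np.outer(y, y) * K_mat) @ alpha - alpha.sum()

def is_feasible(alpha, y, C, tol=1e-8):
    """Box constraints (5) and the linear equality constraint (6)."""
    alpha, y = np.asarray(alpha, dtype=float), np.asarray(y, dtype=float)
    return bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol)
                and abs(alpha @ y) <= tol)
```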


General SVM Training (2)

  • SMO solves the QP expressed in (4-6)

  • Terminates when all of the Karush-Kuhn-Tucker (KKT) optimality conditions are fulfilled:

    \alpha_i = 0 \Rightarrow y_i u_i \ge 1    (7)

    0 < \alpha_i < C \Rightarrow y_i u_i = 1    (8)

    \alpha_i = C \Rightarrow y_i u_i \le 1    (9)

  • where u_i is the SVM output for the i-th training example
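
A minimal sketch, assuming a tolerance ε like the one used later in the termination test, of checking whether a single example violates the KKT conditions (7)-(9); the function name and signature are illustrative:

```python
def violates_kkt(alpha_i, y_i, u_i, C, eps=1e-3):
    """True if example i violates conditions (7)-(9) by more than eps."""
    r = y_i * u_i - 1.0
    if alpha_i < eps:          # (7): alpha_i = 0      ->  y_i * u_i >= 1
        return r < -eps
    if alpha_i > C - eps:      # (9): alpha_i = C      ->  y_i * u_i <= 1
        return r > eps
    return abs(r) > eps        # (8): 0 < alpha_i < C  ->  y_i * u_i = 1
```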


Related Work

  • “Chunking” [9]

    • Removing training examples with α_i = 0 does not change the solution.

    • Breaks the large QP problem down into smaller QP sub-problems in order to identify the non-zero α_i.

    • Each QP sub-problem consists of every non-zero α_i from the previous sub-problem, combined with the M worst examples that violate (7-9), for some M [1].

    • The last step solves the entire QP problem, since all of the non-zero α_i have been found.

    • Chunking cannot handle large-scale training problems if standard QP techniques are used; Kaufman [3] describes a QP algorithm to overcome this.


Related Work (2)

  • Decomposition [6]:

    • Breaks the large QP problem into smaller QP sub-problems.

    • Osuna et al. [6] suggest using a fixed-size matrix for every sub-problem, which allows very large training sets.

    • Joachims [2] suggests adding and subtracting examples according to heuristics for rapid convergence.

    • Prior to SMO, these methods all required a numerical QP library, which can be costly or slow.


Sequential Minimal Optimization

  • SMO decomposes the overall QP problem (4-6) into fixed-size QP sub-problems.

  • It chooses the smallest possible optimization problem (SOP) at each step.

    • Because of the linear equality constraint, the smallest such problem involves two elements of α.

  • SMO repeatedly chooses two elements of α to jointly optimize until the overall QP problem is solved.


Choosing the SOP

  • Heuristic-based approach

  • Terminates when the entire training set obeys (7-9) within ε (typically ε ≤ 10^-3)

  • Repeatedly finds α_1 and α_2 and jointly optimizes them until termination


Finding α_1

  • “First choice heuristic”

    • Searches through the examples most likely to violate the conditions (the non-bound subset)

    • α_i at the bounds are likely to stay there, while non-bound α_i will move as other examples are optimized

  • “Shrinking heuristic”

    • Finds examples which fulfill (7-9) by more than the worst example violates them

    • Ignores these examples until a final pass at the end, which ensures all examples fulfill (7-9)
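
A small sketch of the non-bound scan described above, assuming the multipliers are kept in a NumPy array; the threshold eps and the names are illustrative:

```python
import numpy as np

def non_bound_indices(alpha, C, eps=1e-3):
    """Indices with 0 < alpha_i < C: the subset the first-choice heuristic scans."""
    alpha = np.asarray(alpha, dtype=float)
    return np.where((alpha > eps) & (alpha < C - eps))[0]
```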


Finding α_2

  • Chosen to maximize the size of the step taken during the joint optimization of α_1 and α_2

  • A cached error value E_i is kept for each non-bound example

  • If E_1 is positive, chooses the α_2 with minimum E_2

  • If E_1 is negative, chooses the α_2 with maximum E_2
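
A minimal sketch of this second-choice rule, assuming the cached errors of the non-bound examples are stored in an array E_cache (an illustrative name); the returned value is an index into that array:

```python
import numpy as np

def choose_alpha2(E_cache, E1):
    """Pick the index maximizing |E1 - E2| over the cached non-bound errors."""
    E_cache = np.asarray(E_cache, dtype=float)
    # E1 > 0: take the most negative E2; E1 < 0: take the largest E2.
    return int(np.argmin(E_cache)) if E1 > 0 else int(np.argmax(E_cache))
```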


Solving the SOP

  • Computes the minimum along the direction of the linear equality constraint:

    \alpha_2^{new} = \alpha_2 + y_2 (E_1 - E_2) / (K(x_1, x_1) + K(x_2, x_2) - 2 K(x_1, x_2))    (10)

    E_i = u_i - y_i    (11)

  • Clips α_2^new to [L, H]:

    L = \max(0, \alpha_2 + s \alpha_1 - \frac{1}{2}(s+1) C)    (12)

    H = \min(C, \alpha_2 + s \alpha_1 - \frac{1}{2}(s-1) C)    (13)

    s = y_1 y_2    (14)

  • Calculates α_1^new:

    \alpha_1^{new} = \alpha_1 + s (\alpha_2 - \alpha_2^{new, clipped})    (15)
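
Putting (10)-(15) together, here is a sketch of the analytic step for one pair. It assumes the kernel values and errors have already been computed, and it omits the degenerate cases (for example a non-positive denominator or L = H) that a full implementation must handle; all names are illustrative:

```python
import numpy as np

def smo_step(alpha1, alpha2, y1, y2, E1, E2, K11, K22, K12, C):
    """Analytic joint optimization of one pair of multipliers (equations 10-15)."""
    s = y1 * y2                                               # (14)
    L = max(0.0, alpha2 + s * alpha1 - 0.5 * (s + 1) * C)     # (12)
    H = min(C,   alpha2 + s * alpha1 - 0.5 * (s - 1) * C)     # (13)
    eta = K11 + K22 - 2.0 * K12                               # curvature along the constraint line
    a2_new = alpha2 + y2 * (E1 - E2) / eta                    # (10), with E_i = u_i - y_i per (11)
    a2_clipped = float(np.clip(a2_new, L, H))                 # clip to [L, H]
    a1_new = alpha1 + s * (alpha2 - a2_clipped)               # (15)
    return a1_new, a2_clipped
```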


Benchmarks

  • UCI Adult: the SVM is given 14 attributes of a census record and asked to predict whether household income is greater than $50K. 8 categorical and 6 continuous attributes are expanded into 123 binary attributes.

  • Web: classify whether a web page belongs to a category or not. 300 sparse binary keyword attributes.

  • MNIST: one classifier is trained. 784-dimensional, non-binary vectors stored as sparse vectors.


Description of Benchmarks

  • Web and Adult are trained with linear and Gaussian SVMs.

  • Performed with and without sparse inputs, with and without kernel caching

  • PCG chunking always uses caching


Benchmarking SMO


Conclusions

  • PCG chunking is slower than SMO, partly because SMO can ignore examples whose Lagrange multipliers are at C.

  • Most of PCG chunking's overhead lies outside the kernel computation, so kernel optimizations do not greatly affect its training time.


Conclusions (2)

  • SVMlight solves 10-dimensional QP sub-problems.

  • Differences are mostly due to kernel optimizations and numerical QP overhead.

  • SMO is faster on linear problems due to linear SVM folding, but SVMlight could potentially use this as well.

  • SVMlight benefits from its complex kernel cache, while SMO does not have one and thus does not benefit from caching at large problem sizes.


Remarks & Future Work

  • Heuristic-based approach to finding the α_1 and α_2 to optimize:

    • Is it possible to determine an optimal choice strategy that minimizes the number of steps?

  • Is there a proof that SMO always minimizes the QP problem?


References

  • [1] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 1998.

  • [2] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184. MIT Press, 1998.


References (2)

  • [3] L. Kaufman. Solving the quadratic programming problem arising in support vector classification. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 147–168. MIT Press, 1998.

  • [6] E. Osuna, R. Freund, and F. Girosi. Improved training algorithm for support vector machines. In Proc. IEEE Neural Networks in Signal Processing ’97, 1997.


References (3)

  • [9] V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.

