Two Important Parts of SMO (selection heuristics & stopping criterion)


  1. Two Important Parts of SMO (selection heuristics & stopping criterion)
• A good selection of the pair of points to update will speed up convergence
• The selection heuristic may depend on the stopping criterion
• Stopping criterion: duality gap => it is natural to choose the points with the largest violation of the KKT conditions (too expensive to do exactly)
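
As a concrete illustration, here is a minimal sketch (Python/NumPy, names illustrative) of the standard maximal-violating-pair rule used in LIBSVM-style SMO solvers: it picks the pair with the largest first-order KKT violation and stops once the violation drops below a tolerance, a cheap stand-in for the exact duality-gap test the slide calls too expensive.

```python
import numpy as np

def select_working_pair(alpha, grad, y, C, eps=1e-3):
    """Maximal-violating-pair heuristic for the SVM dual
    min 0.5 a'Qa - e'a  s.t.  y'a = 0, 0 <= a <= C.
    grad is the current dual gradient Q a - e."""
    # I_up / I_low: indices whose alpha can still move in each direction
    up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
    low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
    g = -y * grad                        # signed KKT violation scores
    i = np.where(up)[0][np.argmax(g[up])]
    j = np.where(low)[0][np.argmin(g[low])]
    if g[i] - g[j] <= eps:               # approximate duality-gap test
        return None                      # stop: KKT satisfied within eps
    return i, j
```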

  2. How to Solve an Unconstrained MP
• Get an initial point and iteratively decrease the objective function value
• Stop once the stopping criterion is satisfied
• Steepest descent might not be a good choice
• Newton's method is highly recommended: a locally and quadratically convergent algorithm
• Need to choose a good stepsize to guarantee global convergence

  3. Steepest Descent with Exact Line Search
Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
(i) Steepest descent direction: $d^i = -\nabla f(x^i)$
(ii) Exact line search: choose a stepsize $\lambda_i$ such that $f(x^i + \lambda_i d^i) = \min_{\lambda \ge 0} f(x^i + \lambda d^i)$
(iii) Updating: $x^{i+1} = x^i + \lambda_i d^i$
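
A minimal sketch of this scheme for a strictly convex quadratic $f(x) = \frac{1}{2}x^TAx - b^Tx$, the one case where the exact line-search stepsize has a closed form, $\lambda_i = (g^Tg)/(g^TAg)$; function and variable names are illustrative.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-8, max_iter=1000):
    """Steepest descent with exact line search for
    f(x) = 0.5 x'Ax - b'x, A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = A @ x - b                    # gradient of f at x
        if np.linalg.norm(g) <= tol:
            break                        # stopping test: gradient ~ 0
        d = -g                           # steepest descent direction
        lam = (g @ g) / (g @ (A @ g))    # exact line-search stepsize
        x = x + lam * d                  # update
    return x
```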

  4. Newton's Method
Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
(i) Newton direction: $\nabla^2 f(x^i)\, d^i = -\nabla f(x^i)$ (have to solve a system of linear equations here!)
(ii) Updating: $x^{i+1} = x^i + d^i$
• Converges only when $x^0$ is close enough to the solution $x^*$.
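
A sketch of the pure Newton iteration just described, with no stepsize safeguard, so it inherits the local-convergence caveat above; grad and hess are assumed to be callables returning the gradient vector and Hessian matrix.

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: solve H(x) d = -g(x), then x <- x + d.
    Quadratically convergent, but only from a starting point close
    enough to the solution."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(hess(x), -g)  # one linear system per iteration
        x = x + d
    return x
```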

  5. Example: started too far from the solution, it cannot converge to the optimal solution.

  6. SVM as an Unconstrained Minimization Problem
(QP) $\min_{w,b,\xi}\ \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$ s.t. $D(Aw + \mathbf{1}b) + \xi \ge \mathbf{1}$
Here $A \in \mathbb{R}^{l \times n}$ holds the training points row-wise and $D$ is the diagonal matrix of $\pm 1$ labels. At the solution of (QP): $\xi = (\mathbf{1} - D(Aw + \mathbf{1}b))_+$, where $(x)_+ = \max\{x, 0\}$ componentwise. Hence (QP) is equivalent to the nonsmooth SVM:
$\min_{w,b}\ \frac{C}{2}\|(\mathbf{1} - D(Aw + \mathbf{1}b))_+\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$
• Change (QP) into an unconstrained MP
• Reduce (n+1+l) variables to (n+1) variables
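
Assuming the reconstruction above ($D = \mathrm{diag}(y)$ with $y \in \{-1, +1\}^l$), the unconstrained nonsmooth objective is a few lines of NumPy:

```python
import numpy as np

def nonsmooth_svm_objective(w, b, A, y, C):
    """C/2 ||(1 - D(Aw + 1b))_+||^2 + 1/2 (||w||^2 + b^2),
    with D = diag(y), y in {-1, +1}."""
    margins = 1.0 - y * (A @ w + b)      # 1 - D(Aw + 1b)
    plus = np.maximum(margins, 0.0)      # plus function x_+
    return 0.5 * C * plus @ plus + 0.5 * (w @ w + b * b)
```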

  7. Smooth the Plus Function: Integrate
Step function: $\mathrm{step}(x) = 1$ if $x > 0$, $0$ otherwise
Sigmoid function: $s(x, \beta) = \dfrac{1}{1 + e^{-\beta x}}$ (a smoothed step)
Plus function: $x_+ = \max\{x, 0\}$
p-function: $p(x, \beta) = x + \dfrac{1}{\beta}\log(1 + e^{-\beta x})$ (integrating the sigmoid gives a smoothed plus)
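
A sketch of the p-function next to the plus function it approximates; np.logaddexp(0, -beta*x) computes $\log(1 + e^{-\beta x})$ without overflow.

```python
import numpy as np

def p_func(x, beta):
    """Smooth plus function obtained by integrating the sigmoid:
    p(x, beta) = x + (1/beta) * log(1 + exp(-beta * x))."""
    return x + np.logaddexp(0.0, -beta * x) / beta

x = np.linspace(-2, 2, 5)
print(np.maximum(x, 0))       # plus function
print(p_func(x, beta=5.0))    # close already for moderate beta
```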

  8. SSVM: Smooth Support Vector Machine
• Replacing the plus function $x_+$ in the nonsmooth SVM by the smooth $p(x, \beta)$ gives our SSVM:
$\min_{w,b}\ \frac{C}{2}\|p(\mathbf{1} - D(Aw + \mathbf{1}b), \beta)\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$
Here, $p(x, \beta)$ is an accurate smooth approximation of $x_+$, obtained by integrating the sigmoid function of neural networks (sigmoid = smoothed step).
• The solution of SSVM converges to the solution of the nonsmooth SVM as $\beta$ goes to infinity. (Typically, $\beta = 5$.)
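
Putting the two previous pieces together, a sketch of the SSVM objective under the same notational assumptions ($y \in \{-1, +1\}^l$, $\beta = 5$ by default):

```python
import numpy as np

def ssvm_objective(w, b, A, y, C, beta=5.0):
    """Nonsmooth SVM objective with x_+ replaced by p(x, beta)."""
    x = 1.0 - y * (A @ w + b)                      # 1 - D(Aw + 1b)
    p = x + np.logaddexp(0.0, -beta * x) / beta    # p-function
    return 0.5 * C * p @ p + 0.5 * (w @ w + b * b)
```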

  9. Newton-Armijo Method: Quadratic Approximation of SSVM
• The sequence $\{(w^i, b^i)\}$ generated by solving a quadratic approximation of SSVM converges to the unique solution of SSVM at a quadratic rate
• Converges in 6 to 8 iterations
• At each iteration we solve a linear system of n+1 equations in n+1 variables
• Complexity depends on the dimension of the input space
• A stepsize may need to be selected to guarantee global convergence

  10. Newton-Armijo Algorithm
Start with any $(w^0, b^0)$. Having $(w^i, b^i)$, stop if $\nabla\Phi(w^i, b^i) = 0$; else:
(i) Newton direction: solve $\nabla^2\Phi(w^i, b^i)\, d^i = -\nabla\Phi(w^i, b^i)$
(ii) Armijo stepsize: choose $\lambda_i$ such that Armijo's rule is satisfied
• Globally and quadratically converges to the unique solution in a finite number of steps

  11. Newton-Armijo Algorithm for SSVM
Let $\Phi_\beta(w, b)$ denote the SSVM objective. Start with any $(w^0, b^0) \in \mathbb{R}^{n+1}$. Having $(w^i, b^i)$, stop if $\nabla\Phi_\beta(w^i, b^i) = 0$. Else compute $(w^{i+1}, b^{i+1})$ as follows:
(i) Newton direction: determine $d^i \in \mathbb{R}^{n+1}$ by solving n+1 linear equations in n+1 variables: $\nabla^2\Phi_\beta(w^i, b^i)\, d^i = -\nabla\Phi_\beta(w^i, b^i)$
(ii) Armijo stepsize: choose $\lambda_i \in \{1, \frac{1}{2}, \frac{1}{4}, \dots\}$ such that $\Phi_\beta(w^i, b^i) - \Phi_\beta\big((w^i, b^i) + \lambda_i d^i\big) \ge -\delta\,\lambda_i\,\nabla\Phi_\beta(w^i, b^i)^T d^i$, where $\delta \in (0, \frac{1}{2})$
(iii) Updating: $(w^{i+1}, b^{i+1}) = (w^i, b^i) + \lambda_i d^i$
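
Below is a compact sketch of the whole method under the reconstruction above; it is not the authors' reference implementation. The derivatives follow from $p'(x) = 1/(1 + e^{-\beta x})$ (the sigmoid) and $p''(x) = \beta s(1 - s)$, and the bias is folded into $\theta = (w, b)$ by appending a column of ones to A.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500, 500)))

def train_ssvm(A, y, C=1.0, beta=5.0, tol=1e-6, max_iter=50, delta=1e-4):
    """Newton-Armijo for Phi(w,b) = C/2 ||p(1 - D(Aw+1b), beta)||^2
    + 1/2 (||w||^2 + b^2), with D = diag(y), y in {-1, +1}."""
    m, n = A.shape
    Z = np.hstack([A, np.ones((m, 1))])   # fold b into theta = (w, b)
    theta = np.zeros(n + 1)

    def phi(theta):
        r = 1.0 - y * (Z @ theta)
        p = r + np.logaddexp(0.0, -beta * r) / beta
        return 0.5 * C * p @ p + 0.5 * theta @ theta

    for _ in range(max_iter):
        r = 1.0 - y * (Z @ theta)
        p = r + np.logaddexp(0.0, -beta * r) / beta   # p-function
        s = sigmoid(beta * r)                         # p'(r)
        grad = -C * Z.T @ (y * p * s) + theta
        if np.linalg.norm(grad) <= tol:
            break
        # Hessian: C * Z' diag(p'^2 + p * p'') Z + I
        h = s * s + beta * p * s * (1.0 - s)
        H = C * (Z.T * h) @ Z + np.eye(n + 1)
        d = np.linalg.solve(H, -grad)                 # Newton direction
        lam, f0 = 1.0, phi(theta)
        while phi(theta + lam * d) > f0 + delta * lam * (grad @ d):
            lam *= 0.5                                # Armijo backtracking
        theta = theta + lam * d
    return theta[:-1], theta[-1]                      # (w, b)
```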

  12. Comparisons of SSVM with Other SVMs
Tenfold test-set correctness (%) and CPU time in seconds, for SSVM (linear equations), SVM (QP), and SVM (LP). [Results table not recoverable from the transcript; best values were shown in red.]

  13. Two-Spiral Dataset (94 white dots & 94 red dots)
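
For readers who want to reproduce an experiment like this, here is a generic two-spiral generator; the parameterization is a common one and not necessarily the exact benchmark shown on the slide.

```python
import numpy as np

def two_spirals(n=94, noise=0.0, seed=0):
    """Generate n points per class on two interleaved spirals."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 2 * np.pi * 1.75, n)      # spiral angle
    r = t / (2 * np.pi * 1.75) * 5.0 + 0.5       # growing radius
    x1 = np.column_stack([r * np.cos(t), r * np.sin(t)])
    x2 = -x1                                     # second spiral, rotated 180 deg
    X = np.vstack([x1, x2]) + noise * rng.standard_normal((2 * n, 2))
    y = np.hstack([np.ones(n), -np.ones(n)])
    return X, y
```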

  14. The Perceptron Algorithm (Dual Form)
Given a linearly separable training set $S = \{(x^1, y_1), \dots, (x^l, y_l)\}$:
$\alpha \leftarrow \mathbf{0}$; $b \leftarrow 0$; $R \leftarrow \max_{1 \le i \le l} \|x^i\|$
Repeat: for $i = 1$ to $l$:
  if $y_i\big(\sum_{j=1}^{l} \alpha_j y_j \langle x^j, x^i \rangle + b\big) \le 0$ then $\alpha_i \leftarrow \alpha_i + 1$; $b \leftarrow b + y_i R^2$
until no mistakes made within the for loop
return $(\alpha, b)$
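
A direct transcription of the dual-form perceptron into Python. Because the decision value uses only inner products, a kernel function can be swapped in; the $R^2$ term assumes the kernel diagonal gives $\|x^i\|^2$.

```python
import numpy as np

def dual_perceptron(X, y, kernel=np.inner, max_epochs=100):
    """Dual-form perceptron: alpha[i] counts the mistakes on example i."""
    m = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    alpha, b = np.zeros(m), 0.0
    R2 = max(K[i, i] for i in range(m))   # R^2 = max ||x_i||^2
    for _ in range(max_epochs):
        mistakes = False
        for i in range(m):
            if y[i] * ((alpha * y) @ K[:, i] + b) <= 0:
                alpha[i] += 1.0           # one more mistake on example i
                b += y[i] * R2
                mistakes = True
        if not mistakes:                  # converged: data separated
            break
    return alpha, b
```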

  15. Nonlinear SVM Motivation
Linear SVM (linear separating surface: $x^T w + b = 0$):
(QP) $\min_{w,b,\xi}\ \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$ s.t. $D(Aw + \mathbf{1}b) + \xi \ge \mathbf{1}$ (maximizing the margin)
By QP "duality", $w = A^T D u$. Maximizing the margin in the "dual space" gives:
$\min_{u,b,\xi}\ \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|u\|_2^2 + b^2)$ s.t. $D(AA^T D u + \mathbf{1}b) + \xi \ge \mathbf{1}$
• Dual SSVM with separator $x^T A^T D u + b = 0$:
$\min_{u,b}\ \frac{C}{2}\|p(\mathbf{1} - D(AA^T D u + \mathbf{1}b), \beta)\|_2^2 + \frac{1}{2}(\|u\|_2^2 + b^2)$

  16. Nonlinear Smooth SVM
• Replace $AA^T$ by a nonlinear kernel $K(A, A^T)$:
$\min_{u,b}\ \frac{C}{2}\|p(\mathbf{1} - D(K(A, A^T) D u + \mathbf{1}b), \beta)\|_2^2 + \frac{1}{2}(\|u\|_2^2 + b^2)$
• Use the Newton algorithm to solve the problem
• Each iteration solves m+1 linear equations in m+1 variables
• The kernel matrix $K(A, A^T)$ is fully dense
• Nonlinear classifier: $f(x) = \mathrm{sign}\big(K(x^T, A^T) D u + b\big)$; it depends on the entire dataset
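
A sketch of the kernel substitution, reusing the train_ssvm sketch shown after slide 11: with a Gaussian kernel, the data matrix is simply replaced by $K(A, A^T)D$, so the same Newton-Armijo code solves the m+1-variable problem. This assumes A, y, and train_ssvm from the earlier sketches are in scope.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2), fully dense."""
    d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

# Replace A by K(A, A') D in the smooth objective (D = diag(y)).
K = gaussian_kernel(A, A)
u, b = train_ssvm(K * y[None, :], y, C=1.0, beta=5.0)

def classify(x_new):
    """f(x) = sign(K(x, A') D u + b): needs the whole training set."""
    return np.sign(gaussian_kernel(np.atleast_2d(x_new), A) @ (y * u) + b)
```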

  17. Difficulties with Nonlinear SVM for Large Problems
• The nonlinear kernel $K(A, A^T) \in \mathbb{R}^{m \times m}$ is fully dense
• Need to generate and store $m^2$ kernel entries: long CPU time to compute the dense kernel matrix, and it runs out of memory while storing it
• Computational complexity of nonlinear SSVM depends on $m$, the number of data points
• Separating surface depends on almost the entire dataset; need to store the entire dataset even after solving the problem
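
The storage difficulty is easy to quantify: a dense float64 kernel matrix needs $8m^2$ bytes.

```python
# Back-of-the-envelope storage cost of the dense m x m kernel matrix.
for m in (1_000, 10_000, 100_000):
    print(f"m = {m:>7,}: {m * m * 8 / 1e9:8.2f} GB")
# m =   1,000:     0.01 GB
# m =  10,000:     0.80 GB
# m = 100,000:    80.00 GB  (exceeds typical RAM)
```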
