
PEGASOS Primal Estimated sub-GrAdient Solver for SVM


Presentation Transcript


  1. PEGASOS: Primal Estimated sub-GrAdient Solver for SVM. Ming TIAN, 04-20-2012

  2. Reference
  [1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML, 807-814. Journal version: Mathematical Programming, Series B, 127(1):3-30, 2011.
  [2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-Class Pegasos on a Budget. ICML.
  [3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
  [4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.

  3. Outline • Review of SVM optimization • The Pegasos algorithm • Multi-Class Pegasos on a Budget • Further works

  4. Outline • Review of SVM optimization • The Pegasos algorithm • Multi-Class Pegasos on a Budget • Further works

  5. Review of SVM optimization. Q1 (the primal problem):
  min_w (λ/2)||w||² + (1/m) Σ_{(x,y)∈S} ℓ(w; (x,y)),   where ℓ(w; (x,y)) = max{0, 1 − y⟨w, x⟩}.
  The first term is the regularization term; the second is the empirical (hinge) loss.
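
A minimal Python sketch (illustrative, not from the slides) of evaluating this objective; the function name svm_objective and the variable names are assumptions:

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Primal SVM objective: (lam/2) * ||w||^2 + average hinge loss.

    X: (m, n) instance matrix, y: (m,) labels in {-1, +1}, lam: regularization weight.
    """
    margins = y * (X @ w)                    # y_i * <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)   # per-example hinge loss
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```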

  6. Review of SVM optimization

  7. Review of SVM optimization
  • Dual-based methods
    • Interior Point methods: memory m², time m³ log(log(1/ε))
    • Decomposition methods: memory m, time super-linear in m
  • Online learning & stochastic gradient
    • Memory O(1), time 1/ε² (linear kernel)
    • Memory 1/ε², time 1/ε⁴ (non-linear kernel)
    • Typically, online learning algorithms do not converge to the optimal solution of the SVM
  • Better rates for finite-dimensional instances (Murata, Bottou)

  8. Outline • Review of SVM optimization • The Pegasos algorithm • Multi-Class Pegasos on a Budget • Further works

  9. PEGASOS. At each iteration: pick a subset A_t ⊆ S, take a subgradient step on the instantaneous objective, then project. Choosing A_t = S gives a subgradient method; |A_t| = 1 gives stochastic (sub)gradient descent.
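
A minimal Python sketch of the Pegasos loop as described above, assuming mini-batches A_t of size k, step size η_t = 1/(λt), and the optional projection onto the ball of radius 1/√λ; the function name pegasos and all variable names are illustrative, not from the paper:

```python
import numpy as np

def pegasos(X, y, lam, T, k=1, seed=0):
    """Pegasos for a linear SVM.  X: (m, n) instances, y: (m,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        idx = rng.choice(m, size=k, replace=False)    # draw the mini-batch A_t
        Xb, yb = X[idx], y[idx]
        eta = 1.0 / (lam * t)                         # step size eta_t = 1/(lambda*t)
        viol = yb * (Xb @ w) < 1.0                    # A_t^+ : margin violators
        grad = lam * w - (yb[viol] @ Xb[viol]) / k    # subgradient of f(w; A_t)
        w = w - eta * grad
        norm = np.linalg.norm(w)                      # optional projection step
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```

With k = 1 this is the stochastic variant; with k = m it reduces to a deterministic subgradient method, matching the annotations on the slide.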

  10. Run-Time of Pegasos
  • Choosing |At| = 1 and a linear kernel over R^n, the run-time required for Pegasos to find an ε-accurate solution with probability 1−δ is Õ(n/(λδε))
  • Run-time does not depend on #examples
  • Depends on "difficulty" of the problem (λ and ε)

  11. Formal Properties
  • Definition: w is ε-accurate if f(w) ≤ f(w*) + ε
  • Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1−δ after at most Õ(1/(δλε)) iterations
  • Theorem 2: Pegasos finds log(1/δ) solutions such that, w.p. ≥ 1−δ, at least one of them is ε-accurate after Õ(1/(λε)) iterations

  12. Proof Sketch. A second look at the update step: w_{t+1} ← w_t − η_t ∇_t, where ∇_t = λ w_t − (1/|A_t|) Σ_{(x,y)∈A_t⁺} y x is a subgradient of the instantaneous objective f(w; A_t) at w_t.

  13. Proof Sketch
  • Denote the instantaneous objective f(w; A_t)
  • Logarithmic regret bound for OCP (online convex programming) with strongly convex functions: (1/T) Σ_t f(w_t; A_t) ≤ (1/T) Σ_t f(w*; A_t) + O(log T / (λT))
  • Take expectation over the choice of the A_t: E[f(w_r) − f(w*)] ≤ O(log T / (λT)) for a randomly chosen iterate w_r
  • Since f(w_r) − f(w*) ≥ 0, Markov's inequality gives that w.p. ≥ 1−δ, f(w_r) − f(w*) ≤ O(log T / (δλT))
  • Amplify the confidence by running log(1/δ) independent copies

  14. Proof Sketch

  15. Proof Sketch. A function f is called λ-strongly convex if f(w) − (λ/2)||w||² is a convex function. (The SVM objective is λ-strongly convex: subtracting (λ/2)||w||² leaves the average hinge loss, which is convex.)

  16. Proof Sketch

  17. Proof Sketch

  18. Experiments
  • 3 datasets (provided by Joachims)
    • Reuters CCAT (800k examples, 47k features)
    • Physics ArXiv (62k examples, 100k features)
    • Covertype (581k examples, 54 features)
  • 4 competing algorithms
    • SVM-light (Joachims)
    • SVM-Perf (Joachims '06)
    • Norma (Kivinen, Smola, Williamson '02)
    • Zhang '04 (stochastic gradient descent)

  19. Training Time (in seconds)

  20. Compare to Norma (on Physics): objective value and test error plots.

  21. Compare to Zhang (on Physics): objective value plot. But tuning the parameter is more expensive than the learning itself…

  22. Effect of k = |At| when T is fixed: objective value plot.

  23. Effect of k = |At| when kT is fixed: objective value plot.

  24. The bias term (see the sketch after this list):
  • Popular approach: increase the dimension of x by a constant feature. Cons: we "pay" for b in the regularization term
  • Calculate subgradients w.r.t. w and w.r.t. b. Cons: convergence rate degrades to 1/ε²
  • Define the loss with the bias optimized within each mini-batch. Cons: |At| needs to be large
  • Search for b in an outer loop. Cons: each evaluation of the objective costs 1/ε²
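
A minimal sketch of the first option (folding b into w via a constant feature), reusing the pegasos sketch from the earlier slide; the toy data and all names are illustrative assumptions:

```python
import numpy as np

# Option 1: fold the bias into w by appending a constant feature to every x.
# Note the cost: the bias coordinate of w is now regularized as well.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))              # toy data, 200 examples
y = np.sign(X[:, 0] + 0.5)                     # labels in {-1, +1}
X_aug = np.hstack([X, np.ones((len(X), 1))])   # add the constant feature
w_aug = pegasos(X_aug, y, lam=1e-3, T=10_000)  # pegasos sketch from slide 9
w, b = w_aug[:-1], w_aug[-1]                   # recover weights and bias
```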

  25. Outline • Review of SVM optimization • The Pegasos algorithm • Multi-Class Pegasos on a Budget • Further works

  26. multi-class SVM (Crammer & Singer, 2001). The multi-class model keeps one weight vector per class, w = (w(1), …, w(c)), and predicts ŷ = argmax_i w(i)·x.

  27. multi-class SVM (Crammer & Singer, 2001). The multi-class SVM objective function:
  P(w) = (λ/2)||w||² + (1/m) Σ_{(x,y)∈S} ℓ(w; (x,y)),   where ||w||² = Σ_i ||w(i)||²,
  and the multi-class hinge loss is defined as:
  ℓ(w; (x,y)) = max{0, 1 + w(r)·x − w(y)·x},   where r = argmax_{i≠y} w(i)·x.
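
The multi-class hinge loss above can be written as a small helper; this is an illustrative sketch, not code from [2] or [3], and the name multiclass_hinge_loss is an assumption:

```python
import numpy as np

def multiclass_hinge_loss(W, x, y):
    """Crammer-Singer multi-class hinge loss for one example.

    W: (c, n) matrix with one weight vector per class, x: (n,), y: true class index.
    Returns the loss and the most violating competitor class r.
    """
    scores = W @ x                       # per-class scores w(i).x
    scores_other = scores.copy()
    scores_other[y] = -np.inf            # exclude the true class
    r = int(np.argmax(scores_other))     # most violating competitor
    loss = max(0.0, 1.0 + scores[r] - scores[y])
    return loss, r
```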

  28. multi-class Pegasos uses the instantaneous objective function f(w; (x_t, y_t)) = (λ/2)||w||² + ℓ(w; (x_t, y_t)), and works by iteratively executing a two-step update.
  Step 1: w_{t+1/2} = w_t − η_t ∇_t, where η_t = 1/(λt) and ∇_t is a subgradient of the instantaneous objective at w_t.

  29. multi-class Pegasos (continued). If the loss is equal to zero, then only the regularization part contributes: w(i)_{t+1/2} = (1 − η_t λ) w(i)_t for every class i. Else, in addition, the true-class vector moves toward x_t and the most violating class r_t moves away from it: w(y_t)_{t+1/2} gains η_t x_t and w(r_t)_{t+1/2} loses η_t x_t.
  Step 2: project the weight w_{t+1/2} onto the closed convex set {w : ||w|| ≤ 1/√λ}, i.e. w_{t+1} = min{1, (1/√λ)/||w_{t+1/2}||} · w_{t+1/2}. A code sketch of this two-step update follows below.
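
A minimal Python sketch of one multi-class Pegasos iteration under the formulas above; multiclass_hinge_loss is the illustrative helper defined after slide 27, and everything here is a reconstruction rather than code from the papers:

```python
import numpy as np

def multiclass_pegasos_step(W, x, y, lam, t):
    """One two-step multi-class Pegasos update on example (x, y) at iteration t."""
    eta = 1.0 / (lam * t)                  # step size 1/(lambda*t)
    loss, r = multiclass_hinge_loss(W, x, y)
    W = (1.0 - eta * lam) * W              # Step 1a: shrink all class vectors
    if loss > 0.0:                         # Step 1b: margin violated, move w(y), w(r)
        W[y] += eta * x
        W[r] -= eta * x
    norm = np.linalg.norm(W)               # Step 2: project onto ball of radius 1/sqrt(lam)
    if norm > 0:
        W *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return W
```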

  30. Budgeted Multi-Class Pegasos

  31. Budget Maintenance Strategies (see the removal sketch after this list)
  • Budget maintenance through removal: the optimal removal always selects the oldest SV
  • Budget maintenance through projection: project an SV onto all the remaining SVs, which results in smaller weight degradation
  • Budget maintenance through merging: merge two SVs into a newly created one; the total cost of finding the optimal merging for the n-th and m-th SV is O(1)
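
As an illustration of the removal strategy only (projection and merging need the kernel-specific derivations from [2]), here is a minimal hypothetical sketch of keeping a kernel expansion within a budget by discarding the oldest support vector; the names support_vectors and budget are assumptions, not from the paper:

```python
def enforce_budget_by_removal(support_vectors, budget):
    """Keep at most `budget` support vectors by removing the oldest ones.

    `support_vectors` is a list of (x, alpha) pairs in insertion order, where
    alpha holds the per-class coefficients of that SV; the oldest SV has been
    shrunk the most, so dropping it degrades the weight vector the least.
    """
    while len(support_vectors) > budget:
        support_vectors.pop(0)     # drop the oldest SV and its coefficients
    return support_vectors
```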

  32. Experiments

  33. Outline • Review of SVM optimization • The Pegasos algorithm • Multi-Class Pegasos on a Budget • Further works

  34. Further works • Distribution-aware Pegasos? • Online structural regularized SVM?

  35. Thanks! Q&A
