
Towards Scalable Support Vector Machines Using Squashing


Presentation Transcript


  1. Towards Scalable Support Vector Machines Using Squashing • Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth • Information and Computer Science, University of California • Advisor: Dr. Hsu • Reporter: Hung Ching-Wen

  2. Outline • 1. Motivation • 2. Objective • 3. Introduction • 4. SVM • 5. Squashing for SVM • 6. Experiments • 7. Conclusion

  3. Motivation • SVMs provide a classification model with a strong theoretical foundation and excellent empirical performance. • Their major drawback is the need to solve a large-scale quadratic programming problem during training.

  4. Objective • This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.

  5. Introduction • The applicability of SVMs to large datasets is limited because of the high computational cost of training. • Speed-up training algorithms: chunking, Osuna's decomposition method, SMO. • These algorithms accelerate training but do not scale well with the size of the training data.

  6. Introduction • Approaches to reducing the computational cost: • Sampling • Boosting • Squashing (DuMouchel et al., Madigan et al.) • The authors propose squash-SMO to address the high computational cost of SVM training.

  7. SVM • Training data: D = {(xi, yi) : i = 1, …, N} • xi is a feature vector, yi ∈ {+1, -1} • In a linear SVM, the separating classifier is y = <w, x> + b • w is the normal vector of the hyperplane • b is the intercept of the hyperplane
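For illustration only (not part of the slides), the linear decision rule can be written in a couple of lines of NumPy, assuming a weight vector w and intercept b have already been learned:

```python
import numpy as np

def linear_svm_predict(X, w, b):
    """Predict labels in {-1, +1} via the linear SVM rule y = sign(<w, x> + b)."""
    return np.sign(X @ w + b)
```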

  8. SVM (non-separable)
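The equations on this slide are images and did not survive extraction. For reference, the standard soft-margin (non-separable) SVM primal, which is presumably what the slide shows, is

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad y_i\big(\langle w, x_i\rangle + b\big) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,N,
\]

with slack variables \xi_i and cost parameter C.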

  9. SVM (a prior on w)
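This slide's content is also missing from the transcript. One common probabilistic reading, consistent with slide 12 below, is that SVM training is MAP estimation under a prior P(w) ∝ exp(-‖w‖²):

\[
(\hat{w}, \hat{b}) = \arg\max_{w,\,b}\ P(w)\prod_{i=1}^{N} P(y_i \mid x_i, w, b),
\]

so that minimizing the negative log-posterior recovers the regularized SVM objective when -\log P(y_i \mid x_i, w, b) is a hinge-style loss.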

  10. Squashing for SVM • (1) Select a probabilistic model P((X, Y) | θ) • (2) The objective is to find the maximum-likelihood estimate θML
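Written out (the equation itself is not in the transcript, but this is the standard maximum-likelihood objective the slide refers to):

\[
\theta_{ML} = \arg\max_{\theta}\ \sum_{i=1}^{N} \log P(x_i, y_i \mid \theta).
\]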

  11. Squashing for SVM • (3) The training data D = {(xi, yi) : i = 1, …, N} can be grouped into Nc clusters • (xc, yc)sq: the squashed data point placed at cluster c • βc: the weight of cluster c
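The squashed representation is meant to approximate the full-data log-likelihood by a weighted sum over the Nc clusters (spelled out here for clarity; the exact equation is not in the transcript):

\[
\sum_{i=1}^{N} \log P(x_i, y_i \mid \theta)\ \approx\ \sum_{c=1}^{N_c} \beta_c\, \log P\big(x_c^{sq}, y_c^{sq} \mid \theta\big).
\]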

  12. Squashing for SVM • The prior on w is taken to be P(w) ∝ exp(-‖w‖²)

  13. Squashing for SVM • (4) The optimization model for the squashed data:
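The model itself is an image in the original slides. A plausible reconstruction, assuming the weighted-likelihood approximation above simply reweights each slack term by βc, is the weighted soft-margin problem

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{c=1}^{N_c}\beta_c\,\xi_c
\quad\text{s.t.}\quad y_c^{sq}\big(\langle w, x_c^{sq}\rangle + b\big) \ge 1 - \xi_c,\quad \xi_c \ge 0,
\]

i.e. the standard SVM QP applied to the Nc squashed points with per-point weights βc.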

  14. Squashing for SVM • Important design issues for the squashing algorithm: • (1) the choice of the number and location of the squashing points • (2) how to sample values of w from the prior p(w) • (3) b can be obtained from the optimization model • (4) with w and b fixed, evaluate the likelihood of each training point, and repeat the selection procedure L times (L is the length)
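As a rough illustration of the overall pipeline (not the authors' exact algorithm), the sketch below squashes each class with k-means, uses cluster sizes as the weights, and trains a weighted linear SVM on the squashed set. The number of clusters, the size-based weights, and the scikit-learn estimators are assumptions made for the sake of a runnable example; the paper selects weights via likelihood profiles rather than raw counts.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def squash_and_train(X, y, clusters_per_class=50, C=1.0):
    """Squash (X, y) into weighted cluster centers, then fit a weighted linear SVM.

    Simplified sketch: cluster centers stand in for the squashed points and
    cluster sizes stand in for the weights beta_c, whereas the paper chooses
    the weights via likelihood profiles.
    """
    Xsq, ysq, beta = [], [], []
    for label in np.unique(y):
        Xc = X[y == label]
        k = min(clusters_per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10).fit(Xc)
        Xsq.append(km.cluster_centers_)
        ysq.append(np.full(k, label))
        beta.append(np.bincount(km.labels_, minlength=k))
    Xsq = np.vstack(Xsq)
    ysq = np.concatenate(ysq)
    beta = np.concatenate(beta).astype(float)

    # Weighted SVM on the Nc squashed points; sample_weight plays the role of beta_c.
    svm = SVC(kernel="linear", C=C)
    svm.fit(Xsq, ysq, sample_weight=beta)
    return svm
```

Usage would simply be model = squash_and_train(X_train, y_train); the QP is then solved over the squashed points only, which is the source of the speed-up.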

  15. Experiments • Experimental datasets: • Synthetic data • UCI machine learning repository • UCI KDD repository

  16. Experiments • Methods evaluated: full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO • Results averaged over 100 runs • Performance measures: misclassification rate, learning time, memory requirements

  17. Experiments (Results on Synthetic Data) • (wf, bf): hyperplane estimated by full-SMO • (ws, bs): hyperplane estimated from the squashed or sampled data

  18. Experiments (Results on Synthetic Data)

  19. Experiments (Results on Synthetic Data)

  20. Experiments (Results on Benchmark Data)

  21. Experiments (Results on Benchmark Data)

  22. Experiments (Results on Benchmark Data)

  23. Experiments (Results on Benchmark Data)

  24. Conclusion • 1. The paper describes how squashing makes SVM training applicable to large datasets. • 2. Comparison with full-SMO shows that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements. • 3. srs-SMO has a higher misclassification rate. • 4. squash-SMO and boost-SMO allow parameters to be tuned by cross-validation, which is infeasible for full-SMO.

  25. Conclusion • 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems. • 6. However, squash-SMO offers better interpretability of the model and can be expected to run faster on datasets that do not reside in memory.

  26. Opinion • It is a good idea that the authors describe how squashing makes SVM training applicable to large datasets. • The prior distribution of w could be varied according to the nature of the data, for example an exponential or log-normal prior, or a nonparametric approach; see the note below.
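As one concrete illustration of that suggestion (my addition, not in the slides): replacing the Gaussian-style prior with an exponential (Laplace) prior changes the regularizer from ‖w‖² to an L1 penalty,

\[
P(w) \propto \exp\big(-\lambda \|w\|_1\big)
\quad\Longrightarrow\quad
-\log P(w) = \lambda \|w\|_1 + \text{const},
\]

so MAP estimation would yield an L1-regularized (sparse) linear SVM.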
