## Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion


**Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion**
- Presenter: Brian Quanz

**About today's discussion…**
- Last time: discussed convex optimization.
- Today: apply what we learned to four pattern analysis problems given in the book:
  - (1) Smallest enclosing hypersphere (one-class SVM)
  - (2) SVM classification
  - (3) Support vector regression (SVR)
  - (4) On-line classification and regression

**About today's discussion… (continued)**
- This time, for the most part, we will:
  - Describe the problems
  - Derive the solutions ourselves on the board!
  - Apply our convex optimization knowledge to solve them
- Mostly board work today

**Recall: KKT Conditions**
- What we will use
- Key points to remember for Chapter 7:
  - Complementary slackness → sparse dual representation
  - Convexity → efficient global solution

**Novelty Detection: Hypersphere**
- From training data, learn the support of the distribution
- Capture it with a hypersphere
- Points outside are 'novel', 'abnormal', or 'anomalous'
- Smaller sphere = more fine-tuned novelty detection

**1st: Smallest Enclosing Hypersphere**
- Given a training set S
- Find the center c of the smallest hypersphere containing S

**S.E.H. Optimization Problem**
- Optimization problem:
- Let's solve it using the Lagrangian and the KKT conditions, and discuss

**S.E.H.: Solution**
- H(x) = 1 if x ≥ 0, and 0 otherwise
- Dual = primal at the optimum

**Soft Hypersphere: a hypersphere that only contains some of the data**
- Balance missing some points against reducing the radius
- Robustness: a single point could throw off the sphere
- Introduce slack variables (a repeated approach): 0 within the sphere, squared distance outside

**Hypersphere Optimization Problem**
- Now with a trade-off between radius and training-point error:
- Let's derive the solution again

**Remarks**
- If the data lies in a subspace of the feature space:
  - The hypersphere overestimates the support in the perpendicular directions
  - Kernel PCA can help (next week's discussion)
- If the data is normalized (k(x, x) = 1):
  - The problem corresponds to separating the data from the origin with a hyperplane

**Maximal Margin Classifier**
- Data and a linear classifier
- Hinge loss, margin γ
- Linearly separable if a hyperplane with positive margin exists

**Typical Formulation**
- The typical formulation fixes the functional margin γ to 1 and lets w vary, since scaling does not affect the decision; the geometric margin is then proportional to 1/||w||.
- Here we instead fix the norm of w and vary the functional margin γ.

**Hard Margin SVM**
- Arrive at the optimization problem
- Let's solve it

**Solution**
- Recall:

**Soft Margin Classifier**
- Non-separable case: introduce slack variables as before
- Trade off against the 1-norm of the error vector

**Solve the Soft Margin SVM**
- Let's solve it!

**Support Vector Regression**
- Similar idea to classification, except turned inside-out
- ε-insensitive loss instead of hinge loss
- Ridge regression uses the squared-error loss

**Support Vector Regression (continued)**
- But we want to encourage sparseness
- Need inequalities
- ε-insensitive loss

**Epsilon-Insensitive Loss**
- Defines a band around the function within which the loss is zero
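To make the contrast between the hinge loss and the ε-insensitive band concrete, here is a minimal numpy sketch (not from the slides; the function names and the ε value are illustrative):

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss for classification: zero once y*f(x) reaches the functional margin 1."""
    return np.maximum(0.0, 1.0 - y * f)

def eps_insensitive_loss(y, f, eps=0.1):
    """Epsilon-insensitive loss for regression: zero inside a band of half-width eps."""
    return np.maximum(0.0, np.abs(y - f) - eps)

# Predictions inside the band / beyond the margin incur no loss,
# which is what produces the sparse dual representation.
y_reg, f_reg = 1.0, np.array([0.95, 1.05, 1.5])
print(eps_insensitive_loss(y_reg, f_reg, eps=0.1))   # [0.  0.  0.4]

y_clf, f_clf = 1.0, np.array([1.2, 0.7, -0.3])
print(hinge_loss(y_clf, f_clf))                      # [0.  0.3 1.3]
```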
**SVR (linear ε-insensitive loss)**
- Optimization problem:
- Let's solve it again

**SVR Dual and Solution**
- Dual problem:

**On-line Learning**
- So far, batch learning: all the data is processed at once
- Many tasks require the data to be processed one example at a time from the start
- The learner:
  - Makes a prediction
  - Gets feedback (the correct value)
  - Updates its hypothesis
- A conservative learner only updates when the loss is non-zero

**Simple On-line Algorithm: Perceptron**
- Thresholded linear function
- At step t+1 the weight vector is updated if an error is made
- Dual update rule: if an error is made on example i, increment α_i (see the numpy sketch after the slides)

**Novikoff Theorem**
- Convergence bound for the hard-margin case
- Assume the training points are contained in a ball of radius R around the origin
- w* is the hard-margin SVM solution with no bias and geometric margin γ
- Initial weight: w₀ = 0
- The number of updates is bounded by (R/γ)²

**Proof**
- From two inequalities: ⟨w_t, w*⟩ grows at least linearly in the number of updates, while ||w_t||² grows at most linearly
- Putting these together leads to the bound

**Kernel Adatron**
- A simple modification to the perceptron; models the hard-margin SVM with zero threshold
- α stops changing when, for each point, either α_i is positive and the right-hand term is zero, or the right-hand term is negative

**Kernel Adatron – Soft Margin**
- 1-norm soft margin version: add an upper bound C on the values of α
- 2-norm soft margin version: add a constant to the diagonal of the kernel matrix
- To allow a variable threshold, updates must be made on a pair of examples at once; this results in SMO
- The rate of convergence of both algorithms is sensitive to the order in which points are updated
- Good heuristics exist, e.g. choose the points that most violate the conditions first

**On-line Regression**
- The same approach also works for the regression case
- Basic gradient ascent with additional constraints

**Questions**
- Questions, comments?
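As a concrete illustration of the dual (kernel) perceptron update referenced in the on-line learning slides, here is a small self-contained numpy sketch; the RBF kernel choice, variable names, and toy data are my own assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_perceptron(X, y, n_epochs=10, gamma=1.0):
    """Dual perceptron: alpha_i counts the mistakes made on example i.
    Prediction: sign(sum_j alpha_j * y_j * K(x_j, x))."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(n)
    for _ in range(n_epochs):
        mistakes = 0
        for i in range(n):
            # Conservative update: change alpha only if example i is misclassified.
            if y[i] * np.dot(alpha * y, K[:, i]) <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:   # no errors in a full pass: the algorithm has converged
            break
    return alpha

# Toy usage: XOR-like data, separable in the RBF feature space but not linearly.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])
alpha = kernel_perceptron(X, y)
scores = (alpha * y) @ rbf_kernel(X, X)
print(np.sign(scores))  # should match y
```

The nonzero entries of `alpha` play the role of the sparse dual representation: only the points on which mistakes were made contribute to the decision function.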