
Crash Course on Machine Learning Part IV


Presentation Transcript


  1. Crash Course on Machine Learning Part IV. Several slides from Derek Hoiem and Ben Taskar

  2. What you need to know • Dual SVM formulation • How it’s derived • The kernel trick • Derive polynomial kernel • Common kernels • Kernelized logistic regression • SVMs vs kernel regression • SVMs vs logistic regression

  3. Example: Dalal-Triggs pedestrian detector • Extract a fixed-size (64x128 pixel) window at each position and scale • Compute HOG (histogram of oriented gradients) features within each window • Score the window with a linear SVM classifier • Perform non-maximum suppression to remove overlapping detections with lower scores. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
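The pipeline above is easy to sketch in code. Below is a minimal Python sketch, not the authors' implementation: compute_hog, the scale set, the score threshold, and the greedy NMS are all assumptions for illustration.

import numpy as np
from scipy.ndimage import zoom  # image rescaling; any resize routine would do

def iou(a, b):
    """Intersection-over-union of two (y, x, h, w) boxes."""
    ay, ax, ah, aw = a
    by, bx, bh, bw = b
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter = ih * iw
    return inter / (ah * aw + bh * bw - inter)

def non_max_suppression(dets, iou_thresh=0.5):
    """Keep the highest-scoring boxes; drop overlapping lower-scoring ones."""
    kept = []
    for d in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(d[1:], k[1:]) < iou_thresh for k in kept):
            kept.append(d)
    return kept

def detect(image, w, b, compute_hog, scales=(1.0, 0.8, 0.64),
           win_h=128, win_w=64, stride=8, thresh=0.0):
    """Score a fixed 64x128 window at every position and scale
    with a linear SVM: score = w . hog(window) + b."""
    detections = []
    for s in scales:
        img = zoom(image, s)                      # rescale the image, window stays fixed
        H, W = img.shape[:2]
        for y in range(0, H - win_h + 1, stride):
            for x in range(0, W - win_w + 1, stride):
                feat = compute_hog(img[y:y + win_h, x:x + win_w])  # 3780-dim HOG
                score = float(np.dot(w, feat) + b)
                if score > thresh:                # map the box back to original scale
                    detections.append((score, y / s, x / s, win_h / s, win_w / s))
    return non_max_suppression(detections)

Rescaling the image while keeping the window fixed is what lets a single 64x128 template match pedestrians of different sizes.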

  4. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum

  5. Color spaces tested • RGB • LAB • Grayscale (RGB and LAB perform slightly better than grayscale)

  6. The simple centered [-1, 0, 1] gradient filter outperforms diagonal, uncentered, cubic-corrected, and Sobel filters. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum

  7. Histogram of gradient orientations • Orientation: 9 bins (for unsigned angles) • Histograms in 8x8 pixel cells • Votes weighted by gradient magnitude • Bilinear interpolation between cells. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum
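As a concrete illustration, here is a minimal Python sketch of those cell histograms (assuming NumPy). It simplifies the slide's bilinear interpolation to nearest-bin voting and uses a centered-difference gradient in place of the exact [-1, 0, 1] filter.

import numpy as np

def hog_cell_histograms(gray, cell=8, nbins=9):
    """Magnitude-weighted orientation histograms in 8x8-pixel cells,
    with 9 unsigned-orientation bins over 0..180 degrees."""
    gy, gx = np.gradient(gray.astype(float))        # centered differences
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    H, W = gray.shape
    hy, wx = H // cell, W // cell
    hist = np.zeros((hy, wx, nbins))
    bins = np.minimum((ang / (180.0 / nbins)).astype(int), nbins - 1)
    for i in range(hy):
        for j in range(wx):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            np.add.at(hist[i, j], b, m)             # magnitude-weighted votes
    return hist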

  8. Normalize with respect to surrounding cells. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum

  9. # features = 15 x 7 x 9 x 4 = 3780: (# cells = 15 x 7) x (# orientations = 9) x (# normalizations by neighboring cells = 4). Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum
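A sketch of that normalization and the resulting count, assuming the hist array from the previous sketch (one 9-bin histogram per 8x8 cell of the 64x128 window, i.e., shape (16, 8, 9)): each 2x2 block of cells is L2-normalized, so every interior cell participates in 4 normalizations.

import numpy as np

def normalize_blocks(hist, eps=1e-6):
    """Concatenate L2-normalized 2x2 blocks of cell histograms."""
    hy, wx, nbins = hist.shape          # 16 x 8 cells, 9 bins each
    blocks = []
    for i in range(hy - 1):             # 15 block rows
        for j in range(wx - 1):         # 7 block columns
            block = hist[i:i + 2, j:j + 2].ravel()            # 2x2 cells x 9 bins = 36
            blocks.append(block / (np.linalg.norm(block) + eps))
    return np.concatenate(blocks)       # 15 x 7 blocks x 36 = 3780 features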

  10. [Figure: visualization of the positive (pos w) and negative (neg w) SVM weights.] Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum

  11. [Figure: pedestrian.] Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005. Slides by Pete Barnum

  12. Detection examples

  13. Viola-Jones sliding window detector Fast detection through two mechanisms • Quickly eliminate unlikely windows • Use features that are fast to compute Viola and Jones. Rapid Object Detection using a Boosted Cascade of Simple Features (2001).

  14. Cascade for Fast Detection • Choose thresholds for a low false negative rate • Fast classifiers early in the cascade • Slow classifiers later, but most examples don't get there. [Diagram: Examples → Stage 1: H1(x) > t1? → Stage 2: H2(x) > t2? → … → Stage N: HN(x) > tN? → Pass; a "No" at any stage → Reject]
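In code, the cascade is just a sequence of early exits. A minimal sketch, where the stage classifiers H and thresholds t are stand-ins for trained boosted stages:

def cascade_classify(x, stages, thresholds):
    """Evaluate a detection cascade: each stage Hi(x) must exceed its
    threshold ti, chosen for a low false-negative rate."""
    for H, t in zip(stages, thresholds):
        if H(x) <= t:
            return False        # reject as soon as any stage says no
    return True                 # passed all stages

Because the overwhelming majority of windows are rejected by the cheap early stages, the slow later stages contribute little to the average cost.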

  15. Features that are fast to compute • "Haar-like features" • Differences of sums of intensity (+1 and -1 rectangle weights) • Thousands, computed at various positions and scales within the detection window • Two-rectangle features, three-rectangle features, etc.
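These features are fast because every rectangle sum costs O(1) once an integral image (summed-area table) is precomputed. A minimal NumPy sketch:

import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(0).cumsum(1)

def rect_sum(ii, y, x, h, w):
    """Sum of any h x w rectangle at (y, x) in O(1) from the integral image."""
    A = ii[y - 1, x - 1] if y > 0 and x > 0 else 0
    B = ii[y - 1, x + w - 1] if y > 0 else 0
    C = ii[y + h - 1, x - 1] if x > 0 else 0
    D = ii[y + h - 1, x + w - 1]
    return D - B - C + A

def two_rect_feature(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: difference of intensity sums
    between a right (+1) and left (-1) half."""
    half = w // 2
    return rect_sum(ii, y, x + half, h, half) - rect_sum(ii, y, x, h, half)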

  16. Feature selection with AdaBoost • Create a large pool of features (~180K) • Select features that are discriminative and work well together • "Weak learner" = feature + threshold • Each round, choose the weak learner that minimizes error on the weighted training set • Reweight the examples
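A toy version of that selection loop, assuming a precomputed matrix F of feature responses. Real Viola-Jones searches all thresholds per feature efficiently; this sketch tries only a few percentiles.

import numpy as np

def adaboost_select(F, y, rounds=10):
    """Toy AdaBoost feature selection. F: (n_samples, n_features) Haar
    responses; y: labels in {-1, +1}. Each weak learner is one feature
    plus a threshold and polarity."""
    n, d = F.shape
    w = np.full(n, 1.0 / n)
    chosen = []
    for _ in range(rounds):
        best = None
        for j in range(d):                                   # try every feature...
            for t in np.percentile(F[:, j], [25, 50, 75]):   # ...and a few thresholds
                for s in (+1, -1):                           # polarity
                    pred = s * np.sign(F[:, j] - t)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = s * np.sign(F[:, j] - t)
        w *= np.exp(-alpha * y * pred)                       # upweight mistakes
        w /= w.sum()
        chosen.append((j, t, s, alpha))
    return chosen    # strong classifier: sign(sum of alpha_i * h_i(x))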

  17. Top 2 selected features

  18. Viola-Jones Results • Speed: 15 FPS (in 2001) • Evaluated on the MIT+CMU face dataset

  19. What about pose estimation?

  20. What about interactions?

  21. 3D modeling

  22. Object context From Divvala et al. CVPR 2009

  23. Integration • Feature level • Margin Based • Max margin Structure Learning • Probabilistic • Graphical Models

  24. Integration • Feature level • Margin Based • Max margin Structure Learning • Probabilistic • Graphical Models

  25. Feature Passing • Compute features from one estimated scene property to help estimate another. [Diagram: Image → X Features → X Estimate → Y Features → Y Estimate]

  26. Feature passing: example • Use features computed from "geometric context" confidence images to improve object detection • Features: average confidence above, within, and below the object window. Hoiem et al. ICCV 2005
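A minimal sketch of those features; the confidence-map layout and the (y, x, h, w) box format are assumptions for illustration (cf. Hoiem et al. ICCV 2005):

import numpy as np

def feature_passing_features(confidence_map, box):
    """Average 'geometric context' confidence above, inside, and below
    an object window; these averages get appended to the window's features."""
    y, x, h, w = box
    above = confidence_map[max(0, y - h):y, x:x + w]
    inside = confidence_map[y:y + h, x:x + w]
    below = confidence_map[y + h:y + 2 * h, x:x + w]
    return np.array([r.mean() if r.size else 0.0 for r in (above, inside, below)])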

  27. Scene Understanding [Figure: a scene labeled with a binary indicator vector (1 1 0 0 0 0 0 0 0 0) over candidate phrases.] Recognition using Visual Phrases, CVPR 2011

  28. Feature Design • Spatial relations: above, beside, below. Recognition using Visual Phrases, CVPR 2011

  29. Feature Passing • Pros: simple training and inference; very flexible in modeling interactions • Cons: not modular; if we get a new method for the first estimates, we may need to retrain

  30. Integration • Feature Passing • Margin Based • Max margin Structure Learning • Probabilistic • Graphical Models

  31. Structured Prediction • Prediction of complex outputs • Structured outputs: multivariate, correlated, constrained • Novel, general way to solve many learning problems

  32. Structure [Figure: a scene labeled with a binary indicator vector (1 1 0 0 0 0 0 0 0 0) over candidate visual phrases.] Recognition using Visual Phrases, CVPR 2011

  33. Handwriting Recognition • x: image of a handwritten word; y: the letter sequence "brace" • Sequential structure

  34. Object Segmentation • x: image; y: segmentation • Spatial structure

  35. Scene Parsing Recursive structure

  36. Bipartite Matching (word alignment) • x: a sentence pair: "What is the anticipated cost of collecting fees under the new proposal?" / "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?" • y: the word-to-word alignment • Combinatorial structure
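Prediction in this model is a maximum-weight bipartite matching, which off-the-shelf solvers handle. For example, with SciPy's Hungarian-algorithm implementation (the scores below are toy values standing in for learned edge scores w · f(x, i, j)):

import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

score = np.array([[2.0, 0.1, 0.0],    # score[i, j]: align source word i
                  [0.2, 1.5, 0.3],    # to target word j
                  [0.0, 0.4, 1.8]])
rows, cols = linear_sum_assignment(-score)  # negate: the solver minimizes cost
print(list(zip(rows, cols)))                # aligned pairs: (0, 0), (1, 1), (2, 2)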

  37. Local Prediction • Classify each position using local information only (e.g., predict each letter of "b r a c e" independently) • Ignores correlations & constraints!

  38. Local Prediction [Figure: independently predicted region labels: building, tree, shrub, ground.]

  39. Structured Prediction • Use local information • Exploit correlations (e.g., jointly predict the letter sequence "brace")

  40. Structured Prediction [Figure: jointly predicted region labels: building, tree, shrub, ground.]

  41. Structured Models • Scoring function: s(x, y) = w · f(x, y) • Mild assumptions: the score is a linear combination of features, and f(x, y) decomposes into a sum of part scores • Prediction: y* = argmax of s(x, y) over the space of feasible outputs Y(x)
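For a label sequence, for instance, the score decomposes into per-position (node) and transition (edge) part scores. A minimal sketch, where the arrays stand in for the dot products w · f on each part:

def sequence_score(node_scores, trans, y):
    """Score of label sequence y under a chain-structured model.
    node_scores[t][k]: score of label k at position t (= w . node features);
    trans[k][l]: score of label l following label k (= w . edge features)."""
    s = sum(node_scores[t][y[t]] for t in range(len(y)))
    s += sum(trans[y[t]][y[t + 1]] for t in range(len(y) - 1))
    return s

The argmax over all exponentially many feasible sequences is then computed by dynamic programming (Viterbi) rather than enumeration.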

  42. Supervised Structured Prediction • Data: {(x_i, y_i)} • Model: scoring function w · f(x, y) • Prediction: y* = argmax over y ∈ Y(x) of w · f(x, y); example: weighted matching; generally: combinatorial optimization • Learning: estimate w, either locally (ignores structure), by margin, or by likelihood (can be intractable)

  43. Local Estimation • Model: treat edges as independent decisions • Estimate w locally, use it globally • E.g., naïve Bayes, SVM, logistic regression • Cf. [Matusov et al., 03] for matchings • Pros: simple and cheap • Cons: not well-calibrated for the matching model; ignores correlations & constraints

  44. Conditional Likelihood Estimation • Model: P(y | x) ∝ exp(w · f(x, y)) • Estimate w jointly by maximizing conditional likelihood on the data • The normalizing denominator is #P-complete [Valiant 79, Jerrum & Sinclair 93] • Tractable model, intractable learning • Need a tractable learning method → margin-based estimation

  45. Structured large margin estimation • We want: argmax over y of w · f(x, y) = "brace" • Equivalently: w · f(x, "brace") > w · f(x, "aaaaa"), w · f(x, "brace") > w · f(x, "aaaab"), …, w · f(x, "brace") > w · f(x, "zzzzz"): a lot of constraints!

  46. Structured Loss (Hamming distance to the true word "brace"): ℓ("brace", "bcare") = 2 • ℓ("brace", "brore") = 2 • ℓ("brace", "broce") = 1 • ℓ("brace", "brace") = 0
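The slide's numbers are just per-position disagreement counts:

def hamming_loss(y_true, y_pred):
    """Number of positions where the prediction disagrees with the truth."""
    return sum(a != b for a, b in zip(y_true, y_pred))

for guess in ["bcare", "brore", "broce", "brace"]:
    print(guess, hamming_loss("brace", guess))   # prints 2, 2, 1, 0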

  47. Large margin estimation • Given training examples {(x_i, y_i)}, we want: w · f(x_i, y_i) ≥ w · f(x_i, y) + γ for all y ≠ y_i • Maximize the margin γ • Mistake-weighted margin: w · f(x_i, y_i) ≥ w · f(x_i, y) + γ ℓ(y_i, y), where ℓ(y_i, y) = # of mistakes in y *Collins 02, Altun et al 03, Taskar 03

  48. Large margin estimation • Eliminate γ by fixing the scale of w: min ½||w||² s.t. w · f(x_i, y_i) ≥ w · f(x_i, y) + ℓ(y_i, y) for all i, y • Add slacks ξ_i for the inseparable case (hinge loss): min ½||w||² + C Σ_i ξ_i s.t. w · f(x_i, y_i) ≥ w · f(x_i, y) + ℓ(y_i, y) − ξ_i
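Putting the pieces together, the per-example objective is the structured hinge loss. A brute-force sketch; real implementations replace the enumeration with loss-augmented combinatorial inference:

import numpy as np

def structured_hinge(w, feats, y_true, feasible_outputs, loss):
    """Structured hinge for one example: how far the loss-augmented best
    competitor scores above the true output. feats(y) returns f(x, y)."""
    y_hat = max(feasible_outputs,
                key=lambda y: float(np.dot(w, feats(y))) + loss(y_true, y))
    slack = (float(np.dot(w, feats(y_hat))) + loss(y_true, y_hat)
             - float(np.dot(w, feats(y_true))))
    return max(0.0, slack), y_hat

A subgradient step on this loss updates w by η (f(x, y_i) − f(x, ŷ)); dropping the loss term from the argmax recovers the structured perceptron.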

  49. Large margin estimation • Brute force enumeration • Min-max formulation • ‘Plug-in’ linear program for inference

  50. Min-max formulation • Structured loss (Hamming): ℓ(y_i, y) = Σ_j 1[y_i,j ≠ y_j] • Fold the exponentially many constraints into one: w · f(x_i, y_i) ≥ max over y of [w · f(x_i, y) + ℓ(y_i, y)] − ξ_i • Key step: write that inference max as a linear program and "plug it in", turning a discrete optimization into a continuous one
