
Training Discriminative Computer Vision Models with Weak Supervision



  1. Training Discriminative Computer Vision Models with Weak Supervision Boris Babenko PhD Defense University of California, San Diego

  2. Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work

  3. Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work

  4. Computer Vision Problems • Want to detect, recognize/classify, track objects in images and videos • Examples: • Face detection for point-and-shoot cameras • Pedestrian detection for cars • Animal tracking for behavioral science • Landmark/place recognition for search-by-image

  5. Old School • Hand tuned models per application • Example: face detection [Yang et al. ‘94]

  6. New School • Adopt methods from machine learning • Train a generic* system by providing labeled examples (supervised learning) • Labeling examples is intuitive • Adapts to new domains/applications • Learns subtle cues that would be impossible to model by hand * Hand tuning/design still often required :-/

  7. Supervised Learning • Training data: pairs of inputs and labels • Train classifier to predict label for novel input • [Figure: labeled training examples (face / non-face) at training time; classifier assigns labels at run time]

  8. Supervised Learning • Training data: pairs $\{(x_i, y_i)\}_{i=1}^{n}$ • Most common case: inputs/instances $x_i \in \mathbb{R}^d$, labels $y_i \in \{-1, +1\}$ • Want to train a classifier: $h : \mathbb{R}^d \rightarrow \{-1, +1\}$ • Typically a classifier also outputs a confidence score, in addition to a label

  9. Discriminative vs Generative • Generative: model the distribution of the data • Discriminative: directly minimize classification error, model the boundary • E.g. SVM, AdaBoost, Perceptron • Tends to outperform generative models

  10. Training Discriminative Model • Objective (minimize training error): $\min_f \sum_i \ell(y_i, f(x_i)) + \lambda R(f)$ • Loss function $\ell$ is typically a convex upper bound on the 0/1 loss • Regularization term $R(f)$ can help avoid over-fitting
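The objective above can be made concrete with a small sketch (an illustration, not the talk's actual implementation): logistic loss as the convex upper bound on 0/1 loss, L2 regularization, and plain gradient descent. The names `logistic_loss_grad` and `train` are hypothetical.

```python
import numpy as np

def logistic_loss_grad(w, X, y, lam):
    """Logistic loss (a convex upper bound on 0/1 loss) plus L2 regularization.

    X: (n, d) inputs, y: (n,) labels in {-1, +1}, lam: regularization weight.
    Returns (loss, gradient) for the linear classifier f(x) = w . x.
    """
    margins = y * (X @ w)  # y_i * f(x_i)
    loss = np.log1p(np.exp(-margins)).mean() + lam * (w @ w)
    # d/dw of mean log(1 + exp(-y f(x))) is the mean of -y x / (1 + exp(margin))
    grad = (X * (-y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0) + 2 * lam * w
    return loss, grad

def train(X, y, lam=1e-3, lr=0.5, steps=200):
    """Minimize the regularized surrogate loss by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, g = logistic_loss_grad(w, X, y, lam)
        w -= lr * g
    return w
```

On linearly separable toy data, the learned weight vector classifies all training points correctly; the margin `X @ w` doubles as the confidence score mentioned on the previous slide.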

  11. Weak Supervision • Slightly overloaded term… • Any form of learning where the training data is missing some labels (i.e. latent variables)

  12. Object Detection w/ Weak Supervision • Goal: train object detector • Strong supervision: object location is labeled in each image • Weak supervision: only presence of object is known, not location

  13. Object Detection w/ Weak Supervision • Goal: train object detector • Strong supervision: object location is labeled in each image • Weak supervision: only presence of object is known, not location (location is a latent variable)

  14. Weak Supervision: Advantages • Reduce labor cost • Deal with inherent ambiguity & human error • Automatically discover latent information

  15. Training w/ Latent Variables • Classifier now takes in input AND latent input: $f(x, z)$ • To predict label: $\hat{y} = \operatorname{sign}\big(\max_z f(x, z)\big)$ • Objective: $\min_f \sum_i \ell\big(y_i, \max_z f(x_i, z)\big) + \lambda R(f)$

  16. Training w/ Latent Variables • Classifier now takes in input AND latent input: $f(x, z)$ • To predict label: $\hat{y} = \operatorname{sign}\big(\max_z f(x, z)\big)$ • Objective: $\min_f \sum_i \ell\big(y_i, \max_z f(x_i, z)\big) + \lambda R(f)$ • Not convex!

  17. Training w/ Latent Variables • Two ways of solving: • Method 1: Alternate between inferring latent variables and training the classifier • Inferring latent variables given a fixed classifier may require domain knowledge • E.g. EM (Dempster et al.), Latent Structural SVM (Yu & Joachims) – based on CCCP (Yuille & Rangarajan), Latent SVM (Felzenszwalb et al.), MI-SVM (Andrews et al.)
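Method 1 can be sketched as a tiny alternating loop. This is an illustration under simplifying assumptions (a linear classifier over feature vectors, where the latent variable is which instance of each positive bag is the true positive); `fit_linear` and `alternating_train` are hypothetical names, not code from the talk.

```python
import numpy as np

def fit_linear(X, y, lr=0.5, steps=100):
    """Tiny logistic-regression fitter used as the inner supervised step.
    X: (n, d) inputs, y: (n,) labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        grad = (X * (-y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def alternating_train(pos_bags, neg_instances, n_iters=5):
    """Alternate between (a) selecting, for each positive bag, the instance
    the current model scores highest (the latent choice) and (b) retraining
    on those selections plus all negative instances."""
    choices = [0] * len(pos_bags)  # arbitrary initial latent choices
    w = None
    for _ in range(n_iters):
        X = np.vstack([bag[c] for bag, c in zip(pos_bags, choices)] + [neg_instances])
        y = np.array([1.0] * len(pos_bags) + [-1.0] * len(neg_instances))
        w = fit_linear(X, y)
        choices = [int(np.argmax(bag @ w)) for bag in pos_bags]
    return w, choices
```

Each half-step cannot increase the (non-convex) objective, so the loop converges to a local optimum; which one depends on the initial latent choices.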

  18. Training w/ Latent Variables • Method 2: Replace the hard max with “soft” approximation, and then do gradient descent • E.g. MILBoost (Viola et al.), MIL-Logistic Regression (Ray et al.)
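Method 2's "soft" max can be illustrated with a log-sum-exp approximation (one common choice; the specific form and the name `soft_max` are assumptions for illustration, not necessarily the variant used by the cited algorithms).

```python
import numpy as np

def soft_max(scores, alpha=10.0):
    """Differentiable approximation of max over instance scores.

    Log-sum-exp with sharpness parameter alpha: as alpha grows, the value
    approaches the hard max, but the function stays smooth everywhere,
    so gradient descent can be applied directly.
    """
    scores = np.asarray(scores, dtype=float)
    return np.log(np.sum(np.exp(alpha * scores))) / alpha
```

The approximation always upper-bounds the hard max, and the gap shrinks as `alpha` increases; in practice `alpha` trades off smoothness against fidelity to the true objective.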

  19. Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Detection, Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work

  20. Object Detection w/ Weak Supervision • Goal: train object detector • Only presence of object is known, not location • Can’t “just throw these into a learning alg.” – very difficult to design invariant features

  21. Multiple Instance Learning (MIL) • (set of inputs, label) pairs provided • MIL lingo: set of inputs = bag of instances • Learner does not see instance labels • Bag labeled positive if at least one instance in bag is positive [Keeler et al. ‘90, Dietterich et al. ‘97]

  22. Object Detection w/ MIL • Instance: image patch • Instance label: is face? • Bag: whole image • Bag label: contains face? [Andrews et al. ’02, Viola et al. ’05, Dollar et al. ’08, Galleguillos et al. ’08]

  23. MIL Notation • Training input: bags $X_i = \{x_{i1}, \dots, x_{im}\}$ with bag labels $y_i \in \{-1, +1\}$ • Instance labels $y_{ij}$ (unknown during training)

  24. MIL • Positive bag contains at least one positive instance: $y_i = \max_j y_{ij}$ • Goal: learn instance classifier $h(x)$ • Corresponding bag classifier: $H(X_i) = \max_j h(x_{ij})$
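The bag classifier on this slide is a one-liner: score a bag by its best instance. A minimal sketch (the function names are hypothetical; `h` stands in for any instance scorer):

```python
def bag_score(instances, h):
    """Bag classifier H(X) = max_j h(x_j): a bag is scored by its
    highest-scoring instance, matching the MIL assumption that a
    positive bag contains at least one positive instance."""
    return max(h(x) for x in instances)

def bag_predict(instances, h, threshold=0.0):
    """Threshold the bag score to get a bag label (1 = positive)."""
    return 1 if bag_score(instances, h) > threshold else 0
```

Note the asymmetry this encodes: a single confidently positive instance makes the whole bag positive, while a negative bag must have every instance score below threshold.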

  25. MIL Algorithms • Many “standard” learning algorithms have been adapted to the MIL scenario: • SVM (Andrews et al. ‘02), Boosting (Viola et al. ‘05), Logistic Regression (Ray et al. ‘05) • Some specialized algorithms also exist • DD (Maron et al. ’98), EM-DD (Zhang et al. ‘02)

  26. MIL Algorithms • Objective: minimize bag error on training data • MILBoost (Viola et al. ‘05) • Replace max with a differentiable approximation, e.g. the Noisy-OR model $p_i = 1 - \prod_j (1 - p_{ij})$, the bag label probability according to the current classifier • Use functional gradient descent (Mason et al. ’00, Friedman ’01)
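The Noisy-OR bag probability can be sketched directly from the formula above (an illustrative fragment, not the MILBoost implementation; `noisy_or_bag_prob` is a hypothetical name):

```python
import numpy as np

def sigmoid(s):
    """Map an instance score to an instance probability p_ij."""
    return 1.0 / (1.0 + np.exp(-np.asarray(s, dtype=float)))

def noisy_or_bag_prob(instance_scores):
    """Noisy-OR bag probability: p_i = 1 - prod_j (1 - p_ij).

    The bag is positive unless every instance is negative, which is a
    smooth, differentiable stand-in for the hard max over instances,
    so it can be plugged into (functional) gradient descent.
    """
    p = sigmoid(instance_scores)
    return 1.0 - np.prod(1.0 - p)
```

A bag of uniformly low-scoring instances gets probability near 0, while one high-scoring instance is enough to push the bag probability near 1, mirroring the MIL assumption.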

  27. Object Detection • Have a learning framework (MIL), and an algorithm to train classifier (MILBoost) • Question: how exactly do we form a bag? • Option 1: segmentation (bag = set of segments) • Option 2: sliding window (bag = set of windows)

  28. Forming a bag via segmentation • Pro: get more precise localization • Con: segmentation algorithms often fail; require prior knowledge (e.g. number of segments) • If segmentation fails, we might not see “the” positive instance in a positive bag • Only way to prevent this is to use ALL possible segments… not practical

  29. Multiple Stable Segmentations (MSS) • Solution: Multiple Stable Segmentations (Rabinovich et al. ‘06) • A heuristic for picking out a few “good” segments from the huge set of all possible segments • End up with more segments, but higher chance of getting the “right” segment

  30. Multiple Instance Learning with Stable Segmentation (MILSS) • Localization and Recognition • Bag: multiple stable segmentations of the image • Features: bag-of-features (BOF) on SIFT, one descriptor per segment • Classifier: MILBoost, one-vs-all (for multiclass) [Work with Carolina Galleguillos, Andrew Rabinovich & Serge Belongie – ECCV ‘08]

  31. Results: Landmarks

  32. Results: Landmarks

  33. More segments = better results • [Figure: our system vs. NCuts w/ k=6 vs. NCuts w/ k=4]

  34. Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work

  35. Object Detection with Parts • Pedestrians are non-rigid • Difficult to design features that are invariant • Decision boundary very complex • Object parts are rigid

  36. Object Detection with Parts • Naïve sol’n: label parts and train detectors • Labor intensive • Sub-optimal (e.g. “space between the legs”) • Better sol’n: • Use rough location of objects • Treat part locations as latent variables [Mohan et al. ’01, Mikolajczyk et al. ‘04]

  37. Multiple Component Learning (MCL) • How to train a part detector from weakly labeled data? • How to train many, diverse part detectors? • How to combine part detectors and incorporate spatial information? [Work with Piotr Dollar, Pietro Perona, Zhuowen Tu & Serge Belongie – ECCV ‘08]

  38. MCL: One Part Detector • Fits perfectly into MIL • Which part does it learn? [Figure: positive bags of image patches]

  39. MCL: Diverse Parts • Pedestrian images are “roughly aligned” • Choose random sections of the images to feed into MIL
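Choosing random sections of roughly aligned images can be sketched as sampling random rectangular windows (an illustration with assumed conventions: `(x, y, w, h)` boxes in pixel coordinates; `sample_random_windows` is a hypothetical name):

```python
import random

def sample_random_windows(img_h, img_w, n, min_size=8, seed=0):
    """Sample n random rectangles (x, y, w, h) inside an img_h x img_w image.

    Because the training images are roughly aligned, each sampled region
    tends to cover the same body part across images; training one MIL
    learner per region yields a diverse set of part detectors.
    """
    rng = random.Random(seed)
    windows = []
    for _ in range(n):
        w = rng.randint(min_size, img_w)   # randint bounds are inclusive
        h = rng.randint(min_size, img_h)
        x = rng.randint(0, img_w - w)
        y = rng.randint(0, img_h - h)
        windows.append((x, y, w, h))
    return windows
```

Fixing the seed makes the region set reproducible, so the same parts are learned on every training run.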

  40. MCL: Top 5 Learned Detectors

  41. MCL: Combining Part Detectors • Run part detectors, get response maps • Compute Haar features on top, plug into Boosting • [Figure: confidence maps from each part detector]

  42. MCL: Results • INRIA Pedestrian dataset

  43. MCL: Results

  44. MCL: Related Work • P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. "Object Detection with Discriminatively Trained Part-Based Models" IEEE PAMI. Sept 2009. • Very similar model, uses SVM instead of Boosting, and an explicit shape model • L. Bourdev, S. Maji, T. Brox, J. Malik. “Detecting people using mutually consistent poselet activations” ECCV 2010.

  45. Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work

  46. Object Tracking • Problem: given location of object in first frame, track object through video • Tracking by Detection: alternate training detector and running it on each frame

  47. Tracking by Detection • First frame is labeled

  48. Tracking by Detection • First frame is labeled • Train an online classifier (e.g. Online AdaBoost)

  49. Tracking by Detection • Grab one positive patch and some negative patches, and train/update the model.
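The update-then-detect loop can be sketched as a minimal online learner (an illustration, not the talk's tracker: feature extraction is left abstract, and the perceptron-style update and the class name `OnlineTracker` are assumptions):

```python
import numpy as np

class OnlineTracker:
    """Minimal tracking-by-detection sketch: a linear classifier over
    patch features, updated online with one positive (the tracked patch)
    and several negatives (nearby patches), then used to pick the
    best-scoring candidate location in the next frame."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def update(self, pos_feat, neg_feats):
        # Perceptron-style step: push the positive patch's score up
        # and the negative patches' scores down.
        self.w += self.lr * pos_feat
        for f in neg_feats:
            self.w -= self.lr * f / max(len(neg_feats), 1)

    def best_candidate(self, candidate_feats):
        # Run the detector on candidate patches; return the argmax index.
        scores = [self.w @ f for f in candidate_feats]
        return int(np.argmax(scores))
```

Each frame alternates `update` (training) and `best_candidate` (detection), which is exactly the loop described on the previous slides; drift arises when a mislocalized "positive" patch is fed back into `update`.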
