
A coarse-to-fine approach for fast deformable object detection



  1. A coarse-to-fine approach for fast deformable object detection • Marco Pedersoli, Andrea Vedaldi, Jordi Gonzàlez

  2. Object detection [Fischler Elschlager 1973] [Felzenszwalb et al. 08] [Zhu et al. 10] [Vedaldi Zisserman 2009] [VOC 2010] • Addressing the computational bottleneck • branch-and-bound [Blaschko Lampert 08, Lehmann et al. 09] • cascades [Viola Jones 01, Vedaldi et al. 09, Felzenszwalb et al. 10, Weiss Taskar 10] • jumping windows [Chum 07] • sampling windows [Gualdi et al. 10] • coarse-to-fine [Fleuret German 01, Zhang et al. 07, Pedersoli et al. 10]

  3. Analysis of the cost of pictorial structures

  4. The cost of pictorial structures • cost of inference • one part: L • two parts: L^2 • … • P parts: L^P • with a tree, using dynamic programming: P·L^2 • polynomial, but still too slow in practice • with a tree and quadratic springs, using the distance transform [Felzenszwalb and Huttenlocher 05]: P·L • in principle, millions of times faster than dynamic programming! • L = number of part locations ~ number of pixels ~ millions
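To keep the counting explicit, here is the slide's arithmetic written out as a short LaTeX note (standard pictorial-structure costs, nothing beyond what the slide already states):

```latex
% Inference cost for a pictorial structure with P parts, each placed
% at one of L locations (L ~ number of pixels ~ millions):
\begin{align*}
\text{brute force:} \quad & O(L^P) && \text{every joint placement is tried} \\
\text{tree + dynamic programming:} \quad & O(P L^2) && \text{one } L \times L \text{ table per edge} \\
\text{tree + quadratic springs:} \quad & O(P L) && \text{distance transform [Felzenszwalb, Huttenlocher 05]}
\end{align*}
% With L ~ 10^6, the distance transform is in principle about 10^6
% times faster per edge than plain dynamic programming.
```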

  5. A notable case: deformable part models • Deformable part model [Felzenszwalb et al. 08] • locations are discrete: the number of possible part locations drops from L to L/δ^2 (grid spacing δ) • deformations are bounded: the cost of placing two parts drops from L^2 to L·C, where C = max. deformation size and C << L • total geometric cost: C·P·L/δ^2
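To get a feel for the scale, one can plug in illustrative numbers (assumed here, not from the slides: L ≈ 10^6 pixel locations, grid spacing δ = 8, P = 8 parts, and C = 6 × 6 = 36 as in the example on the next slide):

```latex
\[
  \underbrace{C \, P \, \frac{L}{\delta^2}}_{\text{bounded deformations}}
  = 36 \cdot 8 \cdot \frac{10^6}{8^2} \approx 4.5 \times 10^6
  \qquad \text{vs.} \qquad
  \underbrace{P \, L^2}_{\text{generic tree DP}} = 8 \cdot 10^{12} .
\]
```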

  6. A notable case: deformable part models • With deformable part models • finding the optimal parts configuration is cheap • distance transform speed-up is limited • Standard analysis does not account for filtering • geometric cost: C·P·L/δ^2 • filtering cost: F·P·L/δ^2, where F = size of filter • total cost: (F + C)·P·L/δ^2 • Typical example • filter size: F = 6 × 6 × 32 • deformation size: C = 6 × 6 • Filtering dominates finding the optimal part configuration!
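The example sizes make the imbalance explicit:

```latex
\[
  F = 6 \times 6 \times 32 = 1152, \qquad C = 6 \times 6 = 36,
\]
\[
  \frac{\text{filtering cost}}{\text{geometric cost}}
  = \frac{F \, P L / \delta^2}{C \, P L / \delta^2}
  = \frac{F}{C} = \frac{1152}{36} = 32 ,
\]
% so filter evaluations cost about 32 times more than the geometric
% search: any further speed-up must come from evaluating fewer filters.
```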

  7. Accelerating deformable part models • Cascade of deformable parts [Felzenszwalb et al. 2010] • detect parts sequentially • stop when confidence falls below a threshold • Coarse-to-fine localization [Pedersoli et al. 2010] • multi-resolution search • we extend this idea to deformable part models • deformable part model cost: (F + C)·P·L/δ^2 • the key is reducing the number of filter evaluations

  8. Our contribution: Coarse-to-fine for deformable models

  9. Our model • Multi-resolution deformable parts • each part is a HOG filter • recursive arrangement • resolution doubles at each level • bounded deformation • Score of a configuration S(y) • HOG filter scores • parent-child deformation scores
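The slide names the two terms of S(y) without writing them out; a plausible explicit form, following standard deformable part model scoring (the notation w_i, H, pa(i), d_i is assumed here, not taken from the slides):

```latex
% y = (y_1, ..., y_P): part placements; H(x, y_i): HOG features under
% part i; w_i: the part's HOG filter; pa(i): parent of part i; d_i: a
% learned parent-child deformation penalty (assumed notation).
\[
  S(y) \;=\; \sum_{i=1}^{P} \big\langle w_i,\, H(x, y_i) \big\rangle
        \;-\; \sum_{i=2}^{P} d_i\!\left( y_i,\, y_{\mathrm{pa}(i)} \right)
\]
```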

  10. Coarse-to-Fine search

  11. Quantify the saving • 1D view (circle = part location) • 2D view • # filter evaluations per resolution level (2D case):

  level    CTF    exact
  0        L      L
  1        L      4L
  2        L      16L

  • overall speedup ~ 4^R (R = number of resolution levels) • exponentially larger saving
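A minimal sketch of the count in Python (hypothetical, for illustration only): exact search scores every location at every level, while CTF keeps the number of filter evaluations per level constant:

```python
# Count filter evaluations over R+1 resolution levels for a 2D grid
# with L locations at the coarsest level and 4x more per finer level.
def filter_evaluations(L, R):
    """Return (ctf_total, exact_total)."""
    ctf, exact = 0, 0
    for r in range(R + 1):
        exact += (4 ** r) * L   # exact: score all 4^r * L locations
        ctf += L                # CTF: refine only the coarse hypotheses
    return ctf, exact

ctf, exact = filter_evaluations(L=1000, R=4)
print(f"CTF: {ctf}, exact: {exact}, speedup ~ {exact / ctf:.0f}x")
# the ratio grows like 4^R / (R + 1): an exponentially larger saving
```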

  12. Lateral constraints • Geometry in deformable part models is cheap • can afford additional constraints • Lateral constraints • connect sibling parts • Inference • use dynamic programming within each level • open the cycle by conditioning one node (see the sketch below)
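A toy sketch of this conditioning trick in Python (the setup is assumed, not the authors' code): three sibling parts whose lateral constraints form the cycle 0-1-2-0; fixing node 0 opens the cycle, and dynamic programming solves the remaining chain exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5                                    # candidate placements per part
u = rng.standard_normal((3, K))          # appearance (filter) scores
p01 = rng.standard_normal((K, K))        # pairwise score on edge 0-1
p12 = rng.standard_normal((K, K))        # edge 1-2
p20 = rng.standard_normal((K, K))        # edge 2-0 (closes the cycle)

best = (-np.inf, None)
for s0 in range(K):                      # condition: fix node 0's state
    u1 = u[1] + p01[s0]                  # absorb edge 0-1 into node 1
    u2 = u[2] + p20[:, s0]               # absorb edge 2-0 into node 2
    m = (u2[None, :] + p12).max(axis=1)  # DP message from node 2 to 1
    s1 = int(np.argmax(u1 + m))          # best state of node 1
    s2 = int(np.argmax(u2 + p12[s1]))    # backtrack node 2
    score = u[0, s0] + u1[s1] + u2[s2] + p12[s1, s2]
    if score > best[0]:
        best = (score, (s0, s1, s2))

print("MAP score %.3f at placements %s" % best)
```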

  13. Lateral constraints • Why are lateral constraints useful? • Encourage consistent local deformations • without lateral constraints, siblings move independently: there is no way to make their motion coherent • example: without lateral constraints, configurations y and y' have the same geometric cost; with lateral constraints, the coherent configuration y can be encouraged
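A one-dimensional illustration with assumed numbers: two siblings shift by d_1 and d_2 relative to their parent, with quadratic spring cost d_1^2 + d_2^2:

```latex
% Coherent shift y = (+1, +1) and incoherent shift y' = (+1, -1)
% have the same parent-child cost:
\[
  (+1)^2 + (+1)^2 \;=\; (+1)^2 + (-1)^2 \;=\; 2 .
\]
% A lateral term k (d_1 - d_2)^2 between the siblings breaks the tie:
\[
  y:\; k\,(1 - 1)^2 = 0 , \qquad y':\; k\,(1 - (-1))^2 = 4k ,
\]
% so the coherent configuration y is now encouraged.
```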

  14. Experiments

  15. Effect of deformation size • INRIA pedestrian dataset • C = deformation size (HOG cells) • AP = average precision (%) • Coarse-to-fine (CTF) inference • Remarks • large C slows down inference but does not improve precision • small C already allows substantial part deformation, thanks to the multiple resolutions

  16. Effect of the lateral constraints • Exact vs coarse-to-fine (CTF) inference • Effect on the inference scores • CTF scores closely track exact inference scores • CTF score ≤ exact score • the bound is tighter with lateral constraints • Effect is significant on training as well • additional coherence avoids spurious solutions • big improvement with coarse-to-fine search • Example: learning the head model • [Figure: exact vs CTF inference scores for tree and tree + lateral constraints; head models from CTF learning, tree vs tree + lat.]

  17. Training speed • Structured latent SVM [Felzenszwalb et al. 08, Vedaldi et al. 09] • deformations of training objects are unknown • estimated as latent variables • Algorithm (sketched below) • Initialization: no negative examples, no deformations • Outer loop: collect hard negative examples (CTF inference) • Inner loop: learn the model parameters (SGD), then estimate the deformations (CTF inference) • The training speed is dominated by the cost of inference! • > 10× speedup!
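A structural sketch of this loop in Python (every helper is a hypothetical stub, not the authors' code; only the loop shape follows the slide):

```python
# Stubs standing in for the real components (hypothetical):
def init_model():                   return {"w": 0.0}   # no deformations yet
def ctf_inference(model, image):    return [image]      # CTF detections
def is_hard_negative(det, pos):     return det not in pos
def sgd_step(model, pos, neg):      return model        # one SGD pass
def estimate_deformation(model, p): return p            # latent update

def train(positives, images, n_outer=3, n_inner=5):
    model, negatives = init_model(), []   # init: no negatives, no deformations
    for _ in range(n_outer):
        # outer loop: collect hard negative examples (CTF inference)
        for image in images:
            negatives += [d for d in ctf_inference(model, image)
                          if is_hard_negative(d, positives)]
        for _ in range(n_inner):
            # inner loop: learn parameters (SGD), then re-estimate the
            # latent deformations of the positives (CTF inference again)
            model = sgd_step(model, positives, negatives)
            positives = [estimate_deformation(model, p) for p in positives]
    return model

print(train(positives=[1, 2], images=[1, 2, 3]))
```

Since both the hard-negative mining and the latent step run full inference, any speed-up of inference (here, CTF) translates directly into training speed.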

  18. PASCAL VOC 2007 • Evaluate on the detection of 20 different object categories • ~5,000 images for training, ~5,000 images for testing • Remarks • very good for aeroplane, bicycle, boat, table, horse, motorbike, sheep • less good for bottle, sofa, tv • Speed-accuracy trade-off • time is drastically reduced • hit on AP is small

  19. Comparison to the cascade of parts • Cascade of parts [Felzenszwalb et al. 10] • test parts sequentially, reject when the score falls below a threshold • saving at unpromising locations (content dependent) • difficult to use in training (thresholds must be learned) • Coarse-to-fine inference • saving is uniform (content independent) • can be used during training

  20. Coarse-to-fine cascade of parts • Cascade and CTF use orthogonal principles • easily combined • the speed-ups multiply! • Example (see the sketch below) • apply a threshold at the root • plot AP vs speed-up • in some cases a 100× speed-up can be achieved • [Diagram: CTF stages interleaved with cascade tests (cascade score > τ1?, > τ2?), rejecting hypotheses below each threshold]
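A minimal sketch of the combination in Python (interfaces assumed): after each CTF refinement level a cascade threshold rejects weak hypotheses, so the two savings multiply:

```python
def ctf_cascade(hypotheses, refine, thresholds):
    """refine(h, level) -> (refined hypothesis, partial score);
    thresholds[level] is the cascade threshold tau for that level."""
    for level, tau in enumerate(thresholds):
        survivors = []
        for h in hypotheses:
            h, score = refine(h, level)   # one CTF refinement step
            if score > tau:               # cascade test: keep or reject
                survivors.append(h)
        hypotheses = survivors            # only survivors are refined further
    return hypotheses

# Toy usage: hypotheses are plain scores, refinement just rescores them.
detections = ctf_cascade([0.2, 0.9, 1.5],
                         refine=lambda h, lvl: (h, h - 0.1 * lvl),
                         thresholds=[0.1, 0.5, 1.0])
print(detections)   # only the strongest hypothesis survives all levels
```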

  21. Summary • Analysis of deformable part models • filtering dominates the geometric configuration cost • speed-up requires reducing filtering • Coarse-to-fine search for deformable models • lower resolutions can drive the search at higher resolutions • lateral constraints add coherence to the search • exponential saving, independent of the image content • can be used for training too • Practical results • 10× speed-up on VOC and INRIA with minimal AP loss • can be combined with the cascade of parts for a multiplied speed-up • Future • more complex models with rotation, foreshortening, …

  22. Thank you!
