
Towards Sublinear Time Multiclass Object Detection

Towards Sublinear Time Multiclass Object Detection. Sam Davies. The Challenge: recognize objects in images; many object classes; many 3D views; feasible on consumer hardware. Applications: cars that drive themselves; other robots…; assistive devices for the blind. This Talk.


Presentation Transcript


  1. Towards Sublinear Time Multiclass Object Detection • Sam Davies

  2. The Challenge • Recognize objects in images • Many object classes • Many 3D views • Feasible on consumer hardware

  3. Applications • Cars that drive themselves • Other robots… • Assistive devices for the blind

  4. This Talk • Use an existing object representation [Crandall ’05] • Propose a faster detection algorithm • With equivalent accuracy • Present initial experiments that suggest • It scales well with #classes x #views • It is empirically sublinear

  5. Talk Overview • Past Work • Part-based detection • 1-Fan/Star Model • Proposed Algorithm • Results • Next Steps • Feature Sharing

  6. Past Work: State of the Art • Part-based • Shape • Appearance • Relatively high accuracy • (for this presentation, assume good enough) • Mostly single view, single class • Linear running time in C (#classes x #views) • (or parallelize with N processors -- $$$!) • Multiclass part sharing [Torralba 2004] • Improve running time – empirically O(log C) • Restricted shape model

  7. Past Work: Part-Based Detection • Rigid pieces held together by “springs.” • The springs joining the rigid pieces • Constrain relative movement • Measure the cost of the movement • Cost of an embedding: • Measure the “tension” on each spring, and • A local evaluation of how well each coherent piece is embedded [Fischler, Elschlager 1973]

  8. Past Work: Part-Based Detection • Global measurement (shape) • Constellation / arrangement of part positions • Spring stretching / compressing • Cost / energy associated with relative positions of pairs of parts • Local measurement (appearance) • Rigid local part from image information • Independently measured for each part

  9. Past Work: Part-Based Detection • Find best location of all the parts (highest sum of weighted votes) • minimize spring tension and part matching energies • MAP estimation: maximum probability of part locations for a test image
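
The objective described on slides 7 to 9 can be written as a single energy. The following is the usual pictorial-structures form, given here as a sketch; the symbols (m_i for the appearance mismatch of part i, d_ij for the spring cost, E for the set of springs, l_i for the location of part i) are notation assumed for this note and do not appear on the slides.

```latex
L^{*} \;=\; \arg\min_{L=(l_1,\dots,l_P)}
  \Bigl(\, \sum_{i=1}^{P} m_i(l_i) \;+\; \sum_{(i,j)\in E} d_{ij}(l_i, l_j) \Bigr)
```

Under the common assumption that the posterior over part locations is proportional to the exponential of the negated energy, minimizing this sum is the same as the MAP estimate mentioned on slide 9.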

  10. Past Work: 1-Fan/Star Model • Restrict all parts to only be connected to the center part

  11. Past Work: 1-Fan/Star Model • Restrict all parts to only be connected to the center part • More efficient detection (dynamic programming) • Shown to be reasonably accurate [Crandall 2005, Fergus 2005]

  12. Past Work: 1-Fan/Star Model • Hough Transform • Each part “votes” for location of the center part • Votes are weighted according to spring definitions

  13. Past Work: 1-Fan/Star Model • Use Gaussians for shape models [Crandall 2005, Fergus 2005]
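
Slides 12 and 13 describe each part voting for the center location, with votes weighted by a Gaussian spring. A minimal 1-D sketch of that scoring follows. It assumes log-domain appearance scores and one shared sigma for all springs; the function name and every argument name are illustrative and not taken from the talk. This brute-force version costs O(N^2) per part.

```python
def star_scores(center_app, part_apps, offsets, sigma):
    """Score every candidate center location of a 1-D star model.

    center_app[c]   : appearance log-score of the center part at c
    part_apps[p][l] : appearance log-score of part p at location l
    offsets[p]      : ideal displacement of part p from the center
    sigma           : spread of the Gaussian spring (shared, for brevity)
    """
    n = len(center_app)
    scores = []
    for c in range(n):
        total = center_app[c]
        for app, mu in zip(part_apps, offsets):
            # Each part casts its best weighted vote for center c:
            # appearance minus a quadratic (log-Gaussian) spring penalty.
            total += max(
                app[l] - (l - c - mu) ** 2 / (2.0 * sigma ** 2)
                for l in range(n)
            )
        scores.append(total)
    return scores


if __name__ == "__main__":
    # One part with a strong response at location 4, expected 2 to the
    # right of the center.
    scores = star_scores([0, 0, 0, 0, 0], [[0, 0, 0, 0, 5]], [2], 1.0)
    print(scores.index(max(scores)), scores)
```

The detection is the center location with the highest total score.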

  14. Past Work: 1-Fan/Star Model • O(N) • O(N) + O(N^2) • O(N) • O(N) x O(P) → O(PN) + O(PN) (sum) + O(N) (max) • O(PN) x O(C) → O(CPN) • N: # pixels • P: # parts • C: # classes x # views
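
The talk later refers to distance transform computations. With quadratic (log-Gaussian) springs, a per-part cost that is linear in N is commonly obtained with a lower-envelope distance transform rather than the O(N^2) scan over all location pairs. The 1-D sketch below shows that standard technique; it is an assumption that the slide's O(N) entries refer to it, and the function name and the parameter `a` are illustrative. It computes a minimum of cost plus penalty, so to obtain the best vote (a maximum of score minus penalty) one would pass negated scores and negate the result.

```python
import math


def distance_transform_1d(f, a=1.0):
    """Return D with D[q] = min over p of f[p] + a * (q - p)**2, in O(n).

    Assumes f is a non-empty list of finite numbers and a > 0.
    """
    n = len(f)
    v = [0] * n            # locations of parabolas in the lower envelope
    z = [0.0] * (n + 1)    # boundaries between consecutive parabolas
    z[0], z[1] = -math.inf, math.inf
    k = 0                  # index of the rightmost parabola in the envelope
    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabolas rooted at p and q.
            s = ((f[q] + a * q * q) - (f[p] + a * p * p)) / (2.0 * a * (q - p))
            if s > z[k]:
                break
            k -= 1         # parabola p is hidden by q, so drop it
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    out = []
    k = 0
    for q in range(n):
        # Advance to the envelope segment that contains q.
        while z[k + 1] < q:
            k += 1
        p = v[k]
        out.append(f[p] + a * (q - p) ** 2)
    return out
```

Each location is added to the envelope once and removed at most once, which is where the linear running time comes from.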

  15. An Idea

  16. Proposed Algorithm • Idea: • Run max, sum, distance transform computations all together • Adaptively • Divide into image pyramids

  17. Proposed Algorithm • Key observation: • We can quickly calculate an upper bound of the distance transform in a desired image pyramid cell • Then refine in the most promising areas

  18. Proposed Algorithm • Start with a coarse approximation • Ignore shape information altogether • Think: largest cell in the image pyramid groups all pixels into one • Equivalent to bag-of-words (0-fan)
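
Ignoring shape gives an upper bound because a spring penalty can only lower a score. The sketch below illustrates this in 1-D, assuming log-domain appearance scores and quadratic spring penalties; both function names and all argument names are hypothetical and chosen for this note.

```python
def bag_of_words_bound(center_app, part_apps):
    """Coarsest bound: best appearance of every part, no shape term."""
    return max(center_app) + sum(max(app) for app in part_apps)


def best_star_score(center_app, part_apps, offsets, sigma):
    """Exact best star-model score, by brute force over all centers."""
    n = len(center_app)
    best = float("-inf")
    for c in range(n):
        total = center_app[c]
        for app, mu in zip(part_apps, offsets):
            # Quadratic spring penalty is never negative, so each term
            # is at most max(app).
            total += max(
                app[l] - (l - c - mu) ** 2 / (2.0 * sigma ** 2)
                for l in range(n)
            )
        best = max(best, total)
    return best


if __name__ == "__main__":
    center_app = [0, 1, 0, 0]
    part_apps = [[2, 0, 0, 0], [0, 0, 0, 3]]
    offsets = [1, -1]
    bound = bag_of_words_bound(center_app, part_apps)
    exact = best_star_score(center_app, part_apps, offsets, 1.0)
    print(bound >= exact)
```

Because the bound never underestimates the exact score, an object class whose bound is already low can be set aside without evaluating its shape model.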

  19. Proposed Algorithm • For the object that looks most promising, descend to a finer resolution in the hierarchy, and re-estimate the distance transform. • Based on a hierarchical A* framework [McAllester ’07] • Admissible heuristic based on upper bound estimate for coarse estimates
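
The refinement loop can be illustrated with a plain best-first search over a binary hierarchy of 1-D cells. This is a simplified sketch and not the hierarchical A* framework cited on the slide: it assumes a single object, log-domain scores, quadratic springs, and a per-cell bound that scans every part location, so it shows the search order only and not a sublinear running time. All names are hypothetical.

```python
import heapq


def cell_bound(lo, hi, center_app, part_apps, offsets, sigma):
    """Upper bound on the star score of any center c with lo <= c < hi."""
    n = len(center_app)
    bound = max(center_app[lo:hi])
    for app, mu in zip(part_apps, offsets):
        best = float("-inf")
        for l in range(n):
            ideal = l - mu  # the center that part location l votes for
            # Distance from that ideal center to the nearest center in the cell.
            d = max(lo - ideal, 0, ideal - (hi - 1))
            best = max(best, app[l] - d * d / (2.0 * sigma ** 2))
        bound += best
    return bound


def best_first_detect(center_app, part_apps, offsets, sigma):
    """Return (center, score) of the best detection by refining cells."""
    n = len(center_app)
    args = (center_app, part_apps, offsets, sigma)
    # heapq is a min-heap, so bounds are stored negated.
    heap = [(-cell_bound(0, n, *args), 0, n)]
    while heap:
        neg_bound, lo, hi = heapq.heappop(heap)
        if hi - lo == 1:
            # For a single location the bound is the exact score, and no
            # remaining cell can beat it.
            return lo, -neg_bound
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid, hi)):
            heapq.heappush(heap, (-cell_bound(a, b, *args), a, b))
    raise ValueError("empty search space")
```

The bound is admissible because a cell's bound is never below the exact score of any center inside it, so the first single-location cell popped from the queue is optimal.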

  20. ???

  21. ???

  22. max

  23. max

  24. Results: Match Time

  25. Results: Total Time

  26. Next Steps • Recall: • Appearance correlation is still O(PC) • P = # parts, C = #classes x # views • Even if shape matching is sublinear, we still have: O(PC) + o(C) = O(PC) • Need to make correlation sublinear as well.
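
The identity on this slide follows directly from the definition of little-o. A short derivation, treating P as a fixed constant of at least 1 and writing f(C) for the shape-matching cost (both are assumptions made for this note):

```latex
f(C) \in o(C) \;\Longrightarrow\; \lim_{C\to\infty} \frac{f(C)}{C} = 0
\;\Longrightarrow\;
\lim_{C\to\infty} \frac{PC + f(C)}{PC} = 1 + \frac{1}{P}\lim_{C\to\infty}\frac{f(C)}{C} = 1 .
```

The total is therefore still linear in C, dominated by the appearance correlation term.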

  27. Past Work: Feature Sharing [Torralba 2004]

  28. Past Work: Feature Sharing • Empirically “O(log(C))”

  29. Next Steps • Combine • Sublinear appearance correlation (via feature sharing) with • Sublinear shape searching (described here) • We get: • o(C) + o(C) = o(C)
