FOCUS: Clustering Crowdsourced Videos by Line-of-Sight

Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty

Presentation Transcript


  1. FOCUS: Clustering Crowdsourced Videos by Line-of-Sight. Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty

  2. Clustered by shared subject

  3. Challenges

  4. CAN IMAGE PROCESSING SOLVE THIS PROBLEM?

  5. LOGICAL similarity does not imply VISUAL similarity (figure: views from Camera 1, Camera 2, Camera 3, Camera 4)

  6. VISUAL similarity does not imply LOGICAL similarity

  7. CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?

  8. Why not triangulate? Sensors are noisy, and subjects are hard to distinguish…

  9. GPS-COMPASS Line-of-Sight
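Slides 8–9 describe the sensing-only baseline: cast a line of sight from each phone's GPS fix along its compass heading and triangulate where two lines cross. A minimal Python sketch of that baseline, assuming a flat local east/north plane (equirectangular approximation, adequate at stadium scale) and compass headings in degrees clockwise from north; all function names are hypothetical and not taken from FOCUS:

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def to_local_xy(lat: float, lon: float, lat0: float, lon0: float) -> Point:
    """Equirectangular projection: meters east (x) and north (y) of (lat0, lon0)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y

def heading_to_dir(heading_deg: float) -> Point:
    """Compass heading (clockwise from north) -> unit vector (east, north)."""
    h = math.radians(heading_deg)
    return math.sin(h), math.cos(h)

def intersect_rays(p1: Point, d1: Point, p2: Point, d2: Point) -> Optional[Point]:
    """Intersect rays p1 + s*d1 and p2 + u*d2 with s, u >= 0.

    Returns None if the lines of sight are parallel or cross behind a camera.
    """
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(denom) < 1e-9:
        return None  # (nearly) parallel lines of sight
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = (dx * d2[1] - dy * d2[0]) / denom
    u = (dx * d1[1] - dy * d1[0]) / denom
    if s < 0 or u < 0:
        return None  # the lines cross behind at least one camera
    return p1[0] + s * d1[0], p1[1] + s * d1[1]

# Two phones 100 m apart, one looking north-east and one looking north-west.
cam_a, cam_b = (0.0, 0.0), (100.0, 0.0)
print(intersect_rays(cam_a, heading_to_dir(45), cam_b, heading_to_dir(315)))  # approximately (50.0, 50.0)
```

A few degrees of compass error rotate each ray, and the crossing point shifts further the farther away the subject is, which is the noise problem the slides point to.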

  10. INSIGHT

  11. Simplifying Insight 1: We don’t need to visually identify the actual SUBJECT (hard to identify); we can use the background (easy to identify) as a PROXY.

  12. Simplifying Insight 2: The same basic structure persists, so we don’t need to match videos directly; we can compare all of them to a predefined visual MODEL.

  13. Simplifying Insight 3: Line-of-sight (triangulation) is almost enough, just not via sensing alone.

  14. FOCUS: Fast Optical Clustering of live User Streams

  15. Architecture (figure): Video Streams (Android, iOS, etc.) → FOCUS Cloud Video Analytics (Video Extraction, Image processing, Computer vision; Hadoop/HDFS for failover and elasticity) → Clustered Videos → Users Select & Watch Organized Streams (app mockup: “Watching Live”, home: 2 away: 1, “Change Angle”, “Change Focus”)

  16. Architecture (figure), now with a pre-defined reference “model” feeding FOCUS Cloud Video Analytics (Video Extraction, Image processing, Computer vision; Hadoop/HDFS for failover and elasticity) → Clustered Videos → Users Select & Watch Organized Streams (app mockup: “Watching Live”, home: 2 away: 1, “Change Angle”, “Change Focus”)

  17. Multi-view Stereo Reconstruction. Pipeline stages: keypoint extraction → multi-view reconstruction. Multi-view reconstruction estimates the camera POSE and the content in its field-of-view. Model construction technique based on “Photo Tourism: Exploring Photo Collections in 3D”, Snavely et al., SIGGRAPH 2006.

  18. Visualizing Camera Pose

  19. Pipeline stages: keypoint extraction → multi-view reconstruction → frame-by-frame video-to-model alignment → sensory inputs. Timing callouts: ~18 seconds at the 90th percentile; ~1 second at the 90th percentile. • Given a pre-defined 3D model, align incoming video frames to the model • Also known as camera pose estimation
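Camera pose estimation is commonly written with the standard pinhole model below. This is textbook notation, not the slides' own: K is the camera intrinsic matrix, (R, t) is the pose being estimated, X_i is a 3D model point matched to image keypoint x_i, and π is perspective division. That FOCUS minimizes exactly this reprojection error is an assumption.

```latex
\lambda_i \begin{bmatrix} x_i \\ 1 \end{bmatrix} = K \,[\, R \mid t \,] \begin{bmatrix} X_i \\ 1 \end{bmatrix},
\qquad
(\hat{R}, \hat{t}) = \arg\min_{R,\, t} \sum_i \bigl\| x_i - \pi\bigl( K (R X_i + t) \bigr) \bigr\|^2,
\qquad
\pi(a, b, c) = \left( \tfrac{a}{c},\; \tfrac{b}{c} \right)
```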

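  20. Pipeline stages: keypoint extraction → multi-view reconstruction → integration of sensory inputs. The gyroscope provides a “diff” from the vision-based initial position. Figure: a timeline of frames (0, 1, 2, 3, 4, …, t−2, t−1) with gyroscope readings; the sampled frame is chosen using file size as an inverse proxy for blur (Filesize ≈ 1/Blur).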
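Slide 20 combines two cheap signals: gyroscope integration relative to a vision-derived starting orientation, and file size as a blur proxy for choosing which frame to analyze (presumably because sharper frames keep more high-frequency detail and compress less). A minimal Python sketch under stated assumptions: yaw-only rotation, trapezoidal integration, and a fixed window of frames from which the largest encoded frame is picked. The names and the windowing policy are hypothetical; the slides do not specify FOCUS's actual sampling rule.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class GyroSample:
    t: float         # timestamp in seconds
    yaw_rate: float  # angular velocity about the vertical axis, degrees/second

def integrate_yaw(initial_yaw_deg: float, samples: Sequence[GyroSample]) -> float:
    """Heading = vision-derived initial yaw + integrated gyroscope 'diff' (trapezoidal rule)."""
    yaw = initial_yaw_deg
    for prev, cur in zip(samples, samples[1:]):
        yaw += 0.5 * (prev.yaw_rate + cur.yaw_rate) * (cur.t - prev.t)
    return yaw % 360.0  # wrap to [0, 360)

def sample_sharpest(frame_sizes: Sequence[int], window: int) -> List[int]:
    """Per window of frames, pick the index of the largest encoded frame (file size ~ 1/blur)."""
    if window <= 0:
        raise ValueError("window must be positive")
    picks: List[int] = []
    for start in range(0, len(frame_sizes), window):
        chunk = frame_sizes[start:start + window]
        picks.append(start + max(range(len(chunk)), key=lambda k: chunk[k]))
    return picks

print(integrate_yaw(350.0, [GyroSample(0.0, 10.0), GyroSample(1.0, 10.0), GyroSample(2.0, 30.0)]))  # 20.0
print(sample_sharpest([100, 300, 200, 50, 60, 400, 10], window=3))  # [1, 5, 6]
```

Between vision fixes, only the gyroscope "diff" is needed, so the expensive model alignment can run less often.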

  21. Pipeline stages: keypoint extraction → multi-view reconstruction → pairwise model-image analysis. Field-of-view: using POSE + the model POINT CLOUD, FOCUS geometrically identifies the set of model points in the background of the view.
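The field-of-view step on slide 21 can be sketched as a frustum test: move each model point into the camera frame with the estimated pose, drop points behind the camera, project with a pinhole model, and keep points that land inside the image. A minimal Python sketch, assuming a world-to-camera pose X_c = R·X + t, a camera looking down +z, focal length f in pixels, and the principal point at the image center. It ignores occlusion, which a real system would likely need; the function names are hypothetical.

```python
from typing import Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]

def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """3x3 matrix times 3-vector."""
    return (
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    )

def visible_points(points: Sequence[Vec3], R: Mat3, t: Vec3,
                   f: float, width: int, height: int) -> Set[int]:
    """Indices of model points that project inside a width x height image."""
    cx, cy = width / 2.0, height / 2.0
    visible: Set[int] = set()
    for i, X in enumerate(points):
        rx, ry, rz = mat_vec(R, X)
        xc, yc, zc = rx + t[0], ry + t[1], rz + t[2]  # camera coordinates
        if zc <= 0:
            continue  # behind the camera
        u, v = f * xc / zc + cx, f * yc / zc + cy  # pinhole projection to pixels
        if 0 <= u < width and 0 <= v < height:
            visible.add(i)
    return visible

IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cloud = [(0.0, 0.0, 10.0), (0.0, 0.0, -5.0), (50.0, 0.0, 10.0), (5.0, 5.0, 10.0)]
print(visible_points(cloud, IDENTITY, (0.0, 0.0, 0.0), f=100.0, width=200, height=200))  # {0, 3}
```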

  22. Pipeline stages: keypoint extraction → multi-view reconstruction → pairwise model-image analysis. Similarity across videos is computed as the size of the point-cloud set intersection. Example (figure with images 1, 2, 3): similarity between images 1 & 2 = 18; similarity between images 1 & 3 = 13.
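Slide 22's similarity measure reduces to set intersection once each video's visible model points are known. Because every video is aligned once to the same model, comparing two videos becomes a cheap set operation rather than direct image matching. A minimal Python sketch, assuming each video is summarized by a set of model point indices (the video IDs and dictionary layout are hypothetical):

```python
from itertools import combinations
from typing import Dict, Set, Tuple

def pairwise_similarity(visible: Dict[str, Set[int]]) -> Dict[Tuple[str, str], int]:
    """Similarity of two videos = number of model points both have in view."""
    return {
        (a, b): len(visible[a] & visible[b])
        for a, b in combinations(sorted(visible), 2)
    }

views = {"v1": {1, 2, 3, 4}, "v2": {3, 4, 5}, "v3": {9}}
print(pairwise_similarity(views))  # {('v1', 'v2'): 2, ('v1', 'v3'): 0, ('v2', 'v3'): 0}
```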

  23. Clustering “similar” videos by Similarity Score • Application of Modularity Maximization • High modularity implies: high correlation among the members of a cluster, and minor correlation with the members of other clusters
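Slide 23 names modularity maximization but not the optimizer. For a weighted similarity graph with total edge weight m, modularity can be written as Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the edge weight inside cluster c and d_c is the summed degree of its members. A minimal Python sketch using a naive greedy agglomerative heuristic: start from singletons, repeatedly merge the pair of clusters with the largest gain ΔQ = e_ab/m − d_a·d_b/(2m²), and stop when no merge helps. This optimizer is swapped in for illustration and is not necessarily what FOCUS uses; edge weights are assumed to be pairwise similarity scores such as the shared-point counts of slide 22.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]

def _degrees(weights: Dict[Edge, float]) -> Dict[str, float]:
    degree: Dict[str, float] = defaultdict(float)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return degree

def modularity(weights: Dict[Edge, float], clusters: List[Set[str]]) -> float:
    """Q = sum over clusters of (inside weight / m) - (cluster degree / 2m)^2."""
    m = sum(weights.values())
    if m == 0:
        return 0.0
    degree = _degrees(weights)
    q = 0.0
    for c in clusters:
        inside = sum(w for (a, b), w in weights.items() if a in c and b in c)
        d = sum(degree[n] for n in c)
        q += inside / m - (d / (2 * m)) ** 2
    return q

def greedy_modularity(nodes: Sequence[str], weights: Dict[Edge, float]) -> List[Set[str]]:
    """Merge the cluster pair with the largest positive modularity gain until none remains."""
    clusters = [{n} for n in nodes]
    m = sum(weights.values())
    if m == 0:
        return clusters  # no similarity at all: every video stays alone
    degree = _degrees(weights)
    while True:
        best, best_gain = None, 1e-12
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            between = sum(w for (a, b), w in weights.items()
                          if (a in ci and b in cj) or (a in cj and b in ci))
            di = sum(degree[n] for n in ci)
            dj = sum(degree[n] for n in cj)
            gain = between / m - di * dj / (2 * m * m)
            if gain > best_gain:
                best, best_gain = (i, j), gain
        if best is None:
            return clusters  # no merge increases modularity
        i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

sim = {("a", "b"): 18, ("a", "c"): 15, ("b", "c"): 16,
       ("d", "e"): 17, ("d", "f"): 14, ("e", "f"): 19, ("c", "d"): 1}
groups = greedy_modularity(list("abcdef"), sim)
print(sorted(sorted(g) for g in groups))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The naive loop rescans every edge for every cluster pair, which is fine for a few hundred videos; larger deployments would use an incremental or Louvain-style optimizer.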

  24. Results

  25. Collegiate Football Stadium • Stadium: 33K seats, 56K maximum attendance • Model: 190K points, 412 images (2896 × 1944 resolution) • Android app on Samsung Galaxy Nexus and S3 • 325 videos captured, 15–30 seconds each

  26. Line-of-Sight Accuracy (visual)

  27. Line-of-Sight Accuracy • In >80% of cases, line-of-sight estimation is off by <40 meters • For the same percentage, GPS/Compass LOS estimation is off by <260 meters

  28. FOCUS Performance • 75% true positives • Trigger GPS/Compass failover techniques

  29. Natural Questions • What if a 3D model is not available? Online model generation from the first few uploads • Stadiums look very different on game day? Rigid structures in the background persist • Where won’t it work? Natural or dynamic environments are hard

  30. Conclusion • Computer vision and image processing are often computation-hungry, restricting real-time deployment • Mobile sensing provides powerful metadata that can often reduce the computation burden • Computer vision + mobile sensing + geometry, along with the right set of big-data tools, can enable many real-time applications • FOCUS demonstrates one such fusion, a ripe area for further research

  31. Thank You http://cs.duke.edu/~puneet
