
AdaScale: Towards Real-time Video Object Detection using Adaptive Scaling

SysML 2019. AdaScale: Towards Real-time Video Object Detection using Adaptive Scaling. Ting-Wu (Rudy) Chin*, Ruizhuo Ding*, Diana Marculescu. ECE Dept., Carnegie Mellon University. Video object detection is one of the key tasks in various emerging applications.


Presentation Transcript


  1. SysML 2019 • AdaScale: Towards Real-time Video Object Detection using Adaptive Scaling • Ting-Wu (Rudy) Chin* Ruizhuo Ding* Diana Marculescu • ECE Dept., Carnegie Mellon University

  2. Video object detection is one of the key tasks in various emerging applications • Autonomous cars [1] • Autonomous drones [2] • Household robots [3] • References: [1] https://medium.com/udacity/how-the-udacity-self-driving-car-works-575365270a40 [2] https://software.intel.com/en-us/articles/object-detection-on-drone-videos-using-caffe-framework [3] Loghmani, Mohammad Reza, Barbara Caputo, and Markus Vincze. "Recognizing objects in-the-wild: Where do we stand?" 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018.

  3. Prior art uses scales to trade speed for accuracy • RetinaNet [5] • YOLOv2 [6] • References: [5] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." ICCV. 2017. [6] Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." CVPR. 2017.

  4. Scaling down an image (reducing its resolution) may sometimes help • Down-sampling can reduce noise, which in turn reduces false positives • How do we determine which image to scale, and by how much? A regression problem

  5. Outline • Motivation • From images to scales: a regression problem • AdaScale methodology • Results

  6. Regressing scales from input images • To generate target labels: • Choose a set of discrete scales that broadly covers the scales of interest • For each image in the training set, evaluate every scale with a metric to identify the best scale
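The label-generation step on slide 6 can be sketched as follows. This is a minimal illustration, not the authors' code: the `metric` callable, the image identifiers, and the scale values are hypothetical, and the sketch assumes that a lower metric value means a better scale.

```python
from typing import Callable, Dict, Hashable, Sequence


def generate_scale_labels(
    image_ids: Sequence[Hashable],
    scales: Sequence[int],
    metric: Callable[[Hashable, int], float],
) -> Dict[Hashable, int]:
    """Return, for each image, the candidate scale with the lowest metric value."""
    labels = {}
    for image_id in image_ids:
        # Evaluate every candidate scale on this image and keep the best one.
        labels[image_id] = min(scales, key=lambda s: metric(image_id, s))
    return labels


# Toy usage with made-up metric values (assumption: lower is better).
toy_metric = {("a", 600): 0.9, ("a", 360): 0.4, ("b", 600): 0.2, ("b", 360): 0.7}
labels = generate_scale_labels(
    ["a", "b"], [600, 360], lambda image_id, scale: toy_metric[(image_id, scale)]
)
```

In this toy run, image "a" is labeled with scale 360 and image "b" with scale 600. The resulting per-image labels are what a scale regressor would then be trained to predict.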

  7. The current loss function favors extreme scales • A predicted bounding box introduces no regression loss if it is in the background • Figure legend: predicted bounding box, ground-truth bounding box

  8. Our proposal: only consider the foreground boxes • Sort by loss • Figure legend: foreground bounding box, background bounding box
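A rough sketch of a foreground-only metric in the spirit of slide 8 is shown below. The slide only states that foreground boxes are considered and sorted by loss. Summing the `k` lowest losses, with `k` taken as the smallest foreground-box count across scales, is an assumption made here so that every scale is scored on the same number of boxes. The function name and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def foreground_scale_metric(
    boxes_per_scale: Dict[int, List[Tuple[float, bool]]],
) -> Dict[int, float]:
    """Score each scale using only its foreground boxes.

    boxes_per_scale maps a scale to a list of (loss, is_foreground) pairs.
    """
    # Keep only foreground boxes and sort their losses in ascending order.
    fg_losses = {
        scale: sorted(loss for loss, is_foreground in boxes if is_foreground)
        for scale, boxes in boxes_per_scale.items()
    }
    # Assumption: compare the same number of boxes at every scale.
    k = min(len(losses) for losses in fg_losses.values())
    return {scale: sum(losses[:k]) for scale, losses in fg_losses.items()}


# Toy usage with made-up per-box losses.
toy_boxes = {
    600: [(0.5, True), (0.2, True), (0.9, False)],
    240: [(0.3, True), (0.1, False)],
}
scores = foreground_scale_metric(toy_boxes)
best_scale = min(scores, key=scores.get)
```

Background boxes never enter the score, so a scale cannot look better simply because more of its predictions fall in the background.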

  9. Outline • Motivation • From images to scales: a regression problem • AdaScale methodology • Results

  10. The overall flow of AdaScale • Training: fine-tune object detectors with multi-scale training; generate labels for the scale regressor; multi-scale training for the scale regressor (freezing the object detector) • Testing: frame t passes through the backbone CNN to the object detector and the scale regressor; the regressor outputs a real-valued scale at time t, which is used to scale frame t+1, and so on through frame t+n
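The test-time flow on slide 10 can be sketched as a simple loop in which the scale predicted from frame t is applied to frame t+1. This is an illustration under stated assumptions rather than the authors' implementation: `resize`, `detect`, and `regress_scale` are hypothetical callables, the regressor is assumed to return an absolute scale, and clamping to `[min_scale, max_scale]` is an added safeguard that the slides do not mention.

```python
from typing import Any, Callable, Iterable, List, Tuple


def adascale_inference(
    frames: Iterable[Any],
    resize: Callable[[Any, float], Any],
    detect: Callable[[Any], Tuple[Any, Any]],
    regress_scale: Callable[[Any], float],
    initial_scale: float,
    min_scale: float,
    max_scale: float,
) -> List[Tuple[float, Any]]:
    """Run detection on each frame at the scale predicted from the previous frame."""
    scale = initial_scale
    results = []
    for frame in frames:
        # Detect on the current frame at the current scale.
        detections, features = detect(resize(frame, scale))
        results.append((scale, detections))
        # The regressed scale is applied to the next frame (assumed clamp).
        scale = min(max_scale, max(min_scale, regress_scale(features)))
    return results


# Toy usage with stub components: the stub regressor halves the scale each step.
results = adascale_inference(
    frames=["f0", "f1", "f2"],
    resize=lambda frame, scale: (frame, scale),
    detect=lambda image: ([], image),
    regress_scale=lambda features: features[1] * 0.5,
    initial_scale=600.0,
    min_scale=200.0,
    max_scale=600.0,
)
used_scales = [scale for scale, _ in results]
```

Because the regressor reuses features that the detector has already computed for frame t, predicting the next scale adds little work on top of detection in this arrangement.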

  11. Outline • Motivation • From images to scales: a regression problem • AdaScale methodology • Results

  12. AdaScale on ImageNet VID • SS/SS: single-scale training, single-scale testing • MS/SS: multi-scale training, single-scale testing • MS/AdaScale: multi-scale training, AdaScale testing

  13. Ablation study: multi-scale fine-tuning • Figure label: regressed scales

  14. Qualitative analysis: dynamics of AdaScale

  15. Qualitative analysis: comparison with baseline • Panels: SS/SS, MS/AdaScale

  16. Conclusions • We propose AdaScale, which uses image scaling to improve both speed and accuracy in video object detection instead of trading one for the other. • Our results demonstrate 1.3 and 2.7 mAP improvements on the ImageNet VID and mini-YoutubeBB datasets, with 1.6x and 1.8x speedups, respectively. • Combined with a state-of-the-art video object detection acceleration technique (i.e., Deep Feature Flow), we further increase the speedup by 1.25x with slightly better mAP.

  17. Q & A • Thank you
