AdaScale improves video object detection by regressing the best image scale from the input frame. Target labels for the scale regressor are generated by evaluating each training image at several scales with a metric that considers only foreground boxes. Results show gains in both speed and accuracy.
SysML 2019 • AdaScale: Towards Real-time Video Object Detection using Adaptive Scaling • Ting-Wu (Rudy) Chin* Ruizhuo Ding* Diana Marculescu • ECE Dept., Carnegie Mellon University
Video object detection is one of the key tasks in various emerging applications
• Autonomous cars [1]
• Autonomous drones [2]
• Household robots [3]
1. https://medium.com/udacity/how-the-udacity-self-driving-car-works-575365270a40
2. https://software.intel.com/en-us/articles/object-detection-on-drone-videos-using-caffe-framework
3. Loghmani, Mohammad Reza, Barbara Caputo, and Markus Vincze. "Recognizing objects in-the-wild: Where do we stand?" 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018.
Prior art uses scales to trade speed for accuracy
• RetinaNet [5]
• YOLOv2 [6]
5. Lin, Tsung-Yi, et al. "Focal loss for dense object detection." ICCV. 2017.
6. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." CVPR. 2017.
Scaling down an image (reducing its resolution) may sometimes help
• Down-sampling can reduce noise, which in turn reduces false positives
How do we determine which image to scale, and by how much? A regression problem
Outline • Motivation • From images to scales: a regression problem • AdaScale methodology • Results
Regressing scales from input images
• To generate target labels:
  • Choose a set of discrete scales to broadly cover the scales of interest.
  • For each image in the training set, evaluate every scale with a metric to identify the best scale.
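The label-generation step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `best_scale_labels`, the `metric` callable, the toy loss table, and the example scale values are all assumptions introduced here, and the sketch assumes that a lower metric value means a better scale.

```python
from typing import Callable, Dict, Hashable, Sequence


def best_scale_labels(
    image_ids: Sequence[Hashable],
    scales: Sequence[int],
    metric: Callable[[Hashable, int], float],
) -> Dict[Hashable, int]:
    """Return, for each image, the candidate scale with the lowest metric value.

    `metric(image_id, scale)` is a hypothetical callable that runs the detector
    on the image resized to `scale` and returns a loss-like score.
    """
    labels = {}
    for image_id in image_ids:
        # Evaluate every candidate scale and keep the best one as the label.
        labels[image_id] = min(scales, key=lambda s: metric(image_id, s))
    return labels


# Toy stand-in for a real detector evaluation (values are made up).
toy_losses = {
    ("img_a", 600): 0.9, ("img_a", 480): 0.7, ("img_a", 360): 0.4, ("img_a", 240): 0.6,
    ("img_b", 600): 0.2, ("img_b", 480): 0.3, ("img_b", 360): 0.5, ("img_b", 240): 0.8,
}
candidate_scales = [600, 480, 360, 240]  # illustrative values only

labels = best_scale_labels(
    ["img_a", "img_b"], candidate_scales, lambda i, s: toy_losses[(i, s)]
)
```

In this toy table the smaller scale wins for `img_a` and the largest scale wins for `img_b`, which is the per-image variation the regressor is meant to learn.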
Current loss function favors extreme scales
• A predicted bounding box does not introduce regression loss if it is assigned to the background
[Figure legend: predicted bounding box, ground truth bounding box]
Our proposal: only consider the foreground boxes
[Figure: foreground bounding boxes and background bounding boxes; boxes sorted by loss]
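One plausible reading of this proposal is sketched below. It assumes that each scale yields a per-box loss and a foreground flag, that the foreground losses are sorted in ascending order, and that the same number of boxes (the smallest foreground count across scales) is summed at every scale so the totals are comparable. The slide does not spell out these details, so the truncation rule and the names `foreground_loss` and `pick_scale` are assumptions.

```python
from typing import Dict, List, Tuple

# Per scale: (per-box losses, per-box foreground flags). Hypothetical layout.
PerScale = Dict[int, Tuple[List[float], List[bool]]]


def foreground_loss(losses: List[float], is_foreground: List[bool], n_keep: int) -> float:
    """Sum the n_keep smallest losses among foreground boxes only."""
    fg_losses = sorted(l for l, fg in zip(losses, is_foreground) if fg)
    return sum(fg_losses[:n_keep])


def pick_scale(per_scale: PerScale) -> int:
    """Pick the scale with the lowest foreground-only loss.

    Every scale contributes the same number of boxes (the minimum foreground
    count across scales), so a scale is not rewarded for having fewer boxes.
    If some scale has no foreground box, all totals are 0 and the first scale wins.
    """
    n_min = min(sum(flags) for _, flags in per_scale.values())
    return min(per_scale, key=lambda s: foreground_loss(*per_scale[s], n_min))


# Toy example with made-up numbers.
toy = {
    600: ([0.2, 0.9, 0.4, 0.1], [True, True, True, False]),
    240: ([0.3, 0.05], [True, False]),
}
chosen = pick_scale(toy)
# For comparison: summing the loss of every box, with no foreground filtering.
naive = min(toy, key=lambda s: sum(toy[s][0]))
```

In this toy example the naive total prefers scale 240 simply because it has fewer boxes, while the foreground-only comparison prefers 600, whose best foreground box has the lower loss.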
The overall flow of AdaScale
Training
• Fine-tune object detectors with multi-scale training
• Generate labels for the scale regressor
• Multi-scale training for the scale regressor (freezing the object detector)
Testing
• Frame t → Scaling → Backbone CNN → Object Detector and Scale Regressor
• The Scale Regressor outputs a real value at frame t that sets the scale for frame t+1; the same applies to later frames (t+n)
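The testing flow can be sketched as a simple loop in which the scale predicted on one frame is applied to the next. This is a sketch under stated assumptions rather than the authors' implementation: `detect_and_regress` is a hypothetical callable standing in for scaling, the backbone CNN, the object detector, and the scale regressor, and clamping the prediction to `[min_scale, max_scale]` is an added safeguard that the slides do not mention.

```python
from typing import Any, Callable, Iterable, List, Tuple


def adascale_inference(
    frames: Iterable[Any],
    detect_and_regress: Callable[[Any, float], Tuple[Any, float]],
    initial_scale: float,
    min_scale: float,
    max_scale: float,
) -> List[Tuple[float, Any]]:
    """Run detection frame by frame, reusing each predicted scale on the next frame."""
    scale = initial_scale
    results = []
    for frame in frames:
        # One pass: resize to `scale`, detect, and regress a scale for the next frame.
        detections, predicted_scale = detect_and_regress(frame, scale)
        results.append((scale, detections))
        # Assumption: keep the real-valued prediction inside a sane range.
        scale = min(max(predicted_scale, min_scale), max_scale)
    return results


# Stub model for illustration: always asks for half the current scale.
def stub_model(frame: str, scale: float) -> Tuple[str, float]:
    return f"boxes({frame}@{scale})", scale * 0.5


results = adascale_inference(["f0", "f1", "f2", "f3"], stub_model, 600, 128, 600)
used_scales = [s for s, _ in results]
```

The first frame has no prediction available, so it runs at `initial_scale`; every later frame runs at the scale regressed from its predecessor.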
AdaScale on ImageNet VID
• SS/SS: Single-scale Training, Single-scale Testing
• MS/SS: Multi-scale Training, Single-scale Testing
• MS/AdaScale: Multi-scale Training, AdaScale Testing
[Figure: results for SS/SS, MS/SS, and MS/AdaScale]
Ablation study: multi-scale fine-tuning
[Figure: regressed scales]
Qualitative analysis: comparison with baseline
[Figure panels: SS/SS, MS/AdaScale]
Conclusions
• We propose AdaScale, which uses image scaling to improve both speed and accuracy in video object detection instead of trading one for the other.
• Our results demonstrate improvements of 1.3 and 2.7 mAP on the ImageNet VID and mini-YoutubeBB datasets, with 1.6x and 1.8x speedups, respectively.
• Combined with a state-of-the-art video object detection acceleration technique (i.e., Deep Feature Flow), AdaScale gives a further 1.25x speedup with slightly better mAP.
Q & A • Thank you