Object Detection using Deep Neural Network

Object Detection using Deep Neural Network. Wan-Ru, Lin, 2016/10/27. Outline: Introduction; Background; R-CNN (2014); SPPnet (2014), which speeds up R-CNN; Fast R-CNN (2015); Faster R-CNN (2015); YOLO (2015). Introduction: object detection has long been an interesting task in computer vision.

Presentation Transcript


  1. Object Detection using Deep Neural Network Wan-Ru, Lin 2016/10/27

  2. Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speedup R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  3. Introduction • Object detection has long been an interesting task in computer vision • Location (x,y,w,h) • Classification

  4. Introduction • Before Fast R-CNN (2015)… • After Fast R-CNN… • Pipeline components shown in both diagrams: region proposal, feature extraction, classifier (example output: cat) [R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision (ICCV), 2015]

  5. Introduction • R-CNN (2014) • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  6. Background • Convolutional Neural Network (CNN) • Convolution • Nonlinearity (sigmoid, ReLU) • Pooling • Feature extractor followed by a classifier

  7. Background • Pooling • reduce the spatial size • translation invariant • Loss function • Error backpropagation
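
The pooling bullet can be illustrated with a minimal sketch of 2x2 max pooling in plain Python. The function name `max_pool2d`, the window size, the stride, and the sample feature map are all illustrative assumptions, not taken from the slides.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max-pool a 2D feature map given as a list of equal-length rows."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - size + 1, stride):
        row = []
        for left in range(0, width - size + 1, stride):
            # Collect the values inside the current pooling window.
            window = [feature_map[top + dy][left + dx]
                      for dy in range(size) for dx in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map; pooling halves each spatial dimension.
fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(fmap)
print(pooled)  # [[6, 2], [2, 9]]
```

Because each output keeps only the window maximum, moving a strong activation by one position inside its window leaves the pooled value unchanged, which is the sense in which pooling gives some translation invariance.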

  8. Background • PASCAL VOC • Location • Class • Person: person • Animal: bird, cat, cow, dog, horse, sheep • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train • Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

  9. Background • Pre-training • ILSVRC dataset: ~1.2 million images • Fine-tuning • PASCAL VOC 2012

  10. Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speedup R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  11. R-CNN • Multi-stage pipeline • Selective Search • SVM

  12. R-CNN • Selective Search • Generate possible object locations

  13. R-CNN • Training • Supervised pre-training: ILSVRC 2012 • Domain-specific fine-tuning: • warp input • number of outputs: 1000 -> 20 + 1 (background) • SVM • Separate data with a hyperplane
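
The "separate data with a hyperplane" bullet can be sketched as a linear decision function: a region's feature vector is scored by its signed distance-like value w·x + b, and the sign decides the class. The weights, bias, and feature vectors below are hypothetical values chosen for illustration; in R-CNN the features would come from the CNN and the weights from SVM training.

```python
def svm_decision(weights, bias, features):
    """Return w.x + b; positive means the object side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical 2-D example: one weight vector for a single class.
weights, bias = [0.5, -1.0], 0.25
score_pos = svm_decision(weights, bias, [2.0, 0.5])
score_neg = svm_decision(weights, bias, [0.0, 1.0])
print(score_pos > 0, score_neg > 0)  # True False
```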

  14. R-CNN • Disadvantages of R-CNN • Distortion due to warping • Training is a multi-stage pipeline • Training is expensive in space and time • Object detection is slow • VGG takes 47 s/image

  15. R-CNN

  16. Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speedup R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  17. SPPnet • Share feature map • Fixed-length feature • Assume bins • ROI size : • Pooling window size = • Avoid image warping
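
How pooling over bins yields a fixed-length feature regardless of input size can be sketched as follows. This is a minimal illustration, not the SPPnet implementation: the pyramid levels `(1, 2)` and the floor/ceil bin boundaries are assumptions chosen so that every bin is non-empty, and `spp_pool` is a hypothetical name.

```python
import math


def spp_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D map into n x n bins per level and concatenate the results."""
    h, w = len(feature_map), len(feature_map[0])
    features = []
    for n in levels:
        for i in range(n):
            # Row range of bin i (assumed floor/ceil boundaries).
            top, bottom = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                left, right = (j * w) // n, math.ceil((j + 1) * w / n)
                features.append(max(feature_map[y][x]
                                    for y in range(top, bottom)
                                    for x in range(left, right)))
    return features


small = [[1, 2, 3],
         [4, 5, 6]]                                        # 2 x 3 region
large = [[r * 4 + c for c in range(4)] for r in range(5)]  # 5 x 4 region
print(len(spp_pool(small)), len(spp_pool(large)))  # 5 5
```

Both regions map to 1 + 4 = 5 values, so no warping to a fixed input size is needed before the fully connected layers.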

  18. SPPnet • Sharing feature maps speeds up R-CNN • Achieves mAP comparable to R-CNN

  19. Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speedup R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  20. Fast R-CNN • Selective Search produces ~2K region proposals • 1-scale SPP layer (7x7) • Single-stage training • Training can update all network layers

  21. Fast R-CNN • Multi-task loss • Output : • v
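
As a sketch of the multi-task loss, the following follows the form given in the Fast R-CNN paper [3]: a log loss on the true class plus, for non-background regions, a smooth L1 loss between the predicted and target box offsets. The function names, the convention that class 0 is background, and the default weight `lam=1.0` are assumptions of this sketch.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def multi_task_loss(class_probs, true_class, predicted_box, target_box, lam=1.0):
    """Classification log loss plus box regression loss for non-background RoIs."""
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background regions have no ground-truth box, so no localization term.
        return cls_loss
    loc_loss = sum(smooth_l1(t - v) for t, v in zip(predicted_box, target_box))
    return cls_loss + lam * loc_loss


# Hypothetical RoI: class 1 with predicted offsets vs. regression target v.
loss = multi_task_loss([0.2, 0.8], 1, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0))
print(round(loss, 4))
```

Training on the sum of both terms is what lets classification and bounding-box regression be learned in a single stage.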

  22. Fast R-CNN • Contributions • Higher mAP than R-CNN and SPPnet • Training is single-stage, using multi-task loss • Training can update all network layers • No disk storage is required for feature caching

  23. Fast R-CNN

  24. Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speedup R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  25. Faster R-CNN • Selective search consumes much of the running time • Fast R-CNN • Region proposal network (RPN)

  26. Faster R-CNN • Region proposal network (RPN) • Pick the top-ranked 100 proposals at test time
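
Picking the top-ranked proposals amounts to sorting by objectness score and truncating. The sketch below assumes each proposal is a `(box, score)` pair and leaves out any other filtering (such as non-maximum suppression); the name `top_proposals` and the sample data are hypothetical.

```python
def top_proposals(proposals, n=100):
    """Keep the n proposals with the highest scores, best first."""
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)
    return ranked[:n]


# Hypothetical proposals: ((x, y, w, h), objectness score).
proposals = [
    ((0, 0, 10, 10), 0.2),
    ((5, 5, 20, 20), 0.9),
    ((1, 1, 4, 4), 0.6),
]
kept = top_proposals(proposals, n=2)
print([score for _, score in kept])  # [0.9, 0.6]
```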

  27. Faster R-CNN • Timing(ms)

  28. Faster R-CNN • Contribution • Present RPNs for efficient and accurate region proposal generation • Sharing convolutional features for region proposal and object detection

  29. Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speedup R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

  30. YOLO • Use features from the entire image to predict each bounding box • Single neural network • Region proposal • Feature extraction • Classification • Bounding box regression

  31. YOLO • Divide the input image into a grid • Each grid cell • predicts 2 bounding boxes (x,y,w,h) • predicts confidence scores for the bounding boxes • predicts class probabilities :
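
The per-cell predictions listed above determine the size of the network output. A minimal sketch, assuming each box contributes five values (x, y, w, h, confidence), the 20 PASCAL VOC classes, and a 7x7 grid as used in the YOLO paper [5]; the grid size is not stated on the slide, and `yolo_output_size` is a hypothetical helper name.

```python
def yolo_output_size(grid_size, boxes_per_cell, num_classes):
    """Count output values: every cell predicts its boxes plus class probabilities."""
    values_per_box = 5  # x, y, w, h, confidence
    per_cell = boxes_per_cell * values_per_box + num_classes
    return grid_size * grid_size * per_cell


# Assumed configuration: 7x7 grid, 2 boxes per cell, 20 VOC classes.
size = yolo_output_size(grid_size=7, boxes_per_cell=2, num_classes=20)
print(size)  # 1470
```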

  32. YOLO • IOU examples: IOU = 0.8, IOU = 0.3 • Output number =
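
Intersection over union (IOU) is the overlap area of two boxes divided by the area of their union. The sketch below uses the (x, y, w, h) box format from the slides and assumes that (x, y) is the top-left corner; that convention is an assumption of this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 2, 2))
print(overlap)
```

An IOU of 1.0 means the boxes coincide and 0.0 means they do not overlap, so a predicted box with IOU = 0.8 matches the ground truth far better than one with IOU = 0.3.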

  33. YOLO • VOC 2007

  34. YOLO

  35. YOLO • Limitations • Struggles with small objects that appear in groups • Struggles to generalize to objects in new or unusual aspect ratios or configurations

  36. References [1] R. Girshick et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014. [2] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. "Selective search for object recognition." International Journal of Computer Vision, 104(2):154–171, 2013. [3] R. B. Girshick. "Fast R-CNN." CoRR, abs/1504.08083, 2015. [4] S. Ren, K. He, R. Girshick, and J. Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." arXiv preprint arXiv:1506.01497, 2015. [5] J. Redmon et al. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640, 2015. [6] K. He et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014.
