
Ruohua Shi



Presentation Transcript


  1. CVPR 2019 oral Ruohua Shi 2019/6/27

  2. Motivation: abundant labeled synthetic data (source domain) + insufficient labeled real data (target domain) = better performance on real data. Obstacle: the domain gap.

  3. Motivation: abundant labeled synthetic data (source domain) + insufficient labeled real data (target domain) = better performance on real data. Obstacle: the domain gap.

  4. Motivation: Existing approaches use • data selection • domain transformation. Common drawback: they are applied at a single level only. This work proposes a hierarchical transfer learning framework operating at the pixel, region, and image levels. Goal: perform transfer learning from real data (target domain) and abundant synthetic data (source domain) to improve semantic segmentation performance.
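The hierarchical idea on this slide can be sketched in a few lines: a per-pixel segmentation loss on synthetic pixels is modulated by weights computed at image, region, and pixel granularity, so that only target-like synthetic areas contribute to training. This is an illustrative assumption of the mechanics, not the paper's notation; the names `hierarchical_weight` and `weighted_seg_loss` are hypothetical, and the transcript's later note of 0.5 at the image level is used for the dummy image weight.

```python
import numpy as np

def hierarchical_weight(w_image, w_region, w_pixel):
    """Combine the three weighting levels into one per-pixel map.

    w_image  : scalar weight for the whole image
    w_region : (H, W) map, constant within each region
    w_pixel  : (H, W) map with an individual weight per pixel
    """
    return w_image * w_region * w_pixel  # broadcasts to (H, W)

def weighted_seg_loss(per_pixel_loss, weight_map):
    """Weighted average of a per-pixel loss map (e.g. cross-entropy)."""
    return float((weight_map * per_pixel_loss).sum() / (weight_map.sum() + 1e-8))

H, W = 4, 4
loss_map = np.ones((H, W))                    # dummy per-pixel loss
w_img = 0.5                                   # image judged moderately target-like
w_reg = np.ones((H, W)); w_reg[:, :2] = 0.2   # left half looks less real
w_pix = np.ones((H, W))

w = hierarchical_weight(w_img, w_reg, w_pix)
loss = weighted_seg_loss(loss_map, w)         # ~1.0 here, since loss_map is uniform
```

With a uniform loss map the weighted average stays at the uniform value; the weighting only changes which pixels dominate once losses differ across the image.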

  5. Methodology

  6. Methodology

  7. Methodology

  8. Loss function

  9. Hierarchical Weighting Networks (the weight is set to 0.5 for the image level)

  10. Methodology

  11. Methodology

  12. Network Optimization

  13. Network Optimization

  14. Network Optimization

  15. Network Optimization

  16. Network Optimization

  17. Network Optimization

  18. Network Optimization

  19. Network Optimization

  20. Experiments
  • Two synthetic datasets:
  • GTAV contains 24,966 urban-scene images rendered by the GTAV game engine; its semantic categories are compatible with the CITYSCAPES dataset. We take the whole labeled GTAV dataset as the source-domain data.
  • SYNTHIA is a large dataset containing different video sequences rendered from a virtual city. We take SYNTHIA-RAND-CITYSCAPES, which provides 9,400 images from all the sequences with CITYSCAPES-compatible annotations, as the source-domain data.
  • One real-world dataset:
  • CITYSCAPES is a real-world urban-scene dataset with 2,975 training images and 500 validation images at 2048 × 1024 resolution; 19 semantic categories are provided with pixel-level labels. We take the whole training set as the target-domain data and report the results of our transfer learning scheme on the validation set.

  21. Experimental Results
  • Baseline methods are defined as follows:
  • 1) Direct Joint Training: directly combine both synthetic and real-world data for training.
  • 2) Target Finetuning: pretrain the model with the synthetic data, then finetune on the real-world data.
  • 3) FCN+GAN: contains only the FCN segmentation part and the GAN part.
  • 4) PixelDA [1]: an unsupervised domain adaptation method. Since it is not directly compatible with our setting, we extend it to our problem by providing the labels of both the synthetic and real-world data.
  • 5) FCN+W1: the FCN segmentation part plus the W1 part.
  [1] Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, and Dilip Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
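The two simplest baselines above can be sketched as training schedules. This is a minimal illustration, assuming `train_step` is a hypothetical stand-in for one optimization step and the batch lists stand in for real data loaders; interleaving stands in for randomly mixing the two domains.

```python
from itertools import zip_longest

def direct_joint_training(model, synthetic_batches, real_batches, train_step):
    """Baseline 1: train on a mix of source and target batches."""
    mixed = [b for pair in zip_longest(synthetic_batches, real_batches)
             for b in pair if b is not None]
    for batch in mixed:
        model = train_step(model, batch)
    return model

def target_finetuning(model, synthetic_batches, real_batches, train_step):
    """Baseline 2: pretrain on synthetic data, then finetune on real data."""
    for batch in synthetic_batches:
        model = train_step(model, batch)
    for batch in real_batches:
        model = train_step(model, batch)
    return model
```

The two baselines see exactly the same data; only the ordering differs, which is why the slide treats them as distinct reference points.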

  22. Methodology: Shared Weighting Map vs. Multi-channel Weighting Map (19 channels)
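The distinction on this slide can be made concrete with dummy tensors, under assumed shapes: a shared weighting map applies one per-pixel weight to every class, while a multi-channel map (19 channels, one per CITYSCAPES category) weights each class independently at every pixel. The loss values and weights below are dummies for illustration only.

```python
import numpy as np

C, H, W = 19, 4, 4
per_class_loss = np.ones((C, H, W))      # dummy per-class, per-pixel loss

shared_map = np.full((H, W), 0.5)        # one channel, shared by all classes
multi_map = np.full((C, H, W), 0.5)      # one channel per class
multi_map[0] = 1.0                       # e.g. up-weight one class everywhere

shared_loss = float((shared_map[None] * per_class_loss).mean())
multi_loss = float((multi_map * per_class_loss).mean())
```

Only the multi-channel map can express "this synthetic region is useful for the road class but not for the pedestrian class", which is what the 19-channel version buys over the shared one.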

  23. Experimental Results GTAV + CITYSCAPES → CITYSCAPES SYNTHIA + CITYSCAPES → CITYSCAPES GTAV + SYNTHIA + CITYSCAPES → CITYSCAPES.

  24. Ablation Studies single-level (pixel-level only)

  25. Visualization of Weighting Maps and Segmentation Results

  26. Visualization of Segmentation Results

  27. Analysis of Extremely Insufficient Data

  28. Conclusion
  • + Proposes a new transfer learning method that uses both real and synthetic images for semantic segmentation.
  • + Adaptively selects synthetic pixels similar to the target domain for learning.
  • + Abundant experiments.
  • − The results are not especially strong.
