Visual Grounding: Topic Report

Presentation Transcript


  1. Visual Grounding: Topic Report. Lejian Ren, 4.23

  2. Tasks • Grounding Phrases Plummer, Bryan A., et al. "Phrase localization and visual relationship detection with comprehensive image-language cues." Proceedings of the IEEE International Conference on Computer Vision. 2017.

  3. Tasks • Grounding Phrases • Grounding Referring Expressions • Symmetric task: Referring Expression Generation (region captioning) • Detection-based • Segmentation-based • Grounding Referring Relationships

  4. Tasks • Grounding Phrases • Grounding Referring Expressions • Symmetric task: Referring Expression Generation (region captioning) • Detection-based • Segmentation-based • Grounding Referring Relationships

  5. Grounding • Attention in VQA Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Learning to compose neural networks for question answering. arXiv preprint arXiv:1601.01705 (2016)

  6. Grounding Referring Expression(BBOX) • Bag-of-words • Image-text-feature-matching • Scoring model • Generating model

  7. Grounding Referring Expression(BBOX) • Bag-of-words • Image-text-feature-matching • Mostly implicit • Scoring model • Generating model • Mostly used for segmentation

  8. Grounding Referring Expression(BBOX) • Challenges • Requires localizing objects • Multimodal comprehension of context • Visual attributes (e.g., “largest”, “baby”) • Relationships (e.g., “behind”) that help distinguish the referent from other objects, especially those of the same category.

  9. Grounding Referring Expression(BBOX) • Treats the grounding problem as object retrieval S. Guadarrama, E. Rodner, K. Saenko, N. Zhang, R. Farrell, J. Donahue, and T. Darrell. Open-vocabulary object retrieval. In Robotics: Science and Systems, 2014.

  10. Grounding Referring Expression(BBOX) • Predicts, box by box, how well each candidate box matches the expression Hu, Ronghang, et al. "Natural language object retrieval." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
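
The box-by-box scoring idea on this slide can be illustrated with a minimal sketch. It assumes each candidate box and the expression have already been embedded as vectors of the same length, and it uses cosine similarity as a stand-in scoring function; this is not the scoring model of the cited paper. The names `cosine` and `ground_expression` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground_expression(box_features, expression_embedding):
    """Score every candidate box against the expression and return (best index, all scores)."""
    scores = [cosine(f, expression_embedding) for f in box_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data (assumed): three candidate boxes with 3-dimensional features.
boxes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
query = [0.0, 1.0, 0.0]
best_index, scores = ground_expression(boxes, query)
print(best_index, scores)
```

In practice the scoring function would be learned; the sketch only shows the retrieval structure of scoring each box independently and keeping the highest-scoring one.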

  11. Grounding Referring Expression(BBOX) J. Mao, J. Huang, A. Toshev, O. Camburu, A. Yuille, and K. Murphy. Generation and comprehension of unambiguous object descriptions. Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, 2016

  12. Grounding Referring Expression(BBOX) Liu, J., Wang, L., & Yang, M. (2017). Referring Expression Generation and Comprehension via Attributes. 4866–4874.

  13. Grounding Referring Expression(BBOX) • Image captioning and grounding can reuse the same features Shridhar, Mohit, and David Hsu. "Grounding spatio-semantic referring expressions for human-robot interaction." arXiv preprint arXiv:1707.05720 (2017).

  14. Grounding Referring Expression(BBOX) • Explicitly attends to [relevant] objects Shridhar, Mohit, and David Hsu. "Grounding spatio-semantic referring expressions for human-robot interaction." arXiv preprint arXiv:1707.05720 (2017).

  15. Grounding Referring Expression(BBOX) • Focuses on context Zhang, H., Niu, Y., & Chang, S. F. (2018). Grounding Referring Expressions in Images by Variational Context. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 4158–4166.

  16. Grounding Referring Expression(BBOX) • Emphasizes relationships Yu, Licheng, et al. "Mattnet: Modular attention network for referring expression comprehension." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

  17. Datasets(BBox) • RefCOCO • Based on MSCOCO • RefCOCO+ • Location information removed • RefCOCOg • Collected differently, with longer sentences • RefCLEF

  18. Grounding Referring Expression(Seg) • Mostly generating models • Centered on how to combine image and text • The whole text as one feature • Each word as a feature
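
The two ways of combining image and text listed on this slide can be sketched as follows. This is a minimal illustration, assuming the image is an H x W grid of per-location feature vectors and that fusion is done by simple concatenation; the function names `fuse_sentence` and `fuse_words` are hypothetical, and real models would follow the fusion with learned layers that predict a mask.

```python
def fuse_sentence(feature_map, sentence_vec):
    """Whole text as one feature: append the same sentence vector at every location."""
    return [[list(cell) + list(sentence_vec) for cell in row] for row in feature_map]


def fuse_words(feature_map, word_vecs):
    """Each word as a feature: build one fused map per word, in reading order,
    so that a recurrent model could consume them step by step."""
    return [fuse_sentence(feature_map, w) for w in word_vecs]


# Toy data (assumed): a 2 x 2 feature map with 2-dimensional visual features.
feature_map = [[[0.1, 0.2], [0.3, 0.4]],
               [[0.5, 0.6], [0.7, 0.8]]]
sentence_vec = [1.0, 0.0, 1.0]                 # one vector for the whole expression
word_vecs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # one vector per word

fused = fuse_sentence(feature_map, sentence_vec)
per_word = fuse_words(feature_map, word_vecs)
print(len(fused[0][0]), len(per_word))
```

The sentence-level variant yields a single fused map, while the word-level variant yields a sequence of maps, one per word.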

  19. Grounding Referring Expression(Seg) Hu, R., Rohrbach, M., & Darrell, T. (2016). Segmentation from natural language expressions. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9905 LNCS(d), 108–124.

  20. Grounding Referring Expression(Seg) Hu, R., Rohrbach, M., & Darrell, T. (2016). Segmentation from natural language expressions. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9905 LNCS(d), 108–124.

  21. Grounding Referring Expression(Seg) Li, R., Kuo, Y., Shu, M., & Qi, X. (2016). Referring Image Segmentation via Recurrent Refinement Networks Supplementary Material.

  22. Grounding Referring Expression(Seg) Li, R., Kuo, Y., Shu, M., & Qi, X. (2016). Referring Image Segmentation via Recurrent Refinement Networks Supplementary Material.

  23. Grounding Referring Expression(Seg) • Encoding the whole sentence at once may lose information • Process the expression word by word Liu, C., Lin, Z., Shen, X., Yang, J., & Lu, X. (n.d.). Recurrent Multimodal Interaction for Referring Image Segmentation. 1271–1280.

  24. Grounding Referring Expression(Seg) Margffoy-Tuay, Edgar, et al. "Dynamic multimodal instance segmentation guided by natural language queries." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

  25. Grounding Referring Expression(Seg) • Word-by-word processing may fail to capture global information Ye, L. (n.d.). Cross-Modal Self-Attention Network for Referring Image Segmentation.

  26. Grounding Referring Expression(Seg+Video) • Addresses temporal inconsistency across frames • Re-ranks candidates using temporal information (overlap) Khoreva, A., Rohrbach, A., & Schiele, B. (n.d.). Video Object Segmentation with Language Referring Expressions.
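
Overlap-based re-ranking, as mentioned on this slide, might look like the sketch below. It assumes candidates are axis-aligned boxes `(x1, y1, x2, y2)` with a per-frame grounding score, and that the final score is a weighted mix of that score and the IoU with the box chosen in the previous frame. The mixing weight and the names `iou` and `rerank` are illustrative assumptions, not the cited paper's exact procedure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rerank(candidates, prev_box, weight=0.5):
    """Re-rank (box, score) pairs by mixing the grounding score with
    the overlap against the previous frame's box; highest first."""
    rescored = [(box, (1 - weight) * score + weight * iou(box, prev_box))
                for box, score in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed): the far-away box has the higher grounding score,
# but the box overlapping the previous frame wins after re-ranking.
prev_box = (0, 0, 10, 10)
candidates = [((50, 50, 60, 60), 0.6), ((1, 1, 11, 11), 0.5)]
ranked = rerank(candidates, prev_box)
print(ranked[0][0])
```

The same idea applies to segmentation masks by replacing box IoU with mask IoU.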

  27. Datasets (Seg) • ReferIt • UNC • Based on MSCOCO • UNC+ • Location information removed • G-Ref • Based on COCO, collected differently

  28. Grounding Referring Relationship Krishna, Ranjay, et al. "Referring relationships." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

  29. Grounding Referring Relationship Krishna, Ranjay, et al. "Referring relationships." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

  30. Grounding Referring Relationship Krishna, Ranjay, et al. "Referring relationships." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.