CS 8803 CVL: Vision and Language

Devi Parikh, School of Interactive Computing

Presentation Transcript


  1. CS 8803 CVL: Vision and Language Devi Parikh School of Interactive Computing

  2. Welcome!

  3. Plan for today • Topic overview • Introductions • Course overview: • Logistics • Requirements • Lecture format • Please interrupt at any time with questions or comments

  4. Computer Vision • Automatic understanding of images and video • Computing properties of the 3D world from visual data (measurement) • Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation) • Algorithms to mine, search, and interact with visual data (search and organization) Kristen Grauman

  5. What does recognition involve? Fei-Fei Li

  6. Detection: Are there people?

  7. Activity: What are they doing?

  8. Object categorization • mountain • tree • building • banner • street lamp • vendor • people

  9. Instance recognition • Potala Palace • A particular sign

  10. Scene and context categorization • outdoor • city • …

  11. Attribute recognition • gray • made of fabric • crowded • flat

  12. People coloring a street on a college campus

  13. It was a great event! It brought families out, and the whole community together.

  14. Q. What are they coloring the street with? A. Chalk

  15. AI: What a nice picture! What event was this? User: “Color College Avenue”. It was a lot of fun! AI: I am sure it was! Do they do this every year? User: I wish they would. I don’t think they’ve organized it again since 2012. …

  16. Why Words and Pictures? 1 Pictures are everywhere Words are how we communicate

  17. Why Words and Pictures? 1 Applications

  18. Why Words and Pictures? 1 Applications Interact with, organize, and navigate visual data

  19. Why Words and Pictures? 1 Applications Leverage multi-modal information on the web

  20. Why Words and Pictures? 1 Applications Aid visually-impaired users Microsoft

  21. Why Words and Pictures? 1 Applications Aid visually-impaired users

  22. Why Words and Pictures? 1 Applications Summarize visual data for analysts

  23. Why Words and Pictures? 2 • Measuring and demonstrating AI capabilities • Image understanding • Language understanding

  24. Why Words and Pictures? 3 • Beyond “bucket” recognition • Language is compositional “A steam engine is coming out of a fireplace.” René Magritte (1938)

  25. Why Words and Pictures? 4 “Vision is our best sensor, and language is our best invention.” -- Viraj Prabhu

  26. My goals (for you) • Be well-versed in the latest in vision + language • Critique research papers in vision + language • Identify interesting open questions and applications • Execute a research project in vision + language

  27. Introductions • Devi Parikh • Ph.D., Carnegie Mellon University, 2009 • Research Assistant Professor, TTI-Chicago, 2013 • Assistant Professor, ECE, Virginia Tech, 2016 • Assistant Professor, School of Interactive Computing, Georgia Tech (currently) • Research Scientist, Facebook AI Research (currently)

  28. Introductions • Arjun Chandrasekaran (your TA) • CS Ph.D. Student • Georgia Tech • CV, ML, NLP, AI • language and vision • making human-AI interaction more natural and efficient

  29. Introductions • Larry He (your second TA) • CS MS Student • Georgia Tech

  30. Introductions • Which program are you in? • How far along? • Have you taken a computer vision course before? • Have you taken a machine learning course before? • Do you know how CNNs and LSTMs work? • Have you used a deep learning package before? • What are you hoping to get out of this class?

  31. This course • CS 8803 CVL • Klaus 2456, TR 1:30 pm to 2:45 pm • Course webpage: http://www.prism.gatech.edu/~arjun9/CS8803_CVL_Fall17/ • Piazza: https://piazza.com/gatech/fall2017/cs8803cvl/home • Focus on topics at the intersection of vision and language • Cutting edge research

  32. Requirements • Paper reviews each class [30%] • Leading discussion (~once) on papers [10%] • Project [60%] • No “Assignments”, Exams, etc.

  33. Prerequisites • Course in computer vision • Course in machine learning • Basic knowledge of deep learning

  34. Paper reviews • For each class, review one paper • Submit by midnight before class • Submission workflow: TBD • Skip the review for the class in which you are leading discussion • Late reviews will not be accepted • The three lowest review grades will be dropped

  35. Paper review guidelines • One page • Detailed review: • Brief (2-3 sentences) summary • Main contribution • Strengths? Weaknesses? • How convincing are the experiments? Suggestions to improve them? • Extensions? Applications? • Additional comments, unclear points • Relationships observed between the papers we are reading • Pull out most interesting thought • Look at class webpage • Write in your own words • Write well, proofread

  36. Leading Discussion • ~ One of you will be assigned to argue for the paper • ~ One of you will be assigned to argue against the paper • Come prepared with 5 points • Sign up here by August 29th: https://docs.google.com/spreadsheets/d/1E0uBxZ5gyKRzsrz2RJP9WTgCV5TYEby7EYTjvCwK-WM/edit?usp=sharing

  37. Projects • First few lectures: introductory talks • Image captioning • Visual question answering • Visual dialog • By lead authors of representative works in this space

  38. Projects • Possibilities: • Design and evaluate a novel approach • A novel application, use case • Extension of a technique studied in class • Be creative! • Think: research paper at a good conference • Work in teams of ~4 (at most 15 teams in the class) • Sign up for teams by September 8th: https://docs.google.com/spreadsheets/d/1n0aP3k7BwguFS0BNt5aUfMpo7-JPiHqQQ2K258YJPh0/edit?usp=sharing

  39. Project timeline • Four in-class presentations (see class schedule) • Project ideas / proposal [10%] • Update 1 [10%] • Update 2 [10%] • Final presentation [15%] • Project video (1 minute) [15%] December 5th
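
The percentages on the Requirements and Project timeline slides can be checked with a little arithmetic: the five project components add up to the 60% project share, and the three top-level components add up to 100%. The following is a minimal sketch of that bookkeeping, not the instructors' grading procedure. The function names, the 0-to-1 score scale, and the choice to average review scores after dropping the three lowest are all assumptions made for illustration.

```python
# Weights (in percent) copied from the "Requirements" and "Project timeline" slides.
COURSE_WEIGHTS = {"reviews": 30, "discussion": 10, "project": 60}
PROJECT_WEIGHTS = {
    "proposal": 10,
    "update_1": 10,
    "update_2": 10,
    "final_presentation": 15,
    "video": 15,
}


def review_average(scores, drop_lowest=3):
    """Mean of review scores after dropping the lowest ones.

    Averaging the remaining scores is an assumption; the slides only say
    that the three lowest review grades are dropped.
    """
    kept = sorted(scores)[drop_lowest:]
    if not kept:
        raise ValueError("need more review scores than are dropped")
    return sum(kept) / len(kept)


def course_grade(review_scores, discussion_score, project_scores):
    """Return a percentage in [0, 100]; every input score is a fraction in [0, 1]."""
    reviews = COURSE_WEIGHTS["reviews"] * review_average(review_scores)
    discussion = COURSE_WEIGHTS["discussion"] * discussion_score
    project = sum(w * project_scores[name] for name, w in PROJECT_WEIGHTS.items())
    return reviews + discussion + project


# Consistency checks on the numbers stated in the slides.
assert sum(PROJECT_WEIGHTS.values()) == COURSE_WEIGHTS["project"]
assert sum(COURSE_WEIGHTS.values()) == 100
```

Keeping the weights as integer percentages makes the two consistency checks exact, so a typo in either table would fail immediately.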

  40. Tips Make sure you are saying everything we need to know to understand what you are saying. Make sure you know what you are talking about. Think about your audience. Make your talks visual, animated (images, video, not lots of text). Stick to the time limit!

  41. Tips • Clearly define the problem statement (input, output) • Place your work in the context of existing work you know of • Lay out the set of experiments you’ll conduct to demonstrate the efficacy of your approach • Present a timeline • Concrete goals for next update in ~2.5 weeks • Long shots • Present updates along this plan • See more details on class webpage • Stick to the time limit!

  42. Implementation • Use any language / platform / package you like • No support for code / implementation issues will be provided • Possibility of consulting with lead authors who gave the introductory talks

  43. Miscellaneous • Best presentation, best project and best discussion prizes! • We will vote • Feedback welcome and useful

  44. Context • Deep Learning (CS 7643) • This course is complementary to it

  45. Coming up • Read the class webpage • Schedule is up • Select 6 dates (topics) you would like to lead the discussion on (by August 29th) • Sign up sheet shows how many people have already signed up for a topic • Select those that have fewer selections • Probability of dropping class? • Start thinking about project teams • Pointers to good presentations, reviews, etc. are on the class webpage.

  46. Moving forward • No class on Thursday • Three lectures after that • No paper reading, no review, no discussion • Introductory talks covering spectrum of vision + language tasks

  47. Each lecture after that • You will have read and summarized a paper the night before • ~ 15 minute discussion on paper we read • Led by two students: “for” and “against” • 10-minute presentation by 3 teams on projects • 10-minute discussion on each presentation

  48. Last two lectures • Final project presentations
