CS 764 Seminar in Computer Vision

Ramin Zabih. Fall 1998.

Course mechanics
  • Meeting time will be Tue/Thu 11-12, here
    • Starting a week from today
  • Home page is now up: www/CS764
  • Assignment: present one paper
    • You’ll have a lot of freedom, but you need to talk to me in advance
    • Some possible papers will be posted shortly
Topic of this seminar
  • The use of “knowledge” in the analysis of visual data
    • Sometimes called “context”
  • Clearly this is vital
    • On both psychological and technical grounds
    • But how? No one has much of an idea…
  • What is the interface between reasoning and perception? (Or, mind and body?)
What is the visual system’s “contract”?
  • Two standard (bad) answers
  • Answer 1: describe the scene in terms of surfaces [low-level vision]
    • There is a green patch 2” wide 1’ away
  • Answer 2: describe the scene in terms of objects [model-based recognition]
    • Start with a set of 3D models (modelbase)
    • Determine position and pose
Why are these answers wrong?
  • They are almost purely data-driven
    • Bottom-up (from the data) versus top-down (from somewhere else)
  • They report “objective fact”, with no room for the task at hand
    • For a given image, there is only one right answer
  • Other problems as well
    • Not very useful, etc.
Technical and psychological arguments
  • There are technical arguments against purely data-driven answers
    • Vision is an inverse problem
      • Many 3D scenes could explain a single 2D image
    • On engineering grounds, this makes no sense
      • Ultimately, perception is used for some task
  • The human perceptual system has both top-down and bottom-up elements
    • Various optical illusions
      • Two people can look at the same picture and see something completely different
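
The claim that many 3D scenes could explain a single 2D image can be illustrated with a small sketch. It assumes an idealized pinhole camera at the origin looking down the +Z axis; the function names and the focal length parameter `f` are hypothetical and chosen only for illustration, not taken from the slides.

```python
# Assumption: idealized pinhole camera at the origin, looking down +Z.
# The names project() and scale_along_ray() are illustrative only.

def project(point, f=1.0):
    """Perspective-project a 3D point (X, Y, Z), Z > 0, onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * x / z, f * y / z)


def scale_along_ray(point, s):
    """Slide a point along its viewing ray by scaling every coordinate by s > 0."""
    return tuple(s * c for c in point)


near = (1.0, 2.0, 4.0)
far = scale_along_ray(near, 4.0)  # four times farther away, four times larger offsets

# Both 3D points land on the same image location, so the image alone
# cannot say which one was actually in the scene.
print(project(near))
print(project(far))
```

In the same way, a patch that is twice as wide and twice as far away covers the same image region, which is why a report like "a green patch 2” wide 1’ away" already depends on assumptions beyond the image itself.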
Low-level vision has its solution
  • Inverse problems require assumptions
  • The assumptions for low-level vision are extremely general (i.e., weak)
    • Reflect the physics of the visible world
    • For example, motion or depth or intensity tend to be “coherent”
      • Saying that every pixel is moving differently from its neighbors is a very unlikely answer
      • The world we live in tends not to do that
      • Helmholtz’s “unconscious inference”
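
One way to make the "coherence" assumption concrete is to score a candidate answer by how much neighboring pixels disagree. The sketch below is an assumption for illustration, not a method given in the slides: it uses a sum of squared differences between 4-connected neighbors on a small grid of scalar values (which could stand for motion, depth, or intensity), and the name `incoherence` is hypothetical.

```python
# Assumption: a quadratic neighbor-difference penalty as one possible
# measure of "incoherence"; the slides do not specify any formula.

def incoherence(field):
    """Sum of squared differences between 4-connected neighbors in a 2D grid."""
    rows, cols = len(field), len(field[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbor
                total += (field[r][c] - field[r][c + 1]) ** 2
            if r + 1 < rows:  # neighbor below
                total += (field[r][c] - field[r + 1][c]) ** 2
    return total


# Nearly uniform values: almost every pixel agrees with its neighbors.
coherent = [[1, 1, 1],
            [1, 1, 1],
            [1, 1, 2]]

# Every pixel differs sharply from its neighbors.
scattered = [[1, 5, 0],
             [4, 0, 6],
             [0, 7, 2]]

# A coherence assumption prefers the answer with the lower score.
print(incoherence(coherent) < incoherence(scattered))
```

Under such a score, an answer in which every pixel moves differently from its neighbors is penalized heavily, which matches the intuition that the world tends not to behave that way.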
We’ll need high-level vision
  • Most of the field is low-level vision or model-based recognition
    • Partly to avoid the confusion CS764 is about
  • Key question: how to avoid brittleness?
    • Can make the visual system compute just what we need for our task (e.g., berries)
    • But how to handle the unexpected (e.g., lions)?
A short historical perspective
  • In the 1960s, vision was completely task-specific
    • A black blob in the center of the image is a telephone
    • These efforts are now considered “hacks”
  • In the 1970s, vision became completely general
    • Marr pushed the field towards precise technical questions
    • Low-level vision and recognition became dominant
Tasks strike back
  • In the mid-1980s, several attempts were made to re-introduce a notion of task
    • Active/animate/purposive vision
  • These attempts are widely viewed as failures, for good reasons
    • We’ll look at them a bit next week
  • It’s not enough to have good intuitions
    • There needs to be technical merit as well
  • Technical solutions (algorithms) that are very roughly consistent with human data
    • Goal is not AI, psychology or philosophy
  • Provide visual summaries useful for tasks, but degrade gracefully
    • Handle open/unstructured environments
    • Deal with expectations and breakdown
Our path for 764
  • No good computational work to read
    • Perhaps Vera will fix this?
  • We will examine papers along these lines:
    • Computational approaches that failed
    • Psychological data that is highly suggestive
    • Neurologically inspired architectures
    • Cognitive scientists and philosophers
      • Their goal is argument, not algorithm!
      • They’ve thought the most about these issues