Presentation Transcript



CS 764 Seminar in Computer Vision

Ramin Zabih

Fall 1998



Course mechanics

  • Meeting time will be Tue/Thu 11-12, here

    • Starting a week from today

  • Home page is now up

    www/CS764

  • Assignment: present one paper

    • You’ll have a lot of freedom, but you need to talk to me in advance

    • Some possible papers will be posted shortly



Topic of this seminar

  • The use of “knowledge” in the analysis of visual data

    • Sometimes called “context”

  • Clearly this is vital

    • On both psychological and technical grounds

    • But how? No one has much of an idea…

  • What is the interface between reasoning and perception? (Or, mind and body?)



What is the visual system’s “contract”?

  • Two standard (bad) answers

  • Answer 1: describe the scene in terms of surfaces [low-level vision]

    • There is a green patch 2” wide, 1’ away

  • Answer 2: describe the scene in terms of objects [model-based recognition]

    • Start with a set of 3D models (modelbase)

    • Determine position and pose (a toy sketch of this recipe follows below)
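
An illustrative aside, not part of the original slides: a minimal sketch of the model-based recognition recipe above. It assumes a toy “modelbase” of named 3D point sets, a hypothetical pinhole project() function, candidate poses supplied as (rotation, translation) pairs, and a crude nearest-feature matching score; real recognition systems are far more elaborate.

    import numpy as np

    def project(points_3d, pose):
        """Toy pinhole projection: move model points into the camera frame,
        then divide by depth (focal length assumed to be 1)."""
        R, t = pose                        # R: 3x3 rotation, t: 3-vector translation
        cam = points_3d @ R.T + t
        return cam[:, :2] / cam[:, 2:3]

    def match_score(projected, image_points):
        """Crude fit: negative sum of distances to the nearest detected image feature."""
        d = np.linalg.norm(projected[:, None, :] - image_points[None, :, :], axis=2)
        return -d.min(axis=1).sum()

    def recognize(modelbase, pose_hypotheses, image_points):
        """Answer 2 in caricature: try every (model, pose) pair and keep the best fit."""
        return max(((name, pose, match_score(project(pts, pose), image_points))
                    for name, pts in modelbase.items()
                    for pose in pose_hypotheses),
                   key=lambda candidate: candidate[2])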



Why are these answers wrong?

  • They are almost purely data-driven

    • Bottom-up (from the data) versus top-down (from somewhere else)

  • They report “objective fact”, with no room for the task at hand

    • For a given image, there is only one right answer

  • Other problems as well

    • Not very useful, etc.



Technical and psychological arguments

  • There are technical arguments against this

    • Vision is an inverse problem

      • Many 3D scenes could explain a single 2D image (a concrete toy example follows after this slide)

    • On engineering grounds, this makes no sense

      • Ultimately, perception is used for some task

  • The human perceptual system has both top-down and bottom-up elements

    • Various optical illusions

      • Two people can look at the same picture and see something completely different
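
An illustrative aside, not part of the original slides: the inverse-problem point above can be made concrete with a toy pinhole camera. Every 3D point along a viewing ray projects to the same pixel, so depth, and hence the scene, cannot be recovered from a single image without extra assumptions. The pinhole() function and the numbers below are purely hypothetical.

    import numpy as np

    def pinhole(point_3d, focal=1.0):
        """Toy pinhole camera: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
        x, y, z = point_3d
        return np.array([focal * x / z, focal * y / z])

    p = np.array([0.5, 0.2, 2.0])
    for scale in (1.0, 3.0, 10.0):
        # Scaling the 3D point slides it along the viewing ray...
        print(scale, pinhole(scale * p))   # ...yet the 2D projection is identical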



Your vision system doesn’t listen



It makes “reasonable” assumptions



Low-level vision has its solution

  • Inverse problems require assumptions

  • The assumptions for low-level vision are extremely general (i.e., weak)

    • Reflect the physics of the visible world

    • For example, motion or depth or intensity tend to be “coherent” (see the sketch after this slide)

      • Saying that every pixel is moving differently from its neighbors is a very unlikely answer

      • The world we live in tends not to do that

      • Helmholtz’s “unconscious inference”
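
An illustrative aside, not part of the original slides: one common way to encode the “coherence” assumption is a smoothness penalty on a dense field such as depth, disparity, or flow, so that answers where neighboring pixels disagree wildly pay a high cost. The sketch below is a minimal, hypothetical example of such a penalty, not a particular published method.

    import numpy as np

    def smoothness_cost(field):
        """Sum of squared differences between horizontal and vertical neighbors
        of a dense per-pixel field (e.g. a disparity or motion map)."""
        dx = np.diff(field, axis=1)    # horizontal neighbor differences
        dy = np.diff(field, axis=0)    # vertical neighbor differences
        return (dx ** 2).sum() + (dy ** 2).sum()

    coherent = np.ones((4, 4))                                   # every pixel agrees
    incoherent = np.random.default_rng(0).normal(size=(4, 4))    # every pixel differs
    print(smoothness_cost(coherent), smoothness_cost(incoherent))   # 0.0 vs. a large value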



We’ll need high-level vision

  • Most of the field is low-level vision or model-based recognition

    • Partly to avoid the confusion CS764 is about

  • Key question: how to avoid brittleness?

    • Can make the visual system compute just what we need for our task (e.g., berries)

    • But how to handle the unexpected (e.g., lions)?



A short historical perspective

  • In the 1960s, vision was completely task-specific

    • A black blob in the center of the image is a telephone

    • These efforts are now considered “hacks”

  • In the 1970s, vision became completely general

    • Marr pushed the field towards precise technical questions

    • Low-level vision and recognition became dominant



Tasks strike back

  • In the mid-1980s, several attempts were made to re-introduce a notion of task

    • Active/animate/purposive vision

  • These attempts are widely viewed as failures, for good reasons

    • We’ll look at them a bit next week

  • It’s not enough to have good intuitions

    • There needs to be technical merit as well



Desiderata

  • Technical solutions (algorithms) that are very roughly consistent with human data

    • Goal is not AI, psychology or philosophy

  • Provide visual summaries useful for tasks, but degrade gracefully

    • Handle open/unstructured environments

    • Deal with expectations and breakdown



Our path for 764

  • No good computational work to read

    • Perhaps Vera will fix this?

  • We will examine papers along these lines:

    • Computational approaches that failed

    • Psychological data that is highly suggestive

    • Neurologically inspired architectures

    • Cognitive scientists and philosophers

      • Their goal is argument, not algorithm!

      • They’ve thought the most about these issues

