Lecture 19 Unsupervised and One-Shot Learning


Lecture 19 Unsupervised and One-Shot Learning. Gary Bradski and Sebastian Thrun. http://robots.stanford.edu/cs223b/index.html

Presentation Transcript

This guy is wearing a haircut called a “Mullet”.

Lecture 19 Unsupervised and One-Shot Learning

Gary Bradski and Sebastian Thrun


Find the Mullets…

One-Shot Learning

One-Shot Learning

“The appearance of the categories we know and … the variability in their appearance, gives us important information on what to expect in a new category”1

Papers for this lecture:

  • L. Fei-Fei, R. Fergus and P. Perona, “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”, ICCV 03.
  • R. Fergus, P. Perona and A. Zisserman, “Object Class Recognition by Unsupervised Scale-Invariant Learning”, CVPR 03.
  • http://www.vision.caltech.edu/html-files/publications.html
…But first, review:


  • You have at least 8 points (say, found with SIFT features) that you’ve tracked between 2 frames of a moving camera.
  • What are their 3D coordinates (up to a scale factor) relative to the first frame’s coordinate system?


    • Trucco, Ch. 7 Section 7.3.3-7.3.5, 7.4.2
2 Images

[Figure: two camera views related by R, T]
  • Notations
    • Pl =(Xl, Yl, Zl), Pr =(Xr, Yr, Zr)
      • Vectors of the same 3-D point P, in the left and right camera coordinate systems respectively
    • Extrinsic Parameters
      • Translation Vector T = (Or-Ol)
      • Rotation Matrix R
    • pl =(xl, yl, zl), pr =(xr, yr, zr)
      • Projections of P on the left and right image plane respectively
      • For all image points, we have zl=fl, zr=fr

From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
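
The notation above implies the following relations between the two camera frames and the image projections. This is a sketch that assumes the convention of Trucco's text, in which T is expressed in the left camera frame; other texts place R and T differently.

```latex
\begin{align}
P_r &= R\,(P_l - T), \\
p_l &= \frac{f_l}{Z_l}\,P_l, \qquad p_r = \frac{f_r}{Z_r}\,P_r .
\end{align}
```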

Fundamental Matrix
  • Mapping between points and epipolar lines in the pixel coordinate systems
    • With no prior knowledge on the stereo system
  • From Camera to Pixels: Matrices of intrinsic parameters
  • Parameters:
    • focal lengths in x and y: fx, fy
    • image center (principal point): ox, oy

Rank (Mint) =3

Essential Matrix

For a single moving camera, Mr = Ml.

From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
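
The equations for this slide did not survive extraction. A standard form, consistent with the notation used here, is sketched below. The exact layout of the intrinsic matrix is an assumption: Trucco, for example, writes its focal-length entries with a negative sign and in terms of pixel sizes. Here \bar{p} denotes a point in homogeneous pixel coordinates and S is the skew-symmetric matrix built from T.

```latex
\begin{align}
M_{int} &= \begin{pmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{pmatrix},
  \qquad \operatorname{rank}(M_{int}) = 3, \\
\bar{p}_l &\simeq M_l\, p_l, \qquad \bar{p}_r \simeq M_r\, p_r
  \quad \text{(equal up to scale)}, \\
p_r^{\top} E\, p_l &= 0, \qquad E = R\,S, \qquad
  S = \begin{pmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{pmatrix}, \\
\bar{p}_r^{\top} F\, \bar{p}_l &= 0, \qquad F = M_r^{-\top} E\, M_l^{-1}.
\end{align}
```

When one camera moves, M_r = M_l = M and F reduces to M^{-\top} E M^{-1}.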

Fundamental Matrix

Essential Matrix

  • Fundamental Matrix
    • Rank (F) = 2
    • Encodes info on both intrinsic and extrinsic parameters
    • Enables full reconstruction of the epipolar geometry
    • In pixel coordinate systems without any knowledge of the intrinsic and extrinsic parameters
    • Linear equation in the 9 entries of F

From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html

Computing F: The Eight-point Algorithm
  • Input: n point correspondences (n >= 8)
    • Construct the homogeneous system Ax = 0 from the epipolar constraint pr^T F pl = 0
      • x = (f11, f12, f13, f21, f22, f23, f31, f32, f33): the entries of F
      • Each correspondence gives one equation
      • A is an n x 9 matrix
    • Obtain the estimate F^ by SVD of A
      • x (up to scale) is the column of V corresponding to the smallest singular value
    • Enforce the singularity constraint, since Rank (F) = 2
      • Compute the SVD of F^: F^ = U D V^T
      • Set the smallest singular value to 0: D -> D’
      • Corrected estimate of F: F’ = U D’ V^T
  • Output: the estimate of the fundamental matrix, F’
  • Similarly, we can compute E given the intrinsic parameters

From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
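
The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the function name `eight_point` is hypothetical, and coordinate normalization (commonly applied before building A to improve conditioning with raw pixel coordinates) is left out because the slide does not mention it.

```python
import numpy as np

def eight_point(pl, pr):
    """Estimate a rank-2 F with pr_h^T F pl_h = 0 from n >= 8 matches.

    pl, pr: (n, 2) arrays of matched image points (left, right).
    """
    pl = np.asarray(pl, dtype=float)
    pr = np.asarray(pr, dtype=float)
    if pl.shape != pr.shape or pl.shape[0] < 8:
        raise ValueError("need at least 8 matched points")

    # Homogeneous coordinates (x, y, 1).
    hl = np.column_stack([pl, np.ones(len(pl))])
    hr = np.column_stack([pr, np.ones(len(pr))])

    # Each match gives one row of A: the outer product pr_h * pl_h^T,
    # flattened row-major so it lines up with x = (f11, f12, ..., f33).
    A = np.einsum("ni,nj->nij", hr, hl).reshape(-1, 9)

    # x is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F_hat = Vt[-1].reshape(3, 3)

    # Enforce rank 2: zero the smallest singular value of F_hat.
    U, D, Vt_f = np.linalg.svd(F_hat)
    D[2] = 0.0
    return U @ np.diag(D) @ Vt_f
```

Passing points that have already been multiplied by the inverse intrinsic matrices yields an estimate of E instead of F, which matches the last bullet of the slide.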

Reconstruction up to a Scale Factor
  • Assumption and Problem Statement
    • Under the assumption that only the intrinsic parameters and at least 8 point correspondences are given
    • Compute the 3-D locations of the points from their projections, pl and pr, as well as the extrinsic parameters
  • Solution
    • Compute the essential matrix E from at least 8 correspondences
    • Estimate T (up to a scale and a sign) from E (= RS) using the orthogonality constraint on R, and then R (see Trucco 7.4.2)
      • End up with four different estimates of the pair (T, R)
    • Reconstruct the depth of each point, and pick the correct sign of R and T
    • Results: reconstructed 3D points (up to a common scale)
    • The scale can be determined if the distance between two points (in space) is known

From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
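
A compact way to carry out this recipe is sketched below. It is not the procedure of Trucco 7.4.2: it swaps in the common SVD-based decomposition of E, which produces the same four (T, R) candidates, and then keeps the one that places the most points in front of both cameras. It assumes the convention Pr = R (Pl - T) with E = RS, image points already in normalized camera coordinates (x/z, y/z), and a hypothetical function name.

```python
import numpy as np

def reconstruct_from_essential(E, pl, pr):
    """Recover R, unit-length T and 3D points (left frame) from E.

    pl, pr: (n, 2) arrays of matched points in normalized coordinates.
    Assumes Pr = R (Pl - T) and E = R [T]x; the result is up to scale.
    """
    pl_h = np.column_stack([pl, np.ones(len(pl))])
    pr_h = np.column_stack([pr, np.ones(len(pr))])

    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

    best = None
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        if np.linalg.det(R) < 0:      # keep a proper rotation
            R = -R
        for T in (Vt[2], -Vt[2]):     # T spans the null space of E
            # Solve Zl * (R pl) - Zr * pr = R T for the two depths.
            depths = np.array([
                np.linalg.lstsq(np.column_stack([R @ a, -b]), R @ T,
                                rcond=None)[0]
                for a, b in zip(pl_h, pr_h)
            ])
            n_front = int(np.sum((depths > 0).all(axis=1)))
            if best is None or n_front > best[0]:
                best = (n_front, R, T, depths[:, :1] * pl_h)
    return best[1], best[2], best[3]
```

Because T is returned with unit length, the recovered points are scaled by 1/|T| relative to the true scene; one known distance between two points fixes that scale.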

Visual learning is inefficient

Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm


Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

No wonder a huge amount of data is needed to train models…

How do we get to more biological levels of performance?

Use a Bayesian Framework

Training set Appearance



Training set Shape

set to 1.0

  • Use a scale-invariant, scale-sensing keypoint detector (like the first steps of Lowe’s SIFT).

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

Features Keys
  • A direct appearance model is taken around each located key. This is then normalized by its detected scale to an 11x11 window. PCA further reduces the dimensionality of these features.

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
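
The patch-then-PCA step can be sketched as follows. The helper names, the nearest-neighbour resampling, and the choice of treating the detected scale as the half-width of the window are all illustrative assumptions; the slide does not say how the window is resampled or how many principal components are kept.

```python
import numpy as np

def resample_patch(image, cx, cy, scale, size=11):
    """Sample a size x size patch around (cx, cy).

    The window half-width equals `scale` pixels, so larger detected
    scales map a larger image region onto the same 11x11 grid.
    Nearest-neighbour sampling; indices are clamped at the border.
    """
    offsets = np.linspace(-scale, scale, size)
    ys = np.clip(np.rint(cy + offsets).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.rint(cx + offsets).astype(int), 0, image.shape[1] - 1)
    return image[np.ix_(ys, xs)].astype(float)

def pca_reduce(patches, k):
    """Project flattened patches onto their top-k principal components."""
    X = patches.reshape(len(patches), -1)   # (n, 121) for 11x11 patches
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]
    return (X - mean) @ basis.T, mean, basis
```

Each keypoint then contributes a k-dimensional appearance vector instead of 121 raw pixel values.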

Add Model Hyper-parameters

What are hyper-parameters? Parameters that bias parameters. For instance, if you wanted to learn the probability of a coin turning up heads or tails, it would be stupid to observe 1 “head” and conclude: “heads 100%, tails 0%”. Instead, we draw our parameter beliefs from a prior distribution (for a coin, a Beta distribution is the usual choice) until we have enough data.
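
The coin example can be made concrete with a Beta prior, the conjugate prior for a coin's heads probability. The function name and the default prior values are illustrative assumptions; `a` and `b` play the role of the hyper-parameters.

```python
def posterior_heads(heads, tails, a=2.0, b=2.0):
    """Posterior mean of P(heads) under a Beta(a, b) prior.

    After observing the flips the posterior is Beta(a + heads, b + tails),
    whose mean is (a + heads) / (a + b + heads + tails).
    """
    return (a + heads) / (a + b + heads + tails)

# One observed head: the prior keeps the estimate away from 100%.
one_flip = posterior_heads(1, 0)        # (2 + 1) / (4 + 1) = 0.6
# With plenty of data the prior's influence fades.
many_flips = posterior_heads(900, 100)  # close to the observed 0.9
```

Larger `a + b` means a stronger prior, so more flips are needed before the data dominate.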

Model Params




  • Fit with E-M (this example is a 3-part model).
  • We start with the dual problem of what to fit and where to fit it.
  • Assume that an object instance is the only consistent thing somewhere in a scene.
  • We don’t know where to start, so we use random initial parameters.
    • Note that there isn’t much consistency at first.
  • (E) We find the best (consistent across images) assignment given the params.
  • (M) We refit the feature detector params given that assignment.
  • This repeats until it converges at the most consistent assignment with maximized parameters across images.

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
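
The alternation between assigning (E) and refitting (M) is easiest to see on a toy problem. The sketch below is not the part-based object model from these slides; it fits a two-component 1-D Gaussian mixture, chosen only because it shows the same two-step loop. The function name, the caller-supplied starting means `mu0`, and the starting width `sigma0` are illustrative assumptions.

```python
import numpy as np

def em_two_gaussians(x, mu0, sigma0=1.0, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture with E-M.

    x: 1-D data array; mu0: two starting means (a guess, possibly random).
    Returns (means, standard deviations, mixing weights).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu0, dtype=float)
    sigma = np.full(2, float(sigma0))
    weight = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: soft assignment of each point to each component.
        dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
        dens = dens / (sigma * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: refit the parameters given those assignments.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(var) + 1e-6   # small floor avoids a zero width
        weight = nk / len(x)
    return mu, sigma, weight
```

In the object model the assignment is over which image features belong to which model part, but the loop has the same shape: assign given the current parameters, refit given the assignment, repeat until nothing changes.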

Result: Unsupervised Learning

Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

  • Bayesian Decision based

Feature detector results:

The shape model. The mean location is indicated by the cross, with the ellipse showing the uncertainty in location. The number by each part is the probability of that part being present.

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

Recognition Result:

The appearance model closest to the mean of the appearance density of each part


Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

3 categories are trained extensively; a new category is then learned in 1-5 presentations. This is possible since E-M also trains the hyper-parameters, which say what 3D models “look like” and where to look.

  • One-Shot results:
  • Compare to batch approaches:

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/