Lecture 19 Unsupervised and One-Shot Learning

Gary Bradski and Sebastian Thrun. http://robots.stanford.edu/cs223b/index.html

Presentation Transcript


  1. Lecture 19: Unsupervised and One-Shot Learning. Gary Bradski and Sebastian Thrun. http://robots.stanford.edu/cs223b/index.html [Image caption: This guy is wearing a haircut called a “Mullet”.]

  2. Find the Mullets… One-Shot Learning

  3. One-Shot Learning “The appearance of the categories we know and … the variability in their appearance, gives us important information on what to expect in a new category” [1] Papers for this lecture: • [1] L. Fei-Fei, R. Fergus and P. Perona, “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”, ICCV 2003. • [2] R. Fergus, P. Perona and A. Zisserman, “Object Class Recognition by Unsupervised Scale-Invariant Learning”, CVPR 2003. • http://www.vision.caltech.edu/html-files/publications.html

  4. …But first, review: Problem: • You have at least 8 points (say, found with SIFT features) that you’ve tracked between 2 frames of a moving camera. • What are their 3D coordinates (up to a scale factor) relative to the first frame’s coordinate system? Answer: • Trucco, Ch. 7, Sections 7.3.3-7.3.5, 7.4.2

  5. 2 Images [Figure: a 3-D point P seen by the left and right cameras, with centers Ol and Or, focal lengths fl and fr, axes (Xl, Yl, Zl) and (Xr, Yr, Zr), image points pl and pr, and relative pose R, T] • Notations • Pl = (Xl, Yl, Zl), Pr = (Xr, Yr, Zr) • Vectors of the same 3-D point P, in the left and right camera coordinate systems respectively • Extrinsic Parameters • Translation Vector T = (Or - Ol) • Rotation Matrix R • pl = (xl, yl, zl), pr = (xr, yr, zr) • Projections of P on the left and right image planes respectively • For all image points, we have zl = fl, zr = fr From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html

  6. Fundamental Matrix • Mapping between points and epipolar lines in the pixel coordinate systems • With no prior knowledge of the stereo system • From camera to pixels: matrices of intrinsic parameters, Rank (Mint) = 3 • Parameters: • focal lengths in x and y: fx, fy • image center (principal point): ox, oy • Essential Matrix: the same mapping, but in camera coordinates • For one moving camera, Mr = Ml. From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html

  7. From the Essential Matrix to the Fundamental Matrix • Fundamental Matrix • Rank (F) = 2 • Encodes info on both intrinsic and extrinsic parameters • Enables full reconstruction of the epipolar geometry • Works in pixel coordinate systems without any knowledge of the intrinsic and extrinsic parameters • Each correspondence gives a linear equation in the 9 entries of F From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
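
For reference, the standard relations behind the essential and fundamental matrices can be written as below. This is a sketch that assumes the Trucco and Verri convention: p_l, p_r are points in camera coordinates, \bar{p}_l, \bar{p}_r are the same points in pixel coordinates, M_l, M_r are the intrinsic matrices, and S is the skew-symmetric matrix built from T = (T_x, T_y, T_z). The bar notation is an assumption introduced here, not taken from the slides.

```latex
\begin{align}
S &= \begin{pmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{pmatrix},
  & E &= R\,S, \\
p_r^{\top} E \, p_l &= 0,
  & p_l &= M_l^{-1}\,\bar{p}_l, \quad p_r = M_r^{-1}\,\bar{p}_r, \\
\bar{p}_r^{\top} F \, \bar{p}_l &= 0,
  & F &= M_r^{-\top} E \, M_l^{-1}.
\end{align}
```

Because S has rank 2 and the intrinsic matrices have rank 3, both E and F have rank 2.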

  8. Computing F: The Eight-point Algorithm • Input: n point correspondences (n >= 8) • Construct the homogeneous system Ax = 0 from the epipolar constraint pr^T F pl = 0 (points in pixel coordinates) • x = (f11, f12, f13, f21, f22, f23, f31, f32, f33): the entries of F • Each correspondence gives one equation • A is an n x 9 matrix • Obtain the estimate F^ by SVD of A = U D V^T • x (up to a scale) is the column of V corresponding to the smallest singular value • Enforce the singularity constraint, since Rank (F) = 2 • Compute the SVD of F^ = U D V^T • Set the smallest singular value to 0: D -> D’ • Corrected estimate of F: F’ = U D’ V^T • Output: the estimate of the fundamental matrix, F’ • Similarly, we can compute E given the intrinsic parameters From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
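
The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's code: the function name `eight_point` and the input layout (two n x 2 arrays of matched points) are assumptions, and the coordinate normalization that is usually recommended for noisy pixel data is left out to keep the sketch short.

```python
import numpy as np

def eight_point(pl, pr):
    """Estimate F (up to scale) with pr_h^T F pl_h = 0 from n >= 8 matches.

    pl, pr: arrays of shape (n, 2) with matched points in the left and
    right image (hypothetical input layout chosen for this sketch).
    """
    pl = np.asarray(pl, dtype=float)
    pr = np.asarray(pr, dtype=float)
    if pl.shape != pr.shape or pl.ndim != 2 or pl.shape[1] != 2:
        raise ValueError("pl and pr must both have shape (n, 2)")
    if pl.shape[0] < 8:
        raise ValueError("need at least 8 correspondences")

    xl, yl = pl[:, 0], pl[:, 1]
    xr, yr = pr[:, 0], pr[:, 1]
    ones = np.ones_like(xl)

    # One row per correspondence; unknowns ordered (f11, f12, ..., f33).
    A = np.column_stack([xr * xl, xr * yl, xr,
                         yr * xl, yr * yl, yr,
                         xl, yl, ones])

    # x is the right singular vector for the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    F_hat = Vt[-1].reshape(3, 3)

    # Enforce rank 2: zero the smallest singular value of F_hat.
    U, D, Vt_f = np.linalg.svd(F_hat)
    D[2] = 0.0
    return U @ np.diag(D) @ Vt_f
```

With noise-free correspondences the residuals pr_h^T F pl_h are zero up to rounding; with real data they are only minimized in the algebraic least-squares sense.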

  9. Reconstruction up to a Scale Factor • Assumption and Problem Statement • Assume that only the intrinsic parameters and at least 8 point correspondences are given • Compute the 3-D locations of the points from their projections, pl and pr, as well as the extrinsic parameters • Solution • Compute the essential matrix E from at least 8 correspondences • Estimate T (up to a scale and a sign) from E (= RS) using the orthogonality constraint on R, and then R (see Trucco 7.4.2) • End up with four different estimates of the pair (T, R) • Reconstruct the depth of each point, and pick the correct signs of R and T • Results: reconstructed 3D points (up to a common scale) • The scale can be determined if the distance between two points (in space) is known From: Zhigang Zhu, NAC 8/203A http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-I6716.html
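
To illustrate where the four (T, R) candidates come from, here is a small sketch using the SVD-based factorization of E described by Hartley and Zisserman. Note that this is a swapped-in technique, not the procedure of Trucco 7.4.2, and it assumes the convention E = [t]x R with X_r = R X_l + t, which differs from the slide's E = RS. The function name `decompose_essential` is hypothetical.

```python
import numpy as np

def decompose_essential(E):
    """Return the four (R, t) candidates for E = [t]x R (t has unit length).

    Assumed convention for this sketch: X_r = R @ X_l + t.
    """
    U, _, Vt = np.linalg.svd(E)

    # E is only defined up to sign, so flip factors to get proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt

    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # left null vector of E, known only up to sign and scale

    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

The correct candidate is the one for which the triangulated points have positive depth in both cameras; that depth test is not part of this sketch.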

  10. Visual learning is inefficient Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

  11. No wonder a huge amount of data is needed to train models… How do we get to more biological levels of performance? Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

  12. Use a Bayesian Framework [Diagram labels: “Training set”, “Appearance”, “Shape”; one quantity is marked “set to 1.0”.]

  13. Representation • Use a scale-invariant, scale-sensing keypoint detector (like the first steps of Lowe’s SIFT). From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

  14. Feature Keys • A direct appearance model (an image patch) is taken around each located key. The patch is then normalized by its detected scale to an 11x11 window. PCA further reduces the dimensionality of these features. From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
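
A rough sketch of this appearance pipeline is shown below: cut a patch around each key, resample it to 11x11 according to its scale, then project all patches onto a few principal components. The function names, the nearest-neighbour resampling, and the choice that the patch half-width equals the detected scale in pixels are all assumptions made for illustration; the slide does not specify them.

```python
import numpy as np

def extract_patch(image, row, col, scale, out_size=11):
    """Sample an out_size x out_size patch centred on (row, col).

    Assumption: the patch half-width equals `scale` pixels, and samples
    are taken with nearest-neighbour lookup (clamped at image borders).
    """
    offsets = np.linspace(-scale, scale, out_size)
    rows = np.clip(np.rint(row + offsets).astype(int), 0, image.shape[0] - 1)
    cols = np.clip(np.rint(col + offsets).astype(int), 0, image.shape[1] - 1)
    return image[np.ix_(rows, cols)].astype(float)

def pca_reduce(patches, n_components):
    """Project flattened patches onto their top principal components."""
    X = np.asarray(patches, dtype=float).reshape(len(patches), -1)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]
    codes = (X - mean) @ basis.T
    return codes, mean, basis
```

Each 11x11 patch (121 values) is thus summarized by `n_components` coefficients, which is what the appearance densities are later modelled on.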

  15. From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

  16. Add Model Hyper-parameters • What are hyper-parameters? Parameters that bias parameters. • For instance, if you wanted to learn the probability of a coin turning up heads or tails, it would be stupid to observe 1 “head” and conclude: “heads 100%, tails 0%”. Instead, we draw our parameter beliefs from a prior distribution (for a coin, typically a Beta distribution) until we have enough data. [Diagram labels: Model, Params, Learn, Then hyper-params]
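
The coin example can be made concrete with a Beta prior, the usual conjugate prior for a coin's heads probability. This is a sketch under an assumed prior of Beta(2, 2); the hyper-parameter values and function names are illustrative choices, not from the lecture.

```python
def maximum_likelihood(heads, tails):
    """Estimate P(heads) from counts alone (no prior)."""
    return heads / (heads + tails)

def beta_posterior_mean(heads, tails, alpha=2.0, beta=2.0):
    """Posterior mean of P(heads) under a Beta(alpha, beta) prior.

    alpha and beta are the hyper-parameters: they act like pseudo-counts
    of heads and tails seen before any real data arrives.
    """
    return (alpha + heads) / (alpha + beta + heads + tails)

# After observing a single head:
print(maximum_likelihood(1, 0))    # 1.0 -> "heads 100%, tails 0%"
print(beta_posterior_mean(1, 0))   # 0.6 -> pulled toward the prior mean 0.5
```

As more flips are observed, the counts dominate the pseudo-counts and the two estimates converge.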

  17. Learning • Fit with E-M (this example is a 3-part model). • We start with the dual problem of what to fit and where to fit it. • Assume that an object instance is the only consistent thing somewhere in a scene. • We don’t know where to start, so we use random initial parameters. • Note that there isn’t much consistency at first. • (E) We find the best (consistent across images) assignment given the params. • (M) We refit the feature detector params to that assignment. • This repeats until it converges at the most consistent assignment, with parameters maximized across images. From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
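
The alternation between assigning and refitting is easier to see on a toy problem. The sketch below runs E-M on a one-dimensional mixture of two Gaussians; it is only an analogy for the loop structure, not the part-based object model from the lecture. The function name, the caller-supplied starting means, and the unit starting variances are assumptions for this example.

```python
import numpy as np

def em_two_gaussians(x, init_means, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture with E-M (toy illustration)."""
    x = np.asarray(x, dtype=float)
    means = np.array(init_means, dtype=float)
    variances = np.ones(2)            # assumed starting spread
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: soft assignment of each point to each component.
        diff = x[:, None] - means[None, :]
        log_p = (-0.5 * diff ** 2 / variances
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: refit parameters to the current soft assignment.
        n_k = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / n_k
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_k
        variances = np.maximum(variances, 1e-6)     # avoid collapse
        weights = n_k / len(x)

    return means, variances, weights
```

In the lecture's setting the "assignment" is which detected features play which part of the object, and the "parameters" are the shape and appearance densities, but the two-step loop is the same.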

  18. Result: Unsupervised Learning Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

  19. Recognition • Bayesian-decision based. • Feature detector results. • The shape model: the mean location is indicated by the cross, with the ellipse showing the uncertainty in location. The number by each part is the probability of that part being present. • Recognition result. • The appearance model: the patches closest to the mean of the appearance density of each part. From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
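
A Bayesian decision of this kind is commonly written as a posterior ratio compared against a threshold. As a sketch, with X the detected feature locations, A their appearances, \theta the learned object model, \theta_{bg} a background model, and \tau a threshold (all symbols are assumptions introduced here, not taken from the slide):

```latex
\begin{equation}
R \;=\; \frac{p(\text{Object} \mid X, A)}{p(\text{No object} \mid X, A)}
  \;=\; \frac{p(X, A \mid \theta)\, p(\text{Object})}
             {p(X, A \mid \theta_{bg})\, p(\text{No object})},
\qquad \text{decide ``object present'' if } R > \tau .
\end{equation}
```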

  20. Data • 3 categories are trained extensively; the first is learned in 1-5 presentations. This is possible since E-M also trains the hyper-parameters, which say what 3D models “look like”/where to look. Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

  21. Results • One-Shot results: • Compare to batch approaches: From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

  22. Using supervised classifiers for unsupervised learning. • Will discuss in class.
