Epitomic location recognition

K. Ni, A. Kannan, A. Criminisi and J. Winn. Epitomic Location Recognition: a generative approach for location recognition. In Proc. CVPR 2008, Anchorage, Alaska.

Presentation Transcript

K. Ni, A. Kannan, A. Criminisi and J. Winn

Epitomic Location Recognition

A generative approach for location recognition

In proc. CVPR 2008. Anchorage, Alaska.


Goal

Introduction

Recognition

Enhancements

Evaluation


Location Recognition

  • Where am I?

    • Instance recognition

    • Category recognition (more difficult)

Lobby? Cubicle? Hallway? Kitchen?




Geometry Based Recognition

  • SLAM & structure from motion

    • Why do we need metric reconstruction?

    • Lose the flexibility to do class recognition.

Figure labels: Training Images; Local Feature Database; Geometry & Labels; Testing Image; Features

Cited work: F. Schaffalitzky and A. Zisserman; G. Schindler, M. Brown and R. Szeliski


Appearance Based Recognition

  • Capture global appearance information

    • Gaussian mixture model used by A. Torralba et al.

Figure labels: Preprocessing; Image Vectors; Training; Training Images; Appearance Model (e.g. PCA)

Cited work: A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin; M. Cummins and P. Newman


Appearance or Geometry?

  • Can we do better by fusing both kinds of information?

A small example with 2 location labels: cubicle and corridor


The Simplest Model

  • Nearest neighbor classification

    • Naive but still effective with enough samples.

    • A small shift may disrupt the recognition.

    • Does not capture uncertainty.
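
The shift sensitivity noted above can be seen in a toy setting. The following is a minimal sketch, not the presenters' code: images are flattened to short lists of intensities, the two training vectors and the helper names (`squared_distance`, `nearest_neighbor_label`) are invented for illustration, and plain squared Euclidean distance is assumed as the metric.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length intensity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_label(query, training_set):
    """Return the label of the training image closest to the query."""
    best_label, best_dist = None, float("inf")
    for image, label in training_set:
        dist = squared_distance(query, image)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 6-pixel "images": one bright object vs. a dim textured wall.
training_set = [
    ([0, 9, 0, 0, 0, 0], "cubicle"),
    ([0, 0, 0, 3, 3, 3], "corridor"),
]

exact = nearest_neighbor_label([0, 9, 0, 0, 0, 0], training_set)
# Same object shifted by one pixel: distance 162 to "cubicle", 108 to "corridor".
shifted = nearest_neighbor_label([0, 0, 9, 0, 0, 0], training_set)
print(exact, shifted)
```

In this toy case a one-pixel shift is enough to flip the decision, which is the motivation for building translation invariance into the model.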


How to Incorporate Translation Invariance?

  • We need something better than a “bag of frames” model

Figure labels: Training images; Testing image


Panorama

  • It models both appearance & geometry

    • Adapts to camera rotation and focal length change

  • Generative

    • An image is a patch “extracted” from the panorama

M. Brown and D. G. Lowe


Cons of Panoramas

  • Not easy to build a panorama due to parallax

  • Do not capture uncertainty

  • Only work for location instance recognition

  • No compact representation for repetitive scenes


Gaussian Mixture Model

  • Six mixture components trained as in the paper by Torralba et al.

    • Handles uncertainties but no translation invariance

Figure labels: Means; Variances. Annotations: "Remove boundaries"; "Much more blurred"


A Weak Panorama

  • 3D motions can be roughly modeled by 2D translation + scaling.

2D translation

Scaling


Epitome = Panorama + GMM

  • Epitome

    • Generative model for image patches / video frames

    • Captures repetitive patterns in the original image

    • Mapping = 2D translation + scaling

Figure labels: A source image; Image patches; Epitome

Cited work: N. Jojic et al., ICCV 2003; N. Petrovic et al., CVPR 2006


Epitome as Probabilistic Panorama

  • Model 3D scenes rather than a single 2D image

Figure labels: Location Epitome (Means; Variances); Environment = Virtual panorama


Learning the Location Epitome

  • Initialize epitome randomly

  • EM Iterations

    • E-step: infer the posteriors over all mappings

    • M-step: use the posteriors as weights to update the mean and variance of epitome pixels

Plot: free energy vs. EM iterations
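
The EM procedure above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the method as implemented in the paper: the epitome is one-dimensional with wrap-around, the only mapping is a translation offset (no scaling), the prior over offsets is uniform, and the function names, the variance floor `min_var` and the test signal are all hypothetical. The tracked quantity is the data log-likelihood, whose negative equals the free energy after an exact E-step, up to an additive constant.

```python
import math
import random


def offset_log_likelihoods(patch, mean, var):
    """Return log p(patch | offset t) for every offset t into a wrap-around 1D epitome."""
    size = len(mean)
    scores = []
    for t in range(size):
        total = 0.0
        for j, x in enumerate(patch):
            e = (t + j) % size  # epitome pixel that patch pixel j maps to
            total -= 0.5 * (math.log(2 * math.pi * var[e]) + (x - mean[e]) ** 2 / var[e])
        scores.append(total)
    return scores


def posterior_over_offsets(scores):
    """E-step for one patch: posterior over offsets and the log of the summed likelihoods."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights], top + math.log(norm)


def learn_epitome(patches, size, iterations=20, min_var=1e-2, seed=0):
    rng = random.Random(seed)
    mean = [rng.random() for _ in range(size)]  # random initialization
    var = [1.0] * size
    history = []  # log-likelihood (up to a constant) measured before each M-step
    for _ in range(iterations):
        sum_w, sum_x, sum_xx = [0.0] * size, [0.0] * size, [0.0] * size
        log_lik = 0.0
        for patch in patches:
            q, log_evidence = posterior_over_offsets(offset_log_likelihoods(patch, mean, var))
            log_lik += log_evidence
            for t, qt in enumerate(q):
                for j, x in enumerate(patch):
                    e = (t + j) % size
                    sum_w[e] += qt
                    sum_x[e] += qt * x
                    sum_xx[e] += qt * x * x
        history.append(log_lik)
        # M-step: posterior-weighted mean and variance of each epitome pixel.
        for e in range(size):
            if sum_w[e] > 0:
                mean[e] = sum_x[e] / sum_w[e]
                var[e] = max(sum_xx[e] / sum_w[e] - mean[e] ** 2, min_var)
    return mean, var, history


# A repetitive signal: every length-4 patch should fit a 4-pixel epitome.
signal = [0, 0, 5, 5, 0, 0, 5, 5, 0, 0]
patches = [signal[i:i + 4] for i in range(len(signal) - 3)]
mean, var, history = learn_epitome(patches, size=4)
print([round(m, 2) for m in mean], round(history[0], 1), round(history[-1], 1))
```

Because all patches share the same few epitome pixels, the repeated pattern is stored once, which is the parameter sharing that distinguishes an epitome from an ordinary mixture of Gaussians.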


Model Comparison

  • The epitome is a mixture-of-Gaussians model whose components share parameters

    • For the same number of parameters, the epitome generalizes better




Build Label Maps

  • The label maps are the posterior of the label given the mapping

Figure labels: Epitome; Label maps (Cubicle label map; Corridor label map)
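
One plausible way to estimate such label maps is posterior-weighted counting: each training image adds its posterior over mappings to the map of its own label, and the maps are then normalized per epitome position. The slide gives no formula, so this is an assumed estimator; the function name `build_label_maps`, the smoothing constant `eps` and the toy posteriors are hypothetical, and mappings are reduced to positions in a 1D epitome.

```python
def build_label_maps(posteriors, labels, classes, eps=1e-9):
    """Estimate p(label | epitome position) from per-image posteriors over positions."""
    size = len(posteriors[0])
    counts = {c: [0.0] * size for c in classes}
    for q, label in zip(posteriors, labels):
        for t, qt in enumerate(q):
            counts[label][t] += qt  # soft vote of this image for position t
    maps = {c: [0.0] * size for c in classes}
    for t in range(size):
        total = sum(counts[c][t] for c in classes) + eps * len(classes)
        for c in classes:
            maps[c][t] = (counts[c][t] + eps) / total
    return maps


# Hypothetical posteriors of three training images over a 3-position epitome.
posteriors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]
labels = ["cubicle", "cubicle", "corridor"]
label_maps = build_label_maps(posteriors, labels, classes=["cubicle", "corridor"])
print(label_maps)
```

Positions used mostly by cubicle images end up with a cubicle probability near 1, and positions shared by both labels receive intermediate values.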


Recognition from Location Epitomes

  • Fast correlation: infer the best mapping region

  • Sum the pixel-wise votes

  • Temporal smoothing using HMM

Figure labels: Input testing image; Location epitome; Best matching patch; Cubicle label map; Corridor label map
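
The last two recognition steps can be sketched as follows. This is an illustration under assumptions rather than the presenters' implementation: the best-matching region is taken as given (the fast-correlation step is not shown), the epitome is one-dimensional, and the HMM uses forward filtering with a single hypothetical `stay_prob` transition parameter and at least two labels; other smoothing schemes such as Viterbi decoding would also fit the slide's description.

```python
def region_votes(label_maps, start, length):
    """Sum each label map over the epitome region chosen as the best match."""
    return {label: sum(values[start:start + length]) for label, values in label_maps.items()}


def hmm_filter(frame_scores, stay_prob=0.9):
    """Forward filtering over per-frame label scores; returns the smoothed label per frame."""
    labels = list(frame_scores[0])
    switch_prob = (1.0 - stay_prob) / (len(labels) - 1)
    belief = {label: 1.0 / len(labels) for label in labels}
    decisions = []
    for scores in frame_scores:
        # Predict: labels tend to persist from one frame to the next.
        predicted = {
            label: sum(belief[prev] * (stay_prob if prev == label else switch_prob)
                       for prev in labels)
            for label in labels
        }
        # Update with this frame's evidence, then renormalize.
        unnormalized = {label: predicted[label] * scores[label] for label in labels}
        norm = sum(unnormalized.values())
        belief = {label: value / norm for label, value in unnormalized.items()}
        decisions.append(max(belief, key=belief.get))
    return decisions


# Hypothetical label maps over a 4-position epitome and a best match at positions 0..1.
label_maps = {"cubicle": [0.9, 0.8, 0.2, 0.1], "corridor": [0.1, 0.2, 0.8, 0.9]}
votes = region_votes(label_maps, start=0, length=2)

# Hypothetical per-frame scores; the third frame is ambiguous.
frame_scores = [
    {"cubicle": 0.8, "corridor": 0.2},
    {"cubicle": 0.7, "corridor": 0.3},
    {"cubicle": 0.45, "corridor": 0.55},
    {"cubicle": 0.8, "corridor": 0.2},
]
raw = [max(scores, key=scores.get) for scores in frame_scores]
smoothed = hmm_filter(frame_scores)
print(raw, smoothed)
```

In this example the per-frame decision flips to "corridor" on the ambiguous third frame, while the filtered sequence stays on "cubicle" throughout.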




Color is not always the best feature

  • Other features besides RGB

    • For example, a stereo feature captures depth information.

    • High stereo accuracy is not needed (an efficient dynamic-programming stereo method is used here)

Corridor

Cubicle

Kitchen


Integrating Multiple Features

  • Stack multiple feature “channels”

Figure labels (stacked channels): Stereo; R; G; B
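
Stacking channels simply means that every pixel carries one value per feature. A minimal sketch, assuming each channel is an equal-sized 2D list and using a hypothetical helper name `stack_channels`:

```python
def stack_channels(*channels):
    """Combine equal-sized 2D channel images into one image of per-pixel feature tuples."""
    height, width = len(channels[0]), len(channels[0][0])
    for channel in channels:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must have the same height and width")
    return [
        [tuple(channel[y][x] for channel in channels) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 1 x 2 image with a stereo (disparity) channel plus R, G and B.
stereo = [[12, 3]]
red, green, blue = [[255, 10]], [[128, 20]], [[0, 30]]
stacked = stack_channels(stereo, red, green, blue)
print(stacked)
```

With this layout the per-pixel Gaussian of the epitome becomes a Gaussian over a feature vector instead of over a single intensity.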


Local Histograms

  • Enable better translation invariance and more generalization

    • Error rate: 0.49 → 0.36 on a 4-class test dataset

  • Improves efficiency dramatically: a 30-fold speed-up
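
The extra translation invariance can be illustrated with a small sketch. The slides do not say how the histograms are computed, so the details here are assumptions: a grayscale image stored as a 2D list, non-overlapping cells, uniform intensity bins, and a hypothetical function name `local_histograms`.

```python
def local_histograms(image, cell_h, cell_w, bins, max_value=256):
    """Replace each cell_h x cell_w cell by a normalized histogram of its intensities."""
    rows, cols = len(image) // cell_h, len(image[0]) // cell_w
    grid = []
    for r in range(rows):
        row_hists = []
        for c in range(cols):
            hist = [0] * bins
            for y in range(r * cell_h, (r + 1) * cell_h):
                for x in range(c * cell_w, (c + 1) * cell_w):
                    hist[image[y][x] * bins // max_value] += 1
            row_hists.append([count / (cell_h * cell_w) for count in hist])
        grid.append(row_hists)
    return grid


# A bright pixel, and the same pixel shifted by one position inside the same cell.
image_a = [[0, 255, 0, 0], [0, 0, 0, 0]]
image_b = [[255, 0, 0, 0], [0, 0, 0, 0]]
hists_a = local_histograms(image_a, cell_h=2, cell_w=2, bins=2)
hists_b = local_histograms(image_b, cell_h=2, cell_w=2, bins=2)
print(hists_a == hists_b)
```

Shifts that stay inside a cell leave the histogram unchanged, and the feature grid is smaller than the image by the cell size in each dimension, which is where the speed-up comes from.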


Supervised Learning

  • Incorporates training image labels

  • Helps discriminate images with similar features but different location labels.

Figure labels: A microwave in the kitchen; An example epitome; A monitor in the cubicle; Discriminative features; An example label feature




MIT Image Database

  • Created by Antonio Torralba et al.

    • 17 sequences, 62 locations, 7 categories, 72077 images


Results on Recognizing Location Instances

  • Location epitome vs. GMM: 10% better on average


Results on Recognizing Location Classes

  • Location Epitome vs. GMM, 10%-20% better


MSRC Data Set

  • Captured with a stereo camera

    • 5409 images captured at 4 fps

    • 11 sequences and 7 classes

corridor_visionlab

cubicle_mlp

kitchen-fl2-north

lectureroom-large

lectureroom-small

stairs-1st-to-2nd

stairs-2nd-to-1st


Integrate Depth Cues

corridor_visionlab

cubicle_mlp

kitchen-fl2-north

lectureroom-large

lectureroom-small

stairs-1st-to-2nd

stairs-2nd-to-1st


Instance Recognition with Multiple Features

  • The RGB & stereo combination outperforms the other feature sets

  • Learning: 5.7 fps

  • Recognition: 116 fps = 29 times the capture speed


Summary

  • A generative model for the recognition of both location instances and classes

    • Fast: capable of real-time applications

    • Flexible: capable of integrating various features

    • Probabilistic: capable of capturing uncertainties

  • Future applications

    • Navigation for visually impaired people

    • Appearance-based loop closing for SLAM problems


Thank you!


Local Histograms (2)

  • Improves efficiency (both training and testing)

    • The bottleneck: convolving the epitome with the images

    • Compression rate: $3\,(C_1 C_2)^2 / 50 = 2400$

  • Learning: 3 hours → 6 minutes, 30 times faster
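
The compression rate can be checked with simple arithmetic, reading it as the ratio of direct correlation costs before and after replacing 3 RGB channels by 50-bin local histograms on a grid coarsened by C1 and C2. This reading and the cost model (one multiply-add per pixel, per epitome position, per channel) are assumptions; the slide's value of 2400 only implies C1 · C2 = 200, so the individual sizes below are hypothetical.

```python
def direct_correlation_cost(image_h, image_w, epitome_h, epitome_w, channels):
    """Multiply-adds for matching an image at every epitome position, over all channels."""
    return image_h * image_w * epitome_h * epitome_w * channels


C1, C2 = 10, 20      # hypothetical cell sizes with C1 * C2 = 200
M, N = 240, 320      # hypothetical image size
Me, Ne = 100, 200    # hypothetical epitome size

rgb_cost = direct_correlation_cost(M, N, Me, Ne, channels=3)
hist_cost = direct_correlation_cost(M // C1, N // C2, Me // C1, Ne // C2, channels=50)

ratio = rgb_cost / hist_cost
closed_form = 3 * (C1 * C2) ** 2 / 50
print(ratio, closed_form)
```

The image and epitome sizes cancel in the ratio, so only the cell sizes and the two channel counts matter.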

[Figure: an M × N image is convolved (*) with an Me × Ne epitome. Caption 1: convolve 3-dimensional RGB features. Caption 2: convolve 50-dimensional local histograms, with sizes reduced to M/C1 × N/C2 and Me/C1 × Ne/C2.]

