Loading in 5 sec....

O BJ C UT & Pose Cut CVPR 05 ECCV 06PowerPoint Presentation

O BJ C UT & Pose Cut CVPR 05 ECCV 06

- 95 Views
- Uploaded on
- Presentation posted in: General

O BJ C UT & Pose Cut CVPR 05 ECCV 06

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

UNIVERSITY

OF

OXFORD

OBJ CUT & Pose CutCVPR 05ECCV 06

Philip Torr

M. Pawan Kumar, Pushmeet Kohli and Andrew Zisserman

- Combining pose inference and segmentation worth investigating. (tommorrow)
- Tracking = Detection
- Detection = Segmentation
- Tracking (pose estimation) = Segmentation.

First segmentation problem

- To distinguish cow and horse?

- Given an image, to segment the object

Object

Category

Model

Segmentation

Cow Image

Segmented Cow

- Segmentation should (ideally) be
- shaped like the object e.g. cow-like
- obtained efficiently in an unsupervised manner
- able to handle self-occlusion

Intra-Class Shape Variability

Intra-Class Appearance Variability

Self Occlusion

Magic Wand

- Current methods require user intervention
- Object and background seed pixels (Boykov and Jolly, ICCV 01)
- Bounding Box of object (Rother et al. SIGGRAPH 04)

Object Seed Pixels

Cow Image

Magic Wand

- Current methods require user intervention
- Object and background seed pixels (Boykov and Jolly, ICCV 01)
- Bounding Box of object (Rother et al. SIGGRAPH 04)

Object Seed Pixels

Background Seed Pixels

Cow Image

Magic Wand

- Current methods require user intervention
- Object and background seed pixels (Boykov and Jolly, ICCV 01)
- Bounding Box of object (Rother et al. SIGGRAPH 04)

Segmented Image

Magic Wand

- Current methods require user intervention
- Object and background seed pixels (Boykov and Jolly, ICCV 01)
- Bounding Box of object (Rother et al. SIGGRAPH 04)

Object Seed Pixels

Background Seed Pixels

Cow Image

Magic Wand

- Current methods require user intervention
- Object and background seed pixels (Boykov and Jolly, ICCV 01)
- Bounding Box of object (Rother et al. SIGGRAPH 04)

Segmented Image

- Problem
- Manually intensive
- Segmentation is not guaranteed to be ‘object-like’

Non Object-like Segmentation

- Combine object detection with segmentation
- Borenstein and Ullman, ECCV ’02
- Leibe and Schiele, BMVC ’03

- Incorporate global shape priors in MRF
- Detection provides
- Object Localization
- Global shape priors

- Automatically segments the object
- Note our method completely generic
- Applicable to any object category model

- Problem Formulation
- Form of Shape Prior
- Optimization
- Results

- Labelling m over the set of pixels D
- Shape prior provided by parameter Θ
- Energy E (m,Θ) = ∑Φx(D|mx)+Φx(mx|Θ) + ∑ Ψxy(mx,my)+ Φ(D|mx,my)
- Unary terms
- Likelihood based on colour
- Unary potential based on distance from Θ

- Pairwise terms
- Prior
- Contrast term

- Find best labelling m* = arg min ∑ wi E (m,Θi)
- wi is the weight for sample Θi

Unary terms

Pairwise terms

- Probability for a labellingconsists of
- Likelihood
- Unary potential based on colour of pixel

- Prior which favours same labels for neighbours (pairwise potentials)

mx

m(labels)

Prior Ψxy(mx,my)

my

Unary Potential Φx(D|mx)

x

y

D(pixels)

Image Plane

Cow Image

Object Seed

Pixels

Background Seed

Pixels

Φx(D|obj)

x

…

x

…

Φx(D|bkg)

Ψxy(mx,my)

y

…

y

…

…

…

…

…

Prior

Likelihood Ratio (Colour)

Cow Image

Object Seed

Pixels

Background Seed

Pixels

Prior

Likelihood Ratio (Colour)

- Probability of labelling in addition has
- Contrast term which favours boundaries to lie on image edges

mx

m(labels)

my

x

Contrast Term

Φ(D|mx,my)

y

D(pixels)

Image Plane

Cow Image

Object Seed

Pixels

Background Seed

Pixels

Φx(D|obj)

x

…

x

…

Φx(D|bkg)

Ψxy(mx,my)+

Φ(D|mx,my)

y

…

y

…

…

…

…

…

Prior + Contrast

Likelihood Ratio (Colour)

Cow Image

Object Seed

Pixels

Background Seed

Pixels

Prior + Contrast

Likelihood Ratio (Colour)

- Probability of labelling in addition has
- Unary potential which depend on distance from Θ (shape parameter)

Θ (shape parameter)

Unary Potential

Φx(mx|Θ)

mx

m(labels)

my

Object Category

Specific MRF

x

y

D(pixels)

Image Plane

Cow Image

Object Seed

Pixels

Background Seed

Pixels

ShapePriorΘ

Prior + Contrast

Distance from Θ

Cow Image

Object Seed

Pixels

Background Seed

Pixels

ShapePriorΘ

Prior + Contrast

Likelihood + Distance from Θ

Cow Image

Object Seed

Pixels

Background Seed

Pixels

ShapePriorΘ

Prior + Contrast

Likelihood + Distance from Θ

- Problem Formulation
- E (m,Θ) = ∑Φx(D|mx)+Φx(mx|Θ) + ∑ Ψxy(mx,my)+ Φ(D|mx,my)

- Form of Shape Prior
- Optimization
- Results

- BMVC 2004

- Generative model
- Composition of parts + spatial layout

Layer 2

Spatial Layout

(Pairwise Configuration)

Layer 1

Parts in Layer 2 can occlude parts in Layer 1

Cow Instance

Layer 2

Transformations

Θ1

P(Θ1) = 0.9

Layer 1

Cow Instance

Layer 2

Transformations

Θ2

P(Θ2) = 0.8

Layer 1

Unlikely Instance

Layer 2

Transformations

Θ3

P(Θ3) = 0.01

Layer 1

- From video via motion segmentation see Kumar Torr and Zisserman ICCV 2005.

- Learning
- Learnt automatically using a set of examples

- Detection
- Matches LPS to image using Loopy Belief Propagation
- Localizes object parts

- Like a proposal process.

Fischler and Eschlager. 1973

PS = 2D Parts + Configuration

Aim: Learn pictorial structures in an unsupervised manner

Layered

Pictorial

Structures

(LPS)

Parts +

Configuration +

Relative depth

- Identify parts
- Learn configuration
- Learn relative depth of parts

- Each parts is a variable
- States are image locations
- AND affine deformation

Affine warp of parts

- Each parts is a variable
- States are image locations
- MRF favours certain
- configurations

- D = image.
- Di = pixels Є pi , given li
- (PDF Projection Theorem. )
z = sufficient statistics

- ψ(li,lj) = const, if valid configuration
= 0, otherwise.

Pott’s model

- We want a likelihood that can combine both the outline and the interior appearance of a part.
- Define features which will be sufficient statistics to discriminate foreground and background:

- Outline: z1 Chamfer distance
- Interior: z2 Textons
- Model joint distribution of z1 z2 as a 2D Gaussian.

- Outline (z1) : minimum chamfer distances over multiple outline exemplars
- dcham= 1/n Σi min{ minj ||ui-vj ||, τ }

Image

Edge Image

Distance Transform

- Texture(z2) : MRF classifier
- (Varma and Zisserman, CVPR ’03)

- Having slagged off BoW’s I reveal we used it all along, no big deal.
- So this is like a spatially aware bag of words model…
- Using a spatially flexible set of templates to work out our bag of words.

- Cascades of classifiers
- Efficient likelihood evaluation

- Solving MRF
- LBP, use fast algorithm
- GBP if LBP doesn’t converge
- Could use Semi Definite Programming (2003)
- Recent work second order cone programming method best CVPR 2006.

- Cascade of classifiers
- Top level use chamfer and distance transform for efficient pre filtering
- At lower level use full texture model for verification, using efficient nearest neighbour speed ups.

- Y. Amit, and D. Geman, 97?; S. Baker, S. Nayer 95

(x,y)

- Chamfer like linear classifier on distance transform image Felzenszwalb.
- Tree is a set of linear classifiers.
- Pictorial structure is a parameterized family of linear classifiers.

- The top levels of the tree use outline to eliminate patches of the image.
- Efficiency: Using chamfer distance and pre computed distance map.
- Remaining candidates evaluated using full texture model.

- Goldstein, Platt and Burges (MSR Tech Report, 2003)

Conversion from fixed

distance to rectangle

search

- bitvectorij(Rk) = 1
- = 0
- Nearest neighbour of x
- Find intervals in all dimensions
- ‘AND’ appropriate bitvectors
- Nearest neighbour search on
- pruned exemplars

RkЄ Ii

in dimension j

- SDP formulation (Torr 2001, AI stats)
- SOCP formulation (Kumar, Torr & Zisserman this conference)
- LBP (Huttenlocher, many)

- Problem Formulation
- Form of Shape Prior
- Optimization
- Results

- Given image D, find best labelling as m* = arg max p(m|D)
- Treat LPS parameter Θas a latent (hidden) variable
- EM framework
- E : sample the distribution over Θ
- M : obtain the labelling m

- Given initial labelling m’, determine p(Θ|m’,D)
- Problem
Efficiently sampling from p(Θ|m’,D)

- Solution
- We develop efficient sum-product Loopy Belief Propagation (LBP) for matching LPS.
- Similar to efficient max-product LBP for MAP estimate
- Felzenszwalb and Huttenlocher, CVPR ‘04

- Different samples localize different parts well.
- We cannot use only the MAP estimate of the LPS.

- Given samples from p(Θ|m’,D), get new labelling mnew
- Sample Θiprovides
- Object localization to learn RGB distributions of object and background
- Shape prior for segmentation

- Problem
- Maximize expected log likelihood using all samples
- To efficiently obtain the new labelling

w1 = P(Θ1|m’,D)

Cow Image

Shape Θ1

RGB Histogram for Background

RGB Histogram for Object

w1 = P(Θ1|m’,D)

Cow Image

Shape Θ1

Θ1

m(labels)

Image Plane

D(pixels)

- Best labelling found efficiently using a Single Graph Cut

Obj

Cut

Φx(D|bkg) + Φx(bkg|Θ)

x

…

- Ψxy(mx,my)+
- Φ(D|mx,my)

y

…

…

…

m

z

…

…

Φz(D|obj) + Φz(obj|Θ)

Bkg

Obj

x

…

y

…

…

…

m

z

…

…

Bkg

w2 = P(Θ2|m’,D)

Cow Image

Shape Θ2

RGB Histogram for Background

RGB Histogram for Object

w2 = P(Θ2|m’,D)

Cow Image

Shape Θ2

Θ2

m(labels)

Image Plane

D(pixels)

- Best labelling found efficiently using a Single Graph Cut

Θ1

Θ2

w1

+ w2

+ ….

Image Plane

Image Plane

m* = arg min ∑ wi E (m,Θi)

- Best labelling found efficiently using a Single Graph Cut

- Problem Formulation
- Form of Shape Prior
- Optimization
- Results

Using LPS Model for Cow

Image

Segmentation

Using LPS Model for Cow

In the absence of a clear boundary between object and background

Image

Segmentation

Using LPS Model for Cow

Image

Segmentation

Using LPS Model for Cow

Image

Segmentation

Using LPS Model for Horse

Image

Segmentation

Using LPS Model for Horse

Image

Segmentation

Image

Our Method

Leibe and Schiele

Shape

Shape+Appearance

Appearance

Without Φx(mx|Θ)

Without Φx(D|mx)

- Segmentation boundary can be extracted from edges
- Rough 3D Shape-prior enough for region disambiguation

Energy to be minimized

Pairwise potential

Unary term

Potts model

Shape prior

But what should be the value of θ?

Likelihood of being foreground given a foreground histogram

Likelihood of being foreground given all the terms

Shape prior model

Grimson-Stauffer segmentation

Shape prior (distance transform)

Resulting Graph-Cuts segmentation

Original image

- Comparable to level set methods
- Could use other approaches (e.g. Objcut)
- Need a graph cut per function evaluation

However…

- Kohli and Torr showed how dynamic graph cuts can be used to efficiently find MAP solutions for MRFs that change minimally from one time instant to the next: Dynamic Graph Cuts (ICCV05).

… to compute the MAP of E(x) w.r.t the pose, it means that the unary terms will be changed at EACH iteration and the maxflow recomputed!

solve

SA

differences

between

A and B

PB*

Simpler

problem

A and B

similar

SB

PA

cheaper

operation

PB

computationally

expensive operation

Dynamic Image Segmentation

Image

Segmentation Obtained

Flows in n-edges

Maximum flow

MAP solution

First segmentation problem

Ga

difference

between

Ga and Gb

residual graph (Gr)

second segmentation problem

updated residual graph

G`

Gb

Our Algorithm

- Our method flow recycling
- AC cut recycling
- Both methods: Tree recycling

Running time of the dynamic algorithm

MRF consisting of 2x105 latent variables connected in a 4-neighborhood.

Grimson-Stauffer

Bathia04

Our method

- Combining pose inference and segmentation worth investigating.
- Tracking = Detection
- Detection = Segmentation
- Tracking = Segmentation.
- Segmentation = SFM ??