
UNIVERSITY OF OXFORD

OBJ CUT & PoseCut (CVPR 05, ECCV 06)

Philip Torr

M. Pawan Kumar, Pushmeet Kohli and Andrew Zisserman



Conclusion

  • Combining pose inference and segmentation is worth investigating. (tomorrow)

  • Tracking = Detection

  • Detection = Segmentation

  • Tracking (pose estimation) = Segmentation.



First segmentation problem

Segmentation

  • To distinguish cow and horse?



Aim

  • Given an image, to segment the object

[Figure: Object Category Model + Cow Image → Segmentation → Segmented Cow]

  • Segmentation should (ideally) be

  • shaped like the object e.g. cow-like

  • obtained efficiently in an unsupervised manner

  • able to handle self-occlusion



Challenges

Intra-Class Shape Variability

Intra-Class Appearance Variability

Self Occlusion


Motivation

Magic Wand

  • Current methods require user intervention

  • Object and background seed pixels (Boykov and Jolly, ICCV 01)

  • Bounding Box of object (Rother et al. SIGGRAPH 04)

[Figure: Cow Image marked with Object and Background Seed Pixels, and the resulting Segmented Image]





Motivation

  • Problem

  • Manually intensive

  • Segmentation is not guaranteed to be ‘object-like’

[Figure: Non Object-like Segmentation]



Our Method

  • Combine object detection with segmentation

    • Borenstein and Ullman, ECCV ’02

    • Leibe and Schiele, BMVC ’03

  • Incorporate global shape priors in MRF

  • Detection provides

    • Object Localization

    • Global shape priors

  • Automatically segments the object

    • Note: our method is completely generic

    • Applicable to any object category model



Outline

  • Problem Formulation

  • Form of Shape Prior

  • Optimization

  • Results



Problem

  • Labelling m over the set of pixels D

  • Shape prior provided by parameter Θ

  • Energy E(m,Θ) = ∑x [ Φx(D|mx) + Φx(mx|Θ) ] + ∑x,y [ Ψxy(mx,my) + Φ(D|mx,my) ]

  • Unary terms

    • Likelihood based on colour

    • Unary potential based on distance from Θ

  • Pairwise terms

    • Prior

    • Contrast term

  • Find best labelling m* = arg minm ∑i wi E(m,Θi)

    • wi is the weight for sample Θi

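The energy above can be sketched in code. The following is a minimal illustration (not the authors' implementation) on a 1D strip of pixels; the stand-in potentials and constants below are assumptions made for the example only.

```python
import math

# Sketch of E(m, Theta) = sum_x [ Phi_colour + Phi_shape ]
#                       + sum_{x,y} [ Psi + Phi_contrast ]
# over a 1D strip of pixels with binary labels.
def energy(m, pixels, phi_colour, phi_shape, psi, phi_contrast):
    """m: list of 0/1 labels, pixels: list of intensities in [0, 1]."""
    e = 0.0
    for x, mx in enumerate(m):
        e += phi_colour(pixels[x], mx) + phi_shape(x, mx)
    for x in range(len(m) - 1):  # neighbouring pairs (x, x+1)
        e += psi(m[x], m[x + 1])
        e += phi_contrast(pixels[x], pixels[x + 1], m[x], m[x + 1])
    return e

# Toy potentials: bright pixels prefer object (1), dark prefer background (0);
# the prior penalises label changes, while the contrast term relaxes the
# penalty across strong intensity edges.
phi_colour = lambda p, mx: abs(p - (1.0 if mx == 1 else 0.0))
phi_shape = lambda x, mx: 0.0                  # no shape prior in this toy
psi = lambda mx, my: 0.0 if mx == my else 1.0  # Potts-style prior
phi_contrast = lambda p, q, mx, my: 0.0 if mx == my else math.exp(-(p - q) ** 2)
```

A labelling that follows the intensities should score a lower energy than one that ignores them.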



MRF

  • Probability for a labelling consists of

  • Likelihood

    • Unary potential based on colour of pixel

  • Prior which favours same labels for neighbours (pairwise potentials)

[Diagram: labels m = {mx} over the image plane D (pixels); each pixel x has a unary potential Φx(D|mx), and neighbouring labels mx, my are linked by the prior Ψxy(mx,my)]


Example

Cow Image with Object and Background Seed Pixels

[Figure: unary potentials Φx(D|obj) and Φx(D|bkg) give the likelihood ratio (colour); the pairwise term Ψxy(mx,my) gives the prior]


Example

[Figure: segmentation obtained from the likelihood ratio (colour) and the prior alone]


Contrast-Dependent MRF

  • The probability of a labelling in addition has a contrast term, which favours boundaries lying on image edges

[Diagram: as before, with the contrast term Φ(D|mx,my) linking the labels of neighbouring pixels x and y in the image plane D]
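A common concrete choice for such a contrast term (as in Boykov-Jolly-style interactive segmentation) decays exponentially with the intensity difference between neighbours. The constants below are illustrative assumptions, not values from the paper.

```python
import math

# Illustrative contrast term: paid only when neighbouring labels differ,
# and decaying with the intensity difference, so that a label boundary
# is cheap across a strong image edge and expensive in a flat region.
def contrast_term(ix, iy, mx, my, gamma=1.0, sigma=0.5):
    if mx == my:
        return 0.0
    return gamma * math.exp(-((ix - iy) ** 2) / (2.0 * sigma ** 2))
```

Cutting across an edge (intensities 0.0 vs 1.0) costs less than cutting through a flat region (0.5 vs 0.5).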


Example

[Figure: unary potentials Φx(D|obj) and Φx(D|bkg) (likelihood ratio, colour) with pairwise term Ψxy(mx,my) + Φ(D|mx,my) (prior + contrast)]


Example

[Figure: segmentation obtained using the likelihood ratio (colour) with prior + contrast]


Our Model

  • The probability of a labelling in addition has a unary potential which depends on the distance from Θ (the shape parameter)

[Diagram: Object Category Specific MRF; the shape parameter Θ adds a unary potential Φx(mx|Θ) to each label mx over the image plane D]


Example

[Figure: Shape Prior Θ on the Cow Image with Object and Background Seed Pixels; unary term = distance from Θ, pairwise term = prior + contrast]


Example

[Figure: unary term = likelihood + distance from Θ, pairwise term = prior + contrast]


Example

[Figure: resulting segmentation using likelihood + distance from Θ with prior + contrast]



Outline

  • Problem Formulation

    • E(m,Θ) = ∑x [ Φx(D|mx) + Φx(mx|Θ) ] + ∑x,y [ Ψxy(mx,my) + Φ(D|mx,my) ]

  • Form of Shape Prior

  • Optimization

  • Results



Detection

  • BMVC 2004



Layered Pictorial Structures (LPS)

  • Generative model

  • Composition of parts + spatial layout

[Figure: Layer 2 and Layer 1 parts with their spatial layout (pairwise configuration); parts in Layer 2 can occlude parts in Layer 1]


Layered Pictorial Structures (LPS)

[Figure: Cow Instance generated by transformations Θ1 of the Layer 1 and Layer 2 parts; P(Θ1) = 0.9]


Layered Pictorial Structures (LPS)

[Figure: Cow Instance generated by transformations Θ2; P(Θ2) = 0.8]


Layered Pictorial Structures (LPS)

[Figure: Unlikely Instance generated by transformations Θ3; P(Θ3) = 0.01]



How to learn LPS

  • From video via motion segmentation; see Kumar, Torr and Zisserman, ICCV 2005.



LPS for Detection

  • Learning

    • Learnt automatically using a set of examples

  • Detection

    • Matches LPS to image using Loopy Belief Propagation

    • Localizes object parts



Detection

  • Like a proposal process.



Pictorial Structures (PS)

Fischler and Elschlager, 1973

PS = 2D Parts + Configuration

Aim: Learn pictorial structures in an unsupervised manner

Layered Pictorial Structures (LPS) = Parts + Configuration + Relative depth

  • Identify parts

  • Learn configuration

  • Learn relative depth of parts



Pictorial Structures

  • Each part is a variable

  • States are image locations

  • AND affine deformation

Affine warp of parts



Pictorial Structures

  • Each part is a variable

  • States are image locations

  • MRF favours certain configurations



Bayesian Formulation (MRF)

  • D = image.

  • Di = pixels Є pi, given li

  • (PDF Projection Theorem) z = sufficient statistics

  • ψ(li,lj) = const, if valid configuration

    = 0, otherwise.

Potts model
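Viewed as a probability factor (rather than an energy), the ψ above can be sketched as follows; the validity predicate here is a hypothetical stand-in for the true part-configuration check.

```python
# Sketch of the pairwise factor psi(li, lj): a constant weight for any
# valid part configuration, and 0 for an invalid one, which zeroes out
# that joint state in the probability. "valid" is a toy stand-in.
def psi(li, lj, valid, const=1.0):
    return const if valid(li, lj) else 0.0

# Toy validity check: two 1D part locations are compatible when close.
valid = lambda li, lj: abs(li - lj) <= 2
```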



Defining the likelihood

  • We want a likelihood that can combine both the outline and the interior appearance of a part.

  • Define features which will be sufficient statistics to discriminate foreground and background:



Features

  • Outline: z1 Chamfer distance

  • Interior: z2 Textons

  • Model the joint distribution of (z1, z2) as a 2D Gaussian.



Chamfer Match Score

  • Outline (z1) : minimum chamfer distances over multiple outline exemplars

  • dcham = (1/n) ∑i min{ minj ||ui − vj||, τ }

[Figure: Image → Edge Image → Distance Transform]
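The truncated chamfer score can be sketched directly from the formula above. This brute-force version is for clarity only; in practice the inner minimum is read off a precomputed distance transform of the edge image.

```python
import math

# Truncated chamfer distance between template outline points U and image
# edge points V: dcham = (1/n) * sum_i min( min_j ||u_i - v_j||, tau ).
def chamfer(U, V, tau):
    total = 0.0
    for (ux, uy) in U:
        nearest = min(math.hypot(ux - vx, uy - vy) for (vx, vy) in V)
        total += min(nearest, tau)  # truncate so clutter has bounded effect
    return total / len(U)
```

A perfectly matching outline scores 0; the truncation τ caps the contribution of template points far from any edge.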



Texton Match Score

  • Texture(z2) : MRF classifier

    • (Varma and Zisserman, CVPR ’03)

  • Multiple texture exemplars x of class t

  • Textons: 3 X 3 square neighbourhood

  • VQ in texton space

  • Descriptor: histogram of texton labelling

  • χ2 distance
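The descriptor comparison can be sketched as a normalised texton histogram plus the χ2 distance; the texton labelling itself (vector quantisation of 3 × 3 neighbourhoods) is assumed to have been computed already.

```python
# Chi-squared distance between two normalised histograms:
# 0.5 * sum_k (h1[k] - h2[k])^2 / (h1[k] + h2[k]).
def chi2(h1, h2, eps=1e-10):
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

# Histogram of texton labels, normalised so regions of different size
# remain comparable.
def texton_histogram(labels, n_textons):
    h = [0.0] * n_textons
    for t in labels:
        h[t] += 1.0
    n = float(len(labels))
    return [c / n for c in h]
```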



    Bag of Words/Histogram of Textons

    • Having slagged off BoWs, I reveal we used it all along; no big deal.

    • So this is like a spatially aware bag of words model…

    • Using a spatially flexible set of templates to work out our bag of words.



    2. Fitting the Model

    • Cascades of classifiers

      • Efficient likelihood evaluation

    • Solving MRF

      • LBP, use fast algorithm

      • GBP if LBP doesn’t converge

      • Could use Semi Definite Programming (2003)

      • Recent work: a second order cone programming method works best (CVPR 2006).



    Efficient Detection of parts

    • Cascade of classifiers

    • Top level: use chamfer and distance transform for efficient pre-filtering

    • At lower levels, use the full texture model for verification, with efficient nearest neighbour speed-ups.



    Cascade of Classifiers (for each part)

    • Y. Amit and D. Geman, 97; S. Baker and S. Nayar, 95



    High Levels based on Outline




    Side Note

    • Chamfer is like a linear classifier on the distance transform image (Felzenszwalb).

    • Tree is a set of linear classifiers.

    • Pictorial structure is a parameterized family of linear classifiers.



    Low levels on Texture

    • The top levels of the tree use outline to eliminate patches of the image.

    • Efficiency: using the chamfer distance and a pre-computed distance map.

    • Remaining candidates evaluated using full texture model.



    Efficient Nearest Neighbour

    • Goldstein, Platt and Burges (MSR Tech Report, 2003)

    • Conversion from fixed-distance search to rectangle search

    • bitvectorij(Rk) = 1 if exemplar Rk Є interval Ii in dimension j; = 0 otherwise

    • Nearest neighbour of x:

      • Find intervals in all dimensions

      • ‘AND’ the appropriate bitvectors

      • Nearest neighbour search on the pruned exemplars
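The pruning scheme can be sketched as follows; the interval edges and widening radius are assumptions for this toy, and the details of the original method differ.

```python
# Sketch of bitvector-based nearest-neighbour pruning in the spirit of
# Goldstein, Platt and Burges: each dimension is split into intervals,
# bitvec[j][i] records which exemplars fall in (a widened) interval i of
# dimension j, and a query ANDs one bitvector per dimension before doing
# an exact search over the surviving candidates.
def build_bitvectors(exemplars, edges, radius):
    dims = len(exemplars[0])
    bitvec = [[set() for _ in range(len(edges) - 1)] for _ in range(dims)]
    for k, e in enumerate(exemplars):
        for j in range(dims):
            for i in range(len(edges) - 1):
                if edges[i] - radius <= e[j] <= edges[i + 1] + radius:
                    bitvec[j][i].add(k)
    return bitvec

def interval_of(x, edges):
    for i in range(len(edges) - 1):
        if edges[i] <= x <= edges[i + 1]:
            return i
    return 0 if x < edges[0] else len(edges) - 2

def nearest(query, exemplars, bitvec, edges):
    # 'AND' (set intersection) the bitvectors picked out by the query
    candidates = None
    for j, q in enumerate(query):
        s = bitvec[j][interval_of(q, edges)]
        candidates = s if candidates is None else candidates & s
    pool = candidates if candidates else range(len(exemplars))  # fallback
    return min(pool, key=lambda k: sum((a - b) ** 2
                                       for a, b in zip(query, exemplars[k])))
```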



    Recently solve via Integer Programming

    • SDP formulation (Torr 2001, AI stats)

    • SOCP formulation (Kumar, Torr & Zisserman this conference)

    • LBP (Huttenlocher, many)



    Outline

    • Problem Formulation

    • Form of Shape Prior

    • Optimization

    • Results



    Optimization

    • Given image D, find best labelling as m* = arg max p(m|D)

    • Treat LPS parameter Θ as a latent (hidden) variable

    • EM framework

      • E : sample the distribution over Θ

      • M : obtain the labelling m
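The EM framework above can be written as a short schematic loop (a sketch of the procedure, not the authors' implementation); all the callables are supplied by the caller, so the function names are assumptions.

```python
# Schematic EM loop: the E-step draws samples of the shape parameter
# Theta given the current labelling, the M-step re-labels by minimising
# the weighted energy over those samples.
def em_segment(sample_theta, weight, minimise_energy, m_init, iters=5):
    m = m_init
    for _ in range(iters):
        thetas = sample_theta(m)                 # E-step: samples from p(Theta | m, D)
        weights = [weight(t, m) for t in thetas]
        m = minimise_energy(m, thetas, weights)  # M-step: argmin_m sum_i w_i E(m, Theta_i)
    return m
```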



    E-Step

    • Given initial labelling m’, determine p(Θ|m’,D)

    • Problem

      Efficiently sampling from p(Θ|m’,D)

    • Solution

    • We develop efficient sum-product Loopy Belief Propagation (LBP) for matching LPS.

    • Similar to efficient max-product LBP for MAP estimate

      • Felzenszwalb and Huttenlocher, CVPR ‘04



    Results

    • Different samples localize different parts well.

    • We cannot use only the MAP estimate of the LPS.



    M-Step

    • Given samples from p(Θ|m’,D), get new labelling mnew

    • Sample Θi provides

      • Object localization to learn RGB distributions of object and background

      • Shape prior for segmentation

    • Problem

      • Maximize expected log likelihood using all samples

      • To efficiently obtain the new labelling


    M-Step

    [Figure: sample Θ1 with weight w1 = P(Θ1|m’,D); its localization on the Cow Image gives RGB histograms for object and background]


    M-Step

    [Figure: shape Θ1, weight w1 = P(Θ1|m’,D), projected onto the image plane D (pixels) with labels m]

    • Best labelling found efficiently using a Single Graph Cut


    Segmentation using Graph Cuts

    [Diagram: graph with terminals Obj and Bkg; pixel x links to Bkg with weight Φx(D|bkg) + Φx(bkg|Θ), pixel z links to Obj with weight Φz(D|obj) + Φz(obj|Θ), and neighbouring pixels are linked by Ψxy(mx,my) + Φ(D|mx,my); the minimum cut gives the labelling m]
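A toy version of this construction can be run end to end with a small max-flow solver; the sketch below uses Edmonds-Karp on a 1D strip of pixels, and the capacities are illustrative stand-ins for the actual potentials.

```python
from collections import deque

# Edmonds-Karp max-flow on a dense capacity matrix; returns the flow
# value and the set of nodes reachable from s in the residual graph
# (the source side of the minimum cut).
def max_flow(cap, s, t):
    n = len(cap)
    flow = [[0.0] * n for _ in range(n)]
    total = 0.0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        v, bottleneck = t, float("inf")
        while v != s:                       # find bottleneck on the path
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:                       # push flow along the path
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck
    reach = [False] * n                     # residual-graph reachability
    reach[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n):
            if not reach[v] and cap[u][v] - flow[u][v] > 0:
                reach[v] = True
                q.append(v)
    return total, reach

# Toy segmentation of a 1D strip: node 0 = Obj (source), node n+1 = Bkg
# (sink), nodes 1..n = pixels. Cutting a t-link pays the unary cost of
# the label the pixel ends up with; n-links carry the pairwise cost.
def segment(unary_obj, unary_bkg, pairwise):
    n = len(unary_obj)
    N = n + 2
    cap = [[0.0] * N for _ in range(N)]
    for i in range(n):
        cap[0][i + 1] = unary_bkg[i]      # cut when pixel lands on the Bkg side
        cap[i + 1][N - 1] = unary_obj[i]  # cut when pixel lands on the Obj side
    for i in range(n - 1):
        cap[i + 1][i + 2] = pairwise[i]
        cap[i + 2][i + 1] = pairwise[i]
    _, reach = max_flow(cap, 0, N - 1)
    return [1 if reach[i + 1] else 0 for i in range(n)]  # 1 = Obj
```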


    Segmentation using Graph Cuts

    [Diagram: the minimum cut separates the pixel nodes between Obj and Bkg, yielding the labelling m]


    M-Step

    [Figure: sample Θ2 with weight w2 = P(Θ2|m’,D); its localization gives RGB histograms for object and background]


    M-Step

    [Figure: shape Θ2, weight w2 = P(Θ2|m’,D), projected onto the image plane D (pixels) with labels m]

    • Best labelling found efficiently using a Single Graph Cut


    M-Step

    [Figure: the samples Θ1, Θ2, … combined with weights w1, w2, … over the image plane]

    m* = arg minm ∑i wi E(m,Θi)

    • Best labelling found efficiently using a Single Graph Cut
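The weighted-sample objective can be illustrated by brute force on a tiny strip; the energy below is a toy stand-in (shape samples as target masks, weighted Hamming disagreement plus a label-change penalty), not the actual potentials.

```python
from itertools import product

# Brute-force illustration of m* = argmin_m sum_i w_i E(m, Theta_i):
# with several weighted shape samples, the best labelling minimises the
# expected energy over the samples.
def weighted_argmin(thetas, weights, n, smooth=0.4):
    def energy(m, theta):
        # unary: disagreement with the shape sample
        e = sum(1.0 for a, b in zip(m, theta) if a != b)
        # pairwise prior: penalise label changes between neighbours
        e += smooth * sum(1 for i in range(n - 1) if m[i] != m[i + 1])
        return e
    best = min(product([0, 1], repeat=n),
               key=lambda m: sum(w * energy(m, t)
                                 for w, t in zip(weights, thetas)))
    return list(best)
```

A high-weight sample dominates: the result follows the sample with the larger w.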



    Outline

    • Problem Formulation

    • Form of Shape Prior

    • Optimization

    • Results


    Results

    Using LPS Model for Cow

    [Figure: Image and Segmentation]


    Results

    Using LPS Model for Cow, in the absence of a clear boundary between object and background

    [Figure: Image and Segmentation]


    Results

    Using LPS Model for Cow

    [Figure: further cow images and segmentations]


    Results

    Using LPS Model for Horse

    [Figure: horse images and segmentations]


    Results

    [Figure: comparison of the Image, Our Method, and Leibe and Schiele]


    Results

    [Figure: ablation: Shape only (without Φx(D|mx)), Appearance only (without Φx(mx|Θ)), and Shape+Appearance]



    Face Detector and ObjCut



    Do we really need accurate models?

    • Segmentation boundary can be extracted from edges

    • Rough 3D Shape-prior enough for region disambiguation


    Energy of the Pose-specific MRF

    [Equation: the energy to be minimized is a sum of unary terms (likelihood and shape prior) and pairwise potentials (Potts model)]

    But what should be the value of θ?


    The different terms of the MRF

    [Figure: original image; likelihood of being foreground given a foreground histogram; shape prior model; shape prior (distance transform); likelihood of being foreground given all the terms; Grimson-Stauffer segmentation; resulting Graph-Cuts segmentation]



    Can segment multiple views simultaneously



    Solve via gradient descent

    • Comparable to level set methods

    • Could use other approaches (e.g. Objcut)

    • Need a graph cut per function evaluation



    Formulating the Pose Inference Problem



    However…

    • Kohli and Torr showed how dynamic graph cuts can be used to efficiently find MAP solutions for MRFs that change minimally from one time instant to the next: Dynamic Graph Cuts (ICCV05).

    But…

    … to compute the MAP of E(x) w.r.t. the pose, the unary terms change at EACH iteration and the max-flow must be recomputed!


    Dynamic Graph Cuts

    [Diagram: solving problem A (a computationally expensive operation) gives solution SA; for a similar problem B, rather than solving PB from scratch, the differences between A and B yield a simpler problem PB*, and updating SA is a cheaper operation that gives SB]


    Dynamic Image Segmentation

    [Figure: Image, Segmentation Obtained, and Flows in n-edges]


    Our Algorithm

    [Diagram: the maximum flow / MAP solution of the first segmentation problem is computed on graph Ga; the difference between Ga and Gb is applied to the residual graph Gr, giving an updated residual graph G` on which the second segmentation problem (Gb) is solved]



    Dynamic Graph Cut vs Active Cuts

    • Our method: flow recycling

    • AC: cut recycling

    • Both methods: Tree recycling


    Experimental Analysis

    Running time of the dynamic algorithm

    MRF consisting of 2×10⁵ latent variables connected in a 4-neighborhood.


    Segmentation Comparison

    [Figure: comparison of Grimson-Stauffer, Bathia04, and our method]


    Segmentation

    [Figure: further segmentation results]



    Conclusion

    • Combining pose inference and segmentation worth investigating.

    • Tracking = Detection

    • Detection = Segmentation

    • Tracking = Segmentation.

    • Segmentation = SFM ??

