Date: 2013/05/27. Instructor: Prof. Wang, Sheng-Jyh. Student: Hung, Fei-Fan.


Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses. Date: 2013/05/27. Instructor: Prof. Wang, Sheng-Jyh. Student: Hung, Fei-Fan. Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012).

Presentation Transcript

Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses

Date: 2013/05/27

Instructor: Prof. Wang, Sheng-Jyh

Student: Hung, Fei-Fan

Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012)

Outline
  • Introduction
    • Intuition and goal
  • Model Representation
  • Model Learning
    • Obtaining Atomic Poses
    • Training Detectors and Classifiers
    • Estimating Model Parameters
  • Model Inference
  • Experimental Results
  • Conclusion
Why use context in computer vision?
  • Simple images vs. human activities
  • Figure labels (images not preserved): "Without context", "With mutual context", "with context", "without context", "~3-4%"

Challenges in Human Pose Estimation
  • Human pose estimation is challenging
    • Difficult part appearance
    • Self-occlusion
    • Image regions that look like a body part
  • Object detection can facilitate human pose estimation

Challenges in Object Detection
  • Object detection is challenging
    • Objects can be small, low-resolution, or partially occluded
    • Image regions can look similar to the detection target
  • Human pose estimation can facilitate object detection

The Goal
  • To build a mutual context model for Human-Object Interaction (HOI) activities
Model representation
  • Modeling the mutual context of objects and human poses
  • A: activity class, e.g. tennis forehand, croquet shot, volleyball smash
  • O: objects, e.g. tennis racket, croquet mallet, volleyball, tennis ball; M: number of object bounding boxes
  • H: atomic pose; an activity A can contain more than one atomic pose H
  • P: body parts

Model representation
  • 𝝓1: co-occurrence compatibility between A, O, H
  • 𝝓2: spatial relationship between O and H
  • 𝝓3, 𝝓4, 𝝓5: modeling the image evidence with detectors or classifiers
  • Graphical model (figure not preserved): nodes A (activity), H (human pose), O1, O2 (objects), and P1, P2, ..., PL (body parts)

𝝓1: Co-occurrence context
  • Co-occurrence between all A, O, H
  • A weight gives the strength of the co-occurrence interaction between an atomic pose, an object, and an activity class
  • Indicator functions select which combination of labels is active
  • The sum runs over the total number of atomic poses, the total number of objects, and the total number of activity classes
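A minimal sketch of how such a co-occurrence term could be evaluated, assuming it is a weight table gated by indicator functions. The names `cooccurrence_potential` and `zeta` are hypothetical and do not come from the slides; the weight values are made up for illustration.

```python
from itertools import product

def cooccurrence_potential(activity, objects, pose, zeta):
    """Sum zeta[i][j][k] over every (pose i, object j, activity k)
    combination whose three indicator functions all fire."""
    n_poses = len(zeta)
    n_objects = len(zeta[0])
    n_activities = len(zeta[0][0])
    total = 0.0
    for obj in objects:  # one contribution per object in the image
        for i, j, k in product(range(n_poses), range(n_objects), range(n_activities)):
            indicator = (pose == i) and (obj == j) and (activity == k)
            if indicator:
                total += zeta[i][j][k]
    return total

# Illustrative weights (assumption): 2 atomic poses x 2 objects x 2 activities.
zeta = [[[float(4 * i + 2 * j + k) for k in range(2)] for j in range(2)] for i in range(2)]
example = cooccurrence_potential(activity=1, objects=[0, 1], pose=1, zeta=zeta)
```

The explicit triple loop mirrors the indicator-function form; in practice the same value is a direct lookup `zeta[pose][obj][activity]` per object.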

𝝓2: Spatial context
  • Spatial relationship between all objects O and the different atomic poses H
  • A weight vector for the spatial feature
  • The spatial feature is a sparse binary vector that shows the relative location of an object w.r.t. the human pose
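A sparse binary spatial feature of this kind could be sketched as follows. The binning scheme (8 angle bins, 4 distance bins), the choice of a single reference point on the body, and the names `spatial_feature` and `spatial_potential` are all assumptions made for illustration; the slides do not specify them.

```python
import math

def spatial_feature(obj_xy, ref_xy, n_angle_bins=8, dist_edges=(0.5, 1.0, 2.0)):
    """Return a one-hot (sparse binary) vector encoding where the object lies
    relative to a reference point, by angle bin and distance bin."""
    dx, dy = obj_xy[0] - ref_xy[0], obj_xy[1] - ref_xy[1]
    dist = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2 * math.pi)
    angle_bin = min(int(angle / (2 * math.pi / n_angle_bins)), n_angle_bins - 1)
    dist_bin = sum(dist >= edge for edge in dist_edges)  # 0 .. len(dist_edges)
    vec = [0] * (n_angle_bins * (len(dist_edges) + 1))
    vec[dist_bin * n_angle_bins + angle_bin] = 1
    return vec

def spatial_potential(weights, feature):
    """Dot product of the weight vector with the binary feature."""
    return sum(w * f for w, f in zip(weights, feature))

feature = spatial_feature((1.0, 0.0), (0.0, 0.0))
score = spatial_potential([0.1 * i for i in range(32)], feature)
```

Because the feature is one-hot, the potential simply picks out the weight of the active spatial bin.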

𝝓3: Modeling objects
  • Model O in the image I using object detection scores
  • For each object O
    • A vector of scores from detecting the object
    • A weight for that score vector
  • Between Om and Om’
    • A binary feature vector for the pair
    • A weight for that feature vector

𝝓4: Modeling human pose
  • Model the atomic pose that H belongs to and its likelihood
  • A Gaussian likelihood function
  • A vector of scores from detecting each body part in the image
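As a rough illustration of a pose term that combines a Gaussian likelihood with body part detection scores, the sketch below assumes an isotropic 2-D Gaussian per body part and works with log-likelihoods. The function names, the weights `alpha` and `beta`, and the use of the log are assumptions, not details from the slides.

```python
import math

def gaussian_log_likelihood(xy, mean_xy, sigma):
    """Log density of an isotropic 2-D Gaussian with std sigma, evaluated at xy."""
    sq_dist = (xy[0] - mean_xy[0]) ** 2 + (xy[1] - mean_xy[1]) ** 2
    return -sq_dist / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)

def pose_potential(part_xy, atomic_pose_xy, det_scores, alpha, beta, sigma=1.0):
    """Per body part: weighted location likelihood under the atomic pose
    plus weighted detector score, summed over all parts."""
    total = 0.0
    for xy, mean_xy, score, a, b in zip(part_xy, atomic_pose_xy, det_scores, alpha, beta):
        total += a * gaussian_log_likelihood(xy, mean_xy, sigma) + b * score
    return total

example = pose_potential([(0.0, 0.0)], [(0.0, 0.0)], [2.0], alpha=[1.0], beta=[0.5])
```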

𝝓5: Modeling activity
  • Model the HOI activity by training an activity classifier
  • Classifier output: the output vector of a one-versus-all (OVA) discriminative classifier, one dimension per activity class, taking the image as features
  • A feature weight for the classifier output

Model Properties
  • Spatial context between O and H
    • Object detection and human pose estimation facilitate each other
    • Ignore the objects and body parts that are unreliable
  • Flexible to extend to large-scale datasets and other activities
    • The joint model can share all objects and atomic poses
Model Learning
  • Assign human poses to atomic poses
  • Train detectors and classifiers
  • Estimate parameters by maximum likelihood

Obtaining Atomic Poses
  • Use clustering to obtain atomic poses
  • Normalize the annotations
  • Fill in missing parts
    • Use the nearest visible neighbor
  • Obtain a set of atomic poses
    • Hierarchical clustering with the maximum (complete) linkage measure
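The clustering step can be sketched with a small agglomerative routine. This is a minimal illustration that assumes each normalized pose annotation is a flat tuple of coordinates and that Euclidean distance is used; the function name `complete_linkage` and the toy data are hypothetical. Maximum linkage means the distance between two clusters is the largest distance between any pair of their members.

```python
import math

def complete_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    maximum pairwise member distance is smallest."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Toy "poses" (assumption): two tight groups of 2-D points.
poses = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
atomic_pose_groups = complete_linkage(poses, n_clusters=2)
```

Each resulting group of indices would correspond to one atomic pose. The brute-force pair search is only suitable for small inputs.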

Training Detectors and Classifiers
  • Object detector in 𝝓3 and human body part detector in 𝝓4
    • Deformable part model
  • Overall activity classifier in 𝝓5
    • Spatial pyramid matching (SPM)
    • SIFT + 3-level image pyramid
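To make the "3-level image pyramid" concrete, here is a minimal sketch of a spatial pyramid histogram. It assumes the SIFT descriptors have already been quantized into visual-word ids, and it omits the per-level weighting used in spatial pyramid matching; the function name and input layout are hypothetical.

```python
def spatial_pyramid_histogram(keypoints, width, height, vocab_size, levels=3):
    """keypoints: list of (x, y, word_id). For each level, split the image
    into a 2^level x 2^level grid and count visual words per cell; return
    the concatenation of all cell histograms."""
    features = []
    for level in range(levels):
        cells = 2 ** level
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in keypoints:
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hist[cy * cells + cx][word] += 1
        for cell in hist:
            features.extend(cell)
    return features

features = spatial_pyramid_histogram([(10, 10, 0), (90, 90, 1)], 100, 100, vocab_size=2)
```

With three levels the grids are 1x1, 2x2, and 4x4, so the feature has 21 cells times the vocabulary size.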

Estimating Model Parameters
  • Estimate the model parameters using a maximum likelihood (ML) approach with a zero-mean Gaussian prior
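A zero-mean Gaussian prior on the parameters adds a squared-norm penalty to the log-likelihood (up to a constant). The sketch below shows only that objective, assuming the per-image log-likelihoods are computed elsewhere; the function name and the prior width `sigma` are hypothetical.

```python
def penalized_log_likelihood(log_likelihoods, params, sigma):
    """Sum of per-example log-likelihoods minus ||params||^2 / (2 sigma^2),
    the log of a zero-mean Gaussian prior with the constant dropped."""
    data_term = sum(log_likelihoods)
    prior_term = sum(p * p for p in params) / (2.0 * sigma ** 2)
    return data_term - prior_term

objective = penalized_log_likelihood([-1.0, -2.0], [3.0, 4.0], sigma=5.0)
```

Maximizing this objective favors parameters that explain the training data while staying close to zero; a smaller `sigma` gives a stronger penalty.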

Model Inference
  • Input: a new image
  • Initialize with learned results
  • Update human body parts
  • Update object detection results
  • Update A and H labels

Initialization
  • Initialize with learned results on the new image
  • Activity classification (A): SPM classification
  • Object detection (O): object detection
  • Human pose estimation (H): pictorial structure model

Update model inference
  • Step: update human body parts
  • Compute the marginal distribution of the human pose
  • Use a mixture of Gaussians to refine the prior of each body part

Update model inference
  • Step: update object detection results
  • Greedy forward search method:
    • Initialize with no object in any bounding box
    • Select the bounding box and object label with the largest score increase
    • Label that box as the selected object
    • Update the score
    • Stop when the best score increase is < 0
  • Terms involved: those over (O, H), (O, A, H), and (O, I)
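The greedy forward search can be sketched as below. This is an illustration under stated assumptions: `gain` is a hypothetical callback that returns the change in model score if a box receives a label given the current assignment, and the toy scores and duplicate-label penalty are made up.

```python
def greedy_forward_search(boxes, labels, gain):
    """Start with no labeled boxes; repeatedly commit the (box, label) pair
    with the largest score gain; stop when the best gain is negative."""
    assignment = {}
    while len(assignment) < len(boxes):
        candidates = [(gain(assignment, b, l), b, l)
                      for b in boxes if b not in assignment for l in labels]
        if not candidates:
            break
        best_gain, box, label = max(candidates, key=lambda c: c[0])
        if best_gain < 0:
            break
        assignment[box] = label
    return assignment

# Toy example (assumption): per-pair scores plus a penalty for reusing a label.
scores = {("b0", "racket"): 2.0, ("b0", "ball"): 0.5,
          ("b1", "racket"): -1.0, ("b1", "ball"): 1.0,
          ("b2", "racket"): -0.5, ("b2", "ball"): -0.2}

def toy_gain(assignment, box, label):
    penalty = 1.5 if label in assignment.values() else 0.0
    return scores[(box, label)] - penalty

result = greedy_forward_search(["b0", "b1", "b2"], ["racket", "ball"], toy_gain)
```

Here box `b2` stays unlabeled because every remaining choice would lower the score.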

Update model inference
  • Step: update A and H labels
  • Enumerate all possible A and H labels
  • Optimize the model score over them
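Because the activity and atomic pose labels are discrete, this step can be done by exhaustive enumeration. The sketch below assumes the overall score is a sum of potentials and that the objects and body parts are held fixed, so each potential is written as a function of (A, H) only; all names and numbers are hypothetical.

```python
from itertools import product

def total_score(activity, pose, potentials):
    """Overall score as a sum of potential terms (assumed additive)."""
    return sum(phi(activity, pose) for phi in potentials)

def best_activity_and_pose(activities, poses, potentials):
    """Enumerate every (activity, pose) pair and keep the highest-scoring one."""
    return max(product(activities, poses),
               key=lambda ah: total_score(ah[0], ah[1], potentials))

# Toy potentials (assumption): co-occurrence, activity evidence, pose evidence.
cooccurrence = {("tennis forehand", 0): 1.0, ("tennis forehand", 1): 0.2,
                ("croquet shot", 0): 0.1, ("croquet shot", 1): 0.7}
activity_evidence = {"tennis forehand": 0.3, "croquet shot": 0.6}
pose_evidence = {0: 0.5, 1: 0.4}
potentials = [
    lambda a, h: cooccurrence[(a, h)],
    lambda a, h: activity_evidence[a],
    lambda a, h: pose_evidence[h],
]
best = best_activity_and_pose(["tennis forehand", "croquet shot"], [0, 1], potentials)
```

Enumeration is feasible here because the number of activity classes times the number of atomic poses is small.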

Conclusion
  • Mutual context can significantly improve performance on difficult visual recognition problems
  • The joint model can share all the information
  • Training requires annotating all the human body parts and objects in the training images
Reference
  • B. Yao and L. Fei-Fei, “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
  • B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
  • B. Sapp, A. Toshev, and B. Taskar, “Cascade Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010.
  • S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.
  • http://en.wikipedia.org/wiki/Hierarchical_clustering