Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses

Date: 2013/05/27

Instructor: Prof. Wang, Sheng-Jyh

Student: Hung, Fei-Fan

Yao, B., and Fei-Fei, L., IEEE Transactions on PAMI (2012)


Outline

  • Introduction

    • Intuition and goal

  • Model Representation

  • Model Learning

    • Obtaining Atomic Poses

    • Training Detectors and Classifiers

    • Estimating Model Parameters

  • Model Inference

  • Experimental Results

  • Conclusion


Why use context in computer vision?

  • Simple images vs. human activities

Without context: ~3-4%

With mutual context:

[Figure: example results with context vs. without context]


Challenges in Human Pose Estimation

  • Human pose estimation is challenging

    • Difficult part appearance

    • Self-occlusion

    • Image regions that look like a body part

  • → Object detection facilitates human pose estimation


Challenges in Object Detection

  • Object detection is challenging

    • Objects are small, low-resolution, or partially occluded

    • Image regions that look similar to the detection target

  • → Human pose estimation facilitates object detection


The Goal

  • To build a mutual context model for human-object interaction (HOI) activities


Model representation

  • Modeling the mutual context of objects and human poses

  • A: activity, e.g. tennis forehand, croquet shot, volleyball smash

  • O: objects, e.g. tennis racket, croquet mallet, volleyball, tennis ball

    • M: number of bounding boxes

  • H: human pose

    • An activity A can contain more than one atomic pose H

  • P: body parts


Model representation

  • Co-occurrence compatibility between A, O, H

  • Spatial relationship between O, H

  • Image evidence, modeled with detectors or classifiers

[Figure: graphical model connecting the activity A, the human pose H with body parts P1, P2, ..., PL, and the objects O1, O2]
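
Read together, the three kinds of terms above suggest a single score over the activity A, the objects O, the human pose H, and the image I. The following is a sketch only: the names $\Psi$ and $\psi_1, \dots, \psi_5$ are assumed notation, not taken from the slides, and are numbered to match the five terms covered on the following slides.

```latex
\Psi(A, O, H, I) =
    \underbrace{\psi_1(A, O, H)}_{\text{co-occurrence}}
  + \underbrace{\psi_2(O, H)}_{\text{spatial}}
  + \underbrace{\psi_3(O, I)}_{\text{objects}}
  + \underbrace{\psi_4(H, I)}_{\text{human pose}}
  + \underbrace{\psi_5(A, I)}_{\text{activity}}
```

The first two terms carry the mutual context; the last three tie each variable to image evidence.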


Term 1: Co-occurrence context

  • Co-occurrence between all A, O, H

  • A weight gives the strength of the co-occurrence interaction between an atomic pose, an object, and an activity class

  • An indicator function picks out the combination that is present

  • The term ranges over the total number of atomic poses, the total number of objects, and the total number of activity classes
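
A minimal Python sketch of how such a co-occurrence term could be evaluated for one labeling. The function name, the dictionary layout, and the weight values are all hypothetical and chosen for illustration; looking up a (pose, object, activity) triple stands in for the product of indicator functions.

```python
def cooccurrence_score(activity, objects, pose, weights):
    """Sum the co-occurrence weights for one labeling.

    `weights` maps (pose, object, activity) triples to a strength value.
    The dictionary lookup plays the role of the indicator functions:
    only the combination that is actually present contributes.
    Triples without an entry are assumed to contribute 0.
    """
    return sum(weights.get((pose, obj, activity), 0.0) for obj in objects)


# Hypothetical weights, made up for illustration only.
weights = {
    ("arm_raised", "tennis_racket", "tennis_forehand"): 1.5,
    ("arm_raised", "tennis_ball", "tennis_forehand"): 0.5,
    ("arm_raised", "volleyball", "tennis_forehand"): -1.0,
}

score = cooccurrence_score(
    "tennis_forehand", ["tennis_racket", "tennis_ball"], "arm_raised", weights
)
print(score)  # 2.0
```

A positive weight rewards combinations seen together in training, while a negative one penalizes unlikely combinations such as a volleyball in a tennis forehand.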


Term 2: Spatial context

  • Spatial relationship between all O and different H

  • The spatial feature is a sparse binary vector

    • It shows the relative location of an object O w.r.t. the human pose H

  • A weight vector is applied to this feature
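
One way to build a sparse binary vector for relative location is to bin the offset between a body part and an object by direction and distance. The sketch below does that in Python; the binning scheme, the bin counts, and the function names are assumptions made for illustration, not the encoding used in the paper.

```python
import math


def spatial_feature(part_xy, object_xy, n_angle_bins=8, dist_edges=(0.5, 1.5)):
    """Encode the object's location relative to a body part as a one-hot vector.

    The offset is binned by direction (n_angle_bins sectors) and by distance
    (len(dist_edges) + 1 rings), so exactly one entry is 1 and the rest are 0.
    Coordinates are assumed to be normalized already (for example by torso size).
    """
    dx = object_xy[0] - part_xy[0]
    dy = object_xy[1] - part_xy[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Clamp guards against the angle rounding up to a full turn.
    angle_bin = min(int(angle / (2 * math.pi) * n_angle_bins), n_angle_bins - 1)
    dist_bin = sum(math.hypot(dx, dy) >= edge for edge in dist_edges)
    feature = [0] * (n_angle_bins * (len(dist_edges) + 1))
    feature[dist_bin * n_angle_bins + angle_bin] = 1
    return feature


def spatial_score(feature, weights):
    """Dot product of the binary feature with its weight vector."""
    return sum(w * f for w, f in zip(weights, feature))


feature = spatial_feature((0.0, 0.0), (2.0, 1.0))
print(feature.index(1), len(feature))
```

Because the vector is one-hot, the dot product in `spatial_score` simply selects the weight of the active bin.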


Term 3: Modeling objects

  • Model O in the image I using object detection scores

  • For every object O

    • A vector of detection scores for the object

    • A weight vector on those scores

  • Between each pair of objects Om and Om'

    • A binary feature vector

    • A weight vector on that feature for the two objects


Term 4: Modeling human pose

  • Model the atomic pose that H belongs to and its likelihood

  • A Gaussian likelihood function

  • A vector of scores from detecting each body part in the image


Term 5: Modeling activity

  • Model the HOI activity by training an activity classifier

  • The feature is the output of a one-versus-all (OVA) discriminative classifier that takes the image as input

  • A weight vector is applied to this classifier output


Model Properties

  • Spatial context between O and H

    • Object detection and human pose estimation facilitate each other

    • Unreliable objects and body parts can be ignored

  • Flexible enough to extend to large-scale datasets and other activities

    • The joint model shares all objects and atomic poses


Model Learning

  • Assign each human pose to an atomic pose

  • Train detectors and classifiers

  • Estimate parameters by maximum likelihood


Obtaining Atomic Poses

  • Use clustering to obtain atomic poses

  • Normalize the annotations

  • Fill in missing parts

    • Use the nearest visible neighbor

  • Obtain a set of atomic poses

    • Hierarchical clustering with the maximum linkage measure
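
The clustering step can be sketched as plain agglomerative clustering with maximum (complete) linkage. This is a minimal pure-Python illustration, assuming each annotated pose has already been normalized and flattened into a coordinate tuple; the Euclidean distance and the toy data are assumptions, since the distance measure on the slide is not preserved.

```python
import math


def complete_linkage(cluster_a, cluster_b, points):
    """Maximum (complete) linkage: distance between the two farthest members."""
    return max(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def cluster_poses(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = complete_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy "poses": each tuple stands for a flattened vector of part coordinates.
poses = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
atomic_poses = cluster_poses(poses, n_clusters=2)
print(atomic_poses)
```

Each resulting cluster holds the indices of the training poses assigned to one atomic pose. Complete linkage keeps clusters compact, because two clusters merge only when even their farthest members are close.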


Training Detectors and Classifiers

  • Object detector, used in the object model

  • Human body part detector, used in the human pose model

  • → Detectors: deformable part model

  • Overall activity classifier, used in the activity model

    • Spatial pyramid matching (SPM)

    • SIFT + 3-level image pyramid


Estimating Model Parameters

  • Estimate the model parameters with a maximum likelihood (ML) approach and a zero-mean Gaussian prior
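
Maximum likelihood with a zero-mean Gaussian prior on the weights amounts to maximizing the training log-likelihood with an L2 penalty. A sketch of that objective follows; the notation is assumed rather than taken from the slides: $\mathbf{w}$ stacks all model weights, $\sigma$ is the standard deviation of the prior, and $N$ is the number of training images.

```latex
\mathbf{w}^{*} = \arg\max_{\mathbf{w}}
  \sum_{i=1}^{N} \log p\left(A_i, O_i, H_i \mid I_i, \mathbf{w}\right)
  - \frac{\lVert \mathbf{w} \rVert^{2}}{2\sigma^{2}}
```

The penalty is the log of the Gaussian prior up to a constant, so a smaller $\sigma$ pulls the weights more strongly toward zero.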



Model Inference

  • Start from a new image

  • Initialize with learned results

  • Update human body parts

  • Update object detection results

  • Update A and H labels


Initialization

  • For a new image, initialize with learned results

    • A: SPM classification of the activity

    • O: object detection

    • H: pictorial structure model for human pose estimation


Update model inference: human body parts

  • Compute the marginal distribution of the human pose

  • Use a mixture of Gaussians to refine the prior on the body parts


Update model inference: object detection results

  • Greedy forward search method:

    • Initially, no bounding box is labeled as an object

    • → Select the best-scoring bounding box and object label

    • → Label the box as that object

    • → Update the score

    • Stop when the score gain < 0

  • Terms involved: (O, H), (O, A, H), (O, I)
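
The greedy forward search can be sketched as the loop below. This is a minimal illustration under stated assumptions: the `gain` callback, the box and label names, and the toy scores are hypothetical, and in the actual model the gain would come from the terms that involve O.

```python
def greedy_forward_search(boxes, labels, gain):
    """Greedily assign object labels to candidate bounding boxes.

    gain(box, label, assigned) returns how much the total score would change
    if `box` were labeled `label`, given the labels already in `assigned`.
    """
    assigned = {}  # start with no box labeled as an object
    while len(assigned) < len(boxes):
        candidates = [
            (gain(box, label, assigned), box, label)
            for box in boxes
            if box not in assigned
            for label in labels
        ]
        if not candidates:
            break
        best_gain, box, label = max(candidates, key=lambda c: c[0])
        if best_gain < 0:  # stop once no labeling improves the score
            break
        assigned[box] = label
    return assigned


# Hypothetical detection scores, made up for illustration only.
detection_score = {
    ("box1", "racket"): 2.0, ("box1", "ball"): -0.5,
    ("box2", "racket"): 0.5, ("box2", "ball"): 1.0,
    ("box3", "racket"): -1.0, ("box3", "ball"): -0.2,
}


def toy_gain(box, label, assigned):
    # Detection evidence minus a penalty for reusing a label already assigned.
    penalty = 1.0 if label in assigned.values() else 0.0
    return detection_score[(box, label)] - penalty


result = greedy_forward_search(["box1", "box2", "box3"], ["racket", "ball"], toy_gain)
print(result)  # {'box1': 'racket', 'box2': 'ball'}
```

In this toy run the third box stays unlabeled because every remaining choice would lower the score, which is the stopping rule in action.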


Update model inference: A and H labels

  • Enumerate the possible A and H labels

  • Keep the labels that optimize the model score
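
Because the sets of activity classes and atomic poses are small, this step can be an exhaustive search. A minimal sketch, assuming a hypothetical `score` callback that evaluates the model with the objects and body parts held fixed; the label names and score values are made up for illustration.

```python
from itertools import product


def best_activity_and_pose(activities, atomic_poses, score):
    """Try every (activity, atomic pose) pair and keep the highest-scoring one."""
    return max(product(activities, atomic_poses), key=lambda pair: score(*pair))


# Hypothetical scores with the objects and body parts held fixed.
toy_scores = {
    ("tennis_forehand", "pose_0"): 0.2,
    ("tennis_forehand", "pose_1"): 1.4,
    ("croquet_shot", "pose_0"): 0.9,
    ("croquet_shot", "pose_1"): -0.3,
}

best = best_activity_and_pose(
    ["tennis_forehand", "croquet_shot"],
    ["pose_0", "pose_1"],
    lambda a, h: toy_scores[(a, h)],
)
print(best)  # ('tennis_forehand', 'pose_1')
```

The cost is the number of activities times the number of atomic poses, which stays cheap for datasets with a handful of classes.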


Experimental Results (Sports Dataset)

  • Activity classification


Experimental Results (PPMI Dataset)


Conclusion

  • Mutual context can significantly improve performance on difficult visual recognition problems

  • The joint model shares all objects and atomic poses across activities

  • Requires annotating all human body parts and objects in the training images


References

  • B. Yao and L. Fei-Fei, "Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.

  • B. Yao and L. Fei-Fei, "Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.

  • B. Sapp, A. Toshev, and B. Taskar, "Cascaded Models for Articulated Pose Estimation," Proc. European Conf. Computer Vision, 2010.

  • S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.

  • http://en.wikipedia.org/wiki/Hierarchical_clustering