

Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses

Date: 2013/05/27

Instructor: Prof. Wang, Sheng-Jyh

Student: Hung, Fei-Fan

Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012)



Outline

  • Introduction

    • Intuition and goal

  • Model Representation

  • Model Learning

    • Obtaining Atomic Poses

    • Training Detectors and Classifiers

    • Estimating Model Parameters

  • Model Inference

  • Experimental Results

  • Conclusion



Why use context in computer vision?

  • Simple images vs. human activities

[Figure: results without context and with mutual context, annotated "~3-4%"]


Challenges in Human Pose Estimation

  • Human pose estimation is challenging

    • Difficult part appearance

    • Self-occlusion

    • Image regions that look like a body part

  • Object detection can facilitate human pose estimation


Challenges in Object Detection

  • Object detection is challenging

    • Objects are small, low-resolution, or partially occluded

    • Image regions similar to the detection target

  • Human pose estimation can facilitate object detection


The Goal

  • To build a mutual context model for Human-Object Interaction (HOI) activities



Model representation

  • Modeling the mutual context of objects and human poses

  • A: activity, e.g. tennis forehand, croquet shot, volleyball smash

  • O: objects, e.g. tennis racket, croquet mallet, volleyball, tennis ball; M: number of bounding boxes

  • H: atomic pose; an activity A can have more than one atomic pose H

  • P: body parts


Model representation

  • 𝝓1: co-occurrence compatibility between A, O, H

  • 𝝓2: spatial relationship between O and H

  • 𝝓3, 𝝓4, 𝝓5: modeling the image evidence with detectors or classifiers

[Figure: graphical model linking the activity A, the human pose H with body parts P1, P2, ..., PL, and the objects O1, O2]
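
The way the five potentials combine can be sketched in a few lines. This is a minimal illustration, assuming (as is usual for this kind of model) that the overall score of a candidate labeling is simply the sum of 𝝓1 to 𝝓5. The names `candidates` and `total_score` and all numbers are made up for the example.

```python
# Toy potential values for two candidate labelings of the same image.
# All numbers are invented for illustration only.
candidates = {
    "tennis forehand": {"phi1": 1.2, "phi2": 0.8, "phi3": 0.5, "phi4": 0.3, "phi5": 0.9},
    "croquet shot": {"phi1": 0.4, "phi2": 0.1, "phi3": 0.6, "phi4": 0.2, "phi5": 0.7},
}


def total_score(potentials):
    """Assumed overall score: the plain sum of the five potential values."""
    return sum(potentials.values())


# Pick the candidate labeling with the highest summed score.
best = max(candidates, key=lambda name: total_score(candidates[name]))
```

Under this assumption, inference amounts to searching for the labeling (A, O, H) that maximizes the summed score.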


𝝓1: Co-occurrence context

  • Co-occurrence between all A, O, H

  • Weights: strength of the co-occurrence interaction between an atomic pose, an object, and an activity class

  • Indicator functions select the atomic pose, object, and activity labels being scored

  • The sum runs over the total number of atomic poses, the total number of objects, and the total number of activity classes
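
Because the indicator functions are nonzero only for the labels actually assigned, the co-occurrence potential reduces to a table lookup. The sketch below illustrates that idea under stated assumptions: the table `weights`, the pose name `pose_3`, and the function `cooccurrence_potential` are hypothetical, and the weight values are invented.

```python
# Hypothetical co-occurrence weights, indexed by (atomic pose, object, activity).
# Triples that are not listed are treated as weight 0.
weights = {
    ("pose_3", "tennis racket", "tennis forehand"): 1.5,
    ("pose_3", "tennis ball", "tennis forehand"): 0.7,
    ("pose_3", "croquet mallet", "tennis forehand"): -2.0,
}


def cooccurrence_potential(activity, pose, objects, weights):
    """Sum the weight of (pose, object, activity) over all object boxes."""
    return sum(weights.get((pose, obj, activity), 0.0) for obj in objects)


score = cooccurrence_potential(
    "tennis forehand", "pose_3", ["tennis racket", "tennis ball"], weights
)
```

A negative weight, as for the croquet mallet here, would penalize a labeling that combines an unlikely object with the activity.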


𝝓2: Spatial context

  • Spatial relationship between all O and different H

  • Feature: a sparse binary vector that shows the relative location of an object w.r.t. the human pose

  • Each binary vector has its own weight
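
One way to picture the sparse binary vector is as a one-hot encoding of where the object lies relative to a reference point on the body. The sketch below is an assumption-laden illustration: it uses 8 direction bins and a single body-part position as the reference, neither of which is stated on the slide, and the names `relative_location_vector` and `spatial_potential` are hypothetical.

```python
import math

N_BINS = 8  # assumed number of direction bins; not taken from the slides


def relative_location_vector(part_xy, object_xy, n_bins=N_BINS):
    """Return a one-hot list marking the direction bin of the object seen from the part."""
    dx = object_xy[0] - part_xy[0]
    dy = object_xy[1] - part_xy[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
    index = int(angle / (2 * math.pi) * n_bins) % n_bins
    vec = [0] * n_bins
    vec[index] = 1
    return vec


def spatial_potential(weights, binary_vec):
    """Dot product of the weight vector with the sparse binary vector."""
    return sum(w * b for w, b in zip(weights, binary_vec))


b = relative_location_vector(part_xy=(0.0, 0.0), object_xy=(-1.0, 0.5))
weights = [0.1, 0.2, 0.3, 0.9, 0.0, 0.0, 0.0, 0.0]  # made-up weights
score = spatial_potential(weights, b)
```

Since only one entry of the vector is 1, the dot product simply picks out the weight of the bin the object falls into.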


𝝓3: Modeling objects

  • Model O in the image I using object detection scores

  • For each object O

    • Feature: vector of detection scores for the object

    • Weight of the detection score vector

  • Between Om and Om'

    • Feature: binary feature vector

    • Weight for the pair of objects


𝝓4: Modeling human pose

  • Model the atomic pose that H belongs to and its likelihood

  • Gaussian likelihood function

  • Vector of scores for detecting each body part in the image I
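
The Gaussian likelihood term can be illustrated for a single body part. This is a minimal sketch, assuming an isotropic 2-D Gaussian centered on the part's mean location in the atomic pose. The function name `gaussian_likelihood`, the coordinates, and the value of `sigma` are assumptions made for the example.

```python
import math


def gaussian_likelihood(xy, mean_xy, sigma):
    """Isotropic 2-D Gaussian density of a part location around the atomic-pose mean."""
    dx = xy[0] - mean_xy[0]
    dy = xy[1] - mean_xy[1]
    norm = 2 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / norm


# A part found exactly at the mean is more likely than one found two units away.
at_mean = gaussian_likelihood((0.0, 0.0), (0.0, 0.0), sigma=1.0)
far_away = gaussian_likelihood((2.0, 0.0), (0.0, 0.0), sigma=1.0)
```

A body-part configuration that lies close to an atomic pose therefore scores higher under that pose.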


𝝓5: Modeling activity

  • Model the HOI activity by training an activity classifier

  • Feature: output of a one-versus-all (OVA) discriminative classifier that takes the image as features; its dimension equals the number of activity classes

  • Each feature has its own weight
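
The activity term can be read as a weighted sum of the classifier outputs. The sketch below assumes one weight vector per activity class, applied to the OVA score vector of the image. The names `activity_potential`, `ova_scores`, and `weights`, and all numbers, are hypothetical.

```python
def activity_potential(activity_index, ova_scores, weights):
    """Dot product of the chosen activity's weight vector with the OVA score vector."""
    return sum(w * s for w, s in zip(weights[activity_index], ova_scores))


# Made-up OVA classifier outputs for three activity classes.
ova_scores = [0.9, -0.4, -0.7]

# Made-up weights: one row per activity class (identity here for readability).
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

potentials = [activity_potential(k, ova_scores, weights) for k in range(3)]
```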


Model Properties

  • Spatial context between O and H

    • Object detection and human pose estimation facilitate each other

    • Objects and body parts that are unreliable can be ignored

  • Flexible to extend to large-scale datasets and other activities

    • The joint model can share all objects and atomic poses



Model Learning

  • Assign each human pose to an atomic pose

  • Train detectors and classifiers

  • Estimate parameters by maximum likelihood


Obtaining Atomic Poses

  • Using clustering to obtain atomic poses

  • Normalize the annotations

  • Find missing parts

    • Using the nearest visible neighbor

  • Obtain a set of atomic poses

    • Hierarchical clustering with the maximum linkage measure
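
The clustering step can be sketched as agglomerative clustering with maximum (complete) linkage, where the distance between two clusters is the distance of their farthest pair of members. The sketch below is a plain-Python illustration, not the authors' implementation: it assumes Euclidean distance between flattened, already normalized pose annotations, and the names `pose_distance` and `complete_linkage_clusters` and the toy data are hypothetical.

```python
def pose_distance(p, q):
    """Euclidean distance between two poses given as flat coordinate lists (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def complete_linkage_clusters(poses, n_clusters):
    """Merge clusters bottom-up until n_clusters remain, using maximum linkage."""
    clusters = [[i] for i in range(len(poses))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Maximum linkage: cluster distance is that of the farthest pair.
                d = max(
                    pose_distance(poses[i], poses[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy "poses": two tight groups of points standing in for pose annotations.
poses = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]]
clusters = complete_linkage_clusters(poses, 2)
```

Each resulting cluster would play the role of one atomic pose, and every training pose is assigned to the cluster it ends up in.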


Training Detectors and Classifiers

  • Object detector in 𝝓3 and human body part detector in 𝝓4

    • Deformable part model

  • Overall activity classifier in 𝝓5

    • Spatial pyramid matching (SPM)

    • SIFT + 3-level image pyramid


Estimating Model Parameters

  • Estimate the model parameters using a maximum likelihood (ML) approach with a zero-mean Gaussian prior
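
Maximum likelihood with a zero-mean Gaussian prior on the weights is equivalent to maximizing the log-likelihood minus a quadratic penalty on the weights. The sketch below shows that objective for a small log-linear model. The model form, the function name `penalized_log_likelihood`, and the toy data are assumptions for illustration; the slides do not give the exact objective.

```python
import math


def penalized_log_likelihood(weights, examples, sigma):
    """Conditional log-likelihood of a log-linear model minus the Gaussian-prior penalty.

    examples: list of (candidate_feature_vectors, index_of_true_candidate).
    sigma: standard deviation of the zero-mean Gaussian prior on each weight.
    """
    total = 0.0
    for candidates, true_index in examples:
        scores = [sum(w * f for w, f in zip(weights, feats)) for feats in candidates]
        log_z = math.log(sum(math.exp(s) for s in scores))  # log partition function
        total += scores[true_index] - log_z
    penalty = sum(w * w for w in weights) / (2 * sigma ** 2)
    return total - penalty


# One training example with two candidate labelings; the first one is correct.
examples = [([[1.0, 0.0], [0.0, 1.0]], 0)]
at_zero = penalized_log_likelihood([0.0, 0.0], examples, sigma=10.0)
better = penalized_log_likelihood([1.0, 0.0], examples, sigma=10.0)
```

A smaller `sigma` corresponds to a stronger prior and pulls the weights harder toward zero.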



Learning result



Model Inference

  • Given a new image, initialize with the learned results

  • Then update in turn:

    • Human body parts

    • Object detection results

    • A and H labels


Initialization

  • Given a new image, initialize with the learned results

    • A (activity classification): SPM classification

    • O (object detection): object detection

    • H (human pose estimation): pictorial structure model


Update model inference

  • Update human body parts

    • Compute the marginal distribution of the human pose

    • Use a mixture of Gaussians to refine the prior of each body part
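
The mixture-of-Gaussians prior can be sketched as a weighted sum of per-pose Gaussians, with the pose marginals as mixture weights. This is an illustration under assumptions: one isotropic 2-D Gaussian per atomic pose with a shared `sigma`, and hypothetical names `gaussian_2d` and `part_location_prior`.

```python
import math


def gaussian_2d(xy, mean_xy, sigma):
    """Isotropic 2-D Gaussian density."""
    dx = xy[0] - mean_xy[0]
    dy = xy[1] - mean_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def part_location_prior(xy, pose_marginals, pose_means, sigma):
    """Mixture prior: each atomic pose contributes a Gaussian weighted by its marginal."""
    return sum(
        p * gaussian_2d(xy, mean, sigma)
        for p, mean in zip(pose_marginals, pose_means)
    )


# Two atomic poses, equally likely, that place the part at different locations.
pose_marginals = [0.5, 0.5]
pose_means = [(0.0, 0.0), (10.0, 0.0)]
prior_at_first_mean = part_location_prior((0.0, 0.0), pose_marginals, pose_means, sigma=1.0)
```

As the marginal of one atomic pose grows, the prior concentrates around the part location that pose predicts.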


Update model inference

  • Update object detection results with a greedy forward search method:

    • Initially, no bounding box contains an object

    • Select the box and object label that increase the score the most

    • Label the box as that object

    • Update the score

    • Stop when the increase is < 0

  • The search uses the terms over (O, H), (O, A, H), and (O, I)
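
A greedy forward search of this kind can be sketched as follows. This is a generic illustration under assumptions, not the authors' code: `score_fn` stands in for the object-related part of the model score, and the toy `unary` table, the function names, and all numbers are made up.

```python
def greedy_forward_search(n_boxes, labels, score_fn):
    """Label boxes one at a time, always taking the move with the largest score gain."""
    assignment = {}  # box index -> object label; boxes not listed contain no object
    current = score_fn(assignment)
    while len(assignment) < n_boxes:
        best_gain, best_move = None, None
        for box in range(n_boxes):
            if box in assignment:
                continue
            for label in labels:
                trial = dict(assignment)
                trial[box] = label
                gain = score_fn(trial) - current
                if best_gain is None or gain > best_gain:
                    best_gain, best_move = gain, (box, label)
        # Stop when there is no move left or even the best move lowers the score.
        if best_move is None or best_gain < 0:
            break
        box, label = best_move
        assignment[box] = label
        current = score_fn(assignment)
    return assignment


# Toy score: independent per-box scores (made-up numbers).
unary = {
    (0, "tennis racket"): 2.0, (0, "tennis ball"): -1.0,
    (1, "tennis racket"): -0.5, (1, "tennis ball"): 1.0,
    (2, "tennis racket"): -0.3, (2, "tennis ball"): -0.2,
}


def toy_score(assignment):
    return sum(unary[(box, label)] for box, label in assignment.items())


result = greedy_forward_search(3, ["tennis racket", "tennis ball"], toy_score)
```

In the toy run, box 2 stays unlabeled because every label for it would lower the score.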


Update model inference

  • Update A and H labels

    • Enumerate the possible A and H labels

    • Keep the labels that optimize the model score
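
Enumerating the A and H labels is feasible because both label sets are small. The sketch below tries every (activity, atomic pose) pair and keeps the best one. The function `best_activity_and_pose`, the score table, and the label names are hypothetical stand-ins for the model score with the other variables held fixed.

```python
from itertools import product


def best_activity_and_pose(activities, poses, score_fn):
    """Try every (activity, atomic pose) pair and return the highest-scoring one."""
    return max(product(activities, poses), key=lambda pair: score_fn(*pair))


# Made-up scores for each (activity, atomic pose) pair.
toy_scores = {
    ("tennis forehand", "pose_1"): 0.2,
    ("tennis forehand", "pose_2"): 1.4,
    ("croquet shot", "pose_1"): 0.9,
    ("croquet shot", "pose_2"): -0.3,
}

best_pair = best_activity_and_pose(
    ["tennis forehand", "croquet shot"],
    ["pose_1", "pose_2"],
    lambda activity, pose: toy_scores[(activity, pose)],
)
```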



Experimental Results (Sports Dataset)


  • Activity classification


Experimental Results (PPMI Dataset)


Conclusion

  • Mutual context can significantly improve performance on difficult visual recognition problems

  • The joint model can share all the information

  • Limitation: all the human body parts and objects in the training images have to be annotated


Reference

  • B. Yao and L. Fei-Fei, “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2012.

  • B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.

  • B. Sapp, A. Toshev, and B. Taskar, “Cascaded Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010.

  • S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.

  • http://en.wikipedia.org/wiki/Hierarchical_clustering

