
Representing Videos using Mid-level Discriminative Patches

CVPR 2013 Poster


Outline

  • Introduction

  • Mining Discriminative Patches

  • Analyzing Videos

  • Experimental Evaluation & Conclusion


1. Introduction

  • Q1: What does it mean to understand this video?

  • Q2: How might we achieve such an understanding?


1. Introduction

  • Approach 1: Represent the whole video as a single feature vector, leaving the semantic actions, objects, and other bits and pieces of the video implicit.

  • Approach 2 (general framework): Detect semantic entities such as objects and primitive actions, then reason over them with Bayesian networks or storyline models.

1. Introduction

Drawback: computational models for identifying semantic entities are not robust enough to serve as a basis for video analysis.


1. Introduction

  • Represent the video using discriminative spatio-temporal patches, not a global feature vector or a set of semantic entities.

  • A discriminative spatio-temporal patch may correspond to a semantic object, a primitive human action, a human-object pair, or simply a random but informative patch.

  • These patches are mined automatically from training data consisting of hundreds of videos.


1. Introduction

  • The spatio-temporal patches act as a discriminative vocabulary for action classification.

  • They establish strong correspondences between patches in training and test videos.

  • Using label-transfer techniques, these correspondences align the videos and support further tasks (e.g., object localization, finer-level action detection); see the sketch below.
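As a rough illustration of how label transfer through patch correspondences could work (an assumption-level sketch, not the paper's exact procedure): once an e-SVM detection in a test video is matched to the training exemplar the e-SVM was built from, an annotation stored relative to the exemplar patch can be mapped into the test video's coordinates. All names and data structures below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    # Spatio-temporal extent of an exemplar or detected patch in a video.
    x: float
    y: float
    w: float
    h: float
    t0: int
    t1: int

@dataclass
class Annotation:
    # Object box stored relative to the exemplar patch, normalized to [0, 1].
    rx: float
    ry: float
    rw: float
    rh: float
    label: str

def transfer_annotation(ann: Annotation, test_patch: Patch):
    """Map an annotation from the exemplar patch onto a matched test patch.

    Assumes the detected test patch corresponds to the exemplar, so the
    annotation keeps its relative position and scale inside the patch.
    """
    return (ann.label,
            test_patch.x + ann.rx * test_patch.w,
            test_patch.y + ann.ry * test_patch.h,
            ann.rw * test_patch.w,
            ann.rh * test_patch.h,
            test_patch.t0, test_patch.t1)

# Example: transfer a "bar" box (top quarter of the exemplar patch) onto a detection
detection = Patch(x=200, y=80, w=100, h=200, t0=5, t1=35)
bar = Annotation(rx=0.0, ry=0.0, rw=1.0, rh=0.25, label="bar")
print(transfer_annotation(bar, detection))
```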



2. Mining Discriminative Patches

  • Two conditions for discriminative patches:
    (1) They occur frequently within a class.
    (2) They are distinct from patches in other classes.

  • Challenges:
    (1) The space of potential spatio-temporal patches is extremely large, given that these patches can occur over a range of scales.
    (2) The overwhelming majority of video patches are uninteresting.


2. Mining Discriminative Patches

  • Standard paradigm: bag-of-words (a baseline sketch follows)
    Step 1: Sample a few thousand patches and perform k-means clustering to find representative clusters.
    Step 2: Rank these clusters based on their membership in different action classes.

  • Major drawbacks:
    (1) High-dimensional distance metric
    (2) Partitioning
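For concreteness, a minimal sketch of this baseline, assuming patch descriptors (e.g., HOG3D) are already extracted into a matrix X with per-patch action labels y; scikit-learn's KMeans does the clustering and simple class purity stands in for the ranking step.

```python
# Bag-of-words baseline sketch (assumes descriptors are precomputed).
import numpy as np
from sklearn.cluster import KMeans

def rank_clusters(X, y, n_clusters=100, seed=0):
    """Cluster patch descriptors and rank clusters by class purity.

    X : (n_patches, d) array of patch descriptors (e.g., HOG3D)
    y : (n_patches,) array of action-class labels
    Returns a list of (cluster_id, dominant_class, purity), best first.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    ranking = []
    for c in range(n_clusters):
        members = y[km.labels_ == c]
        if len(members) == 0:
            continue
        classes, counts = np.unique(members, return_counts=True)
        purity = counts.max() / len(members)   # fraction from the dominant class
        ranking.append((c, classes[counts.argmax()], purity))
    return sorted(ranking, key=lambda r: -r[2])

# Toy usage with random data standing in for HOG3D descriptors
X = np.random.rand(500, 128)
y = np.random.randint(0, 10, size=500)
print(rank_clusters(X, y, n_clusters=20)[:5])
```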


2. Mining Discriminative Patches

(1) High-dimensional distance metric

K-means relies on a standard distance metric (e.g., Euclidean distance or normalized cross-correlation), which does not work well in high-dimensional spaces.

※ We use HOG3D features.


2. Mining Discriminative Patches

(2) Partitioning

Standard clustering algorithms partition the entire feature space: every data point is assigned to one of the clusters during the clustering procedure. However, in many cases assigning cluster memberships to rare background patches is hard, and because of this forced clustering they significantly diminish the purity of the good clusters to which they are assigned.


2. Mining Discriminative Patches

  • To resolve these issues:
    1. Use an exemplar-based clustering approach, learning an Exemplar-SVM (e-SVM) per cluster.
    2. Every patch is considered as a possible cluster center.

  • Drawback: treating every patch as a cluster center is computationally infeasible.

  • Resolution: use motion to prune candidates, and a nearest-neighbor search to score the survivors (see the sketch below).
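A minimal sketch of this motion-pruning plus nearest-neighbor idea, under two illustrative assumptions: motion energy is approximated by the mean magnitude of frame differences inside a patch, and each surviving candidate is scored by the fraction of its k nearest neighbors (in descriptor space) that come from the same action class. The threshold and scoring choices are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def motion_energy(patch_frames):
    """Mean absolute frame difference inside a spatio-temporal patch.

    patch_frames : (t, h, w) array of grayscale intensities.
    """
    return np.abs(np.diff(patch_frames.astype(np.float32), axis=0)).mean()

def knn_class_scores(X, y, k=20):
    """Score each candidate by the fraction of its k nearest neighbors
    (excluding itself) that share its action class."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return np.array([(y[idx[i, 1:]] == y[i]).mean() for i in range(len(X))])

# Toy usage: keep only high-motion patches, then rank by kNN class purity
frames = np.random.rand(50, 16, 16, 16)   # 50 candidate patches (t, h, w)
X = np.random.rand(50, 128)               # their descriptors (e.g., HOG3D)
y = np.random.randint(0, 5, size=50)      # action labels of the source videos
keep = np.array([motion_energy(f) > 0.2 for f in frames])
scores = knn_class_scores(X[keep], y[keep], k=5)
print(np.argsort(-scores)[:10])           # best candidate indices
```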


2. Mining Discriminative Patches

Training videos are split into two partitions:

  • Training partition: used to form the clusters.

  • Validation partition: used to rank the clusters based on representativeness.

Procedure:

(ⅰ) Use a simple nearest-neighbor approach (typically k = 20).

(ⅱ) Score each patch and rank the candidates.

(ⅲ) Select a few patches per action class and learn an e-SVM for each (see the sketch below).

(ⅳ) The e-SVMs are used to form clusters.

(ⅴ) Re-rank the clusters.
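Step (ⅲ) trains an Exemplar-SVM for each selected patch, i.e., a linear SVM with a single positive (the exemplar) against many negatives from other classes. Below is a minimal sketch assuming precomputed descriptors; scikit-learn's LinearSVC with asymmetric class weights stands in for the usual e-SVM cost terms, and all parameter values are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svm(exemplar, negatives, c_pos=0.5, c_neg=0.01):
    """Train a linear SVM with one positive (the exemplar) and many negatives.

    exemplar  : (d,) descriptor of the selected patch
    negatives : (n, d) descriptors of patches from other classes
    The asymmetric class weights stand in for the usual e-SVM cost terms.
    """
    X = np.vstack([exemplar[None, :], negatives])
    y = np.array([1] + [0] * len(negatives))
    clf = LinearSVC(C=1.0, class_weight={1: c_pos, 0: c_neg}, max_iter=10000)
    clf.fit(X, y)
    return clf

def esvm_score(clf, X):
    """Signed distance to the hyperplane, used as the detection score."""
    return clf.decision_function(X)

# Toy usage
d = 128
exemplar = np.random.rand(d)
negatives = np.random.rand(200, d)
clf = train_exemplar_svm(exemplar, negatives)
print(esvm_score(clf, np.random.rand(3, d)))
```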


2. Mining Discriminative Patches

  • Goal: a smaller dictionary (a set of representative patches).

  • Criteria:
    (a) Appearance consistency, measured by a consistency score.
    (b) Purity, measured by a tf-idf style score comparing detections on the same class versus different classes.

※ All patches are ranked using a linear combination of the two scores (a scoring sketch follows).
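A minimal sketch of such a ranking, assuming that each candidate e-SVM has been run on validation videos and that we know its top detection scores and the class of the video each detection came from. The concrete consistency and tf-idf definitions below are illustrative stand-ins, not the paper's formulas.

```python
import numpy as np

def consistency_score(det_scores):
    """Appearance-consistency proxy: mean e-SVM score of the top detections."""
    return float(np.mean(det_scores))

def purity_score(det_classes, own_class, n_classes):
    """tf-idf style purity: frequency within the own class, discounted by
    how many different classes the patch fires on."""
    det_classes = np.asarray(det_classes)
    tf = (det_classes == own_class).mean()
    idf = np.log(n_classes / max(1, len(np.unique(det_classes))))
    return tf * idf

def rank_patches(candidates, own_classes, n_classes, alpha=0.5):
    """Rank candidates by a linear combination of the two scores.

    candidates  : list of (det_scores, det_classes) per candidate e-SVM
    own_classes : action class each candidate was mined from
    """
    scores = [alpha * consistency_score(s) +
              (1 - alpha) * purity_score(c, oc, n_classes)
              for (s, c), oc in zip(candidates, own_classes)]
    return np.argsort(scores)[::-1]

# Toy usage: two candidates over a 5-class problem
cands = [(np.array([1.2, 0.9, 0.8]), [0, 0, 1]),
         (np.array([0.5, 0.4, 0.3]), [2, 3, 4])]
print(rank_patches(cands, own_classes=[0, 2], n_classes=5))
```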



3. Analyzing Videos

  • Action classification (sketched below):
    Input: a test video. Running the top-n e-SVM detectors over the video yields a feature vector, which an SVM classifier maps to the output class.

  • Beyond classification: explanation via discriminative patches.
    Q. How can we use detections of discriminative patches to establish correspondences between training and test videos?
    Q. Which detections should we select for establishing correspondence?
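A minimal sketch of this classification pipeline, assuming each video is summarized by the maximum response of each of the N e-SVM detectors over the whole video (a simple pooling choice made here for illustration) and a linear multi-class SVM is trained on the resulting vectors.

```python
import numpy as np
from sklearn.svm import LinearSVC

def video_feature(detector_scores):
    """Max-pool each e-SVM's detection scores over the whole video.

    detector_scores : list of N arrays, one per e-SVM detector, holding
    that detector's scores on all patches sampled from the video.
    """
    return np.array([s.max() if len(s) else 0.0 for s in detector_scores])

def train_action_classifier(features, labels):
    """Multi-class linear SVM over the pooled e-SVM response vectors."""
    clf = LinearSVC(C=1.0, max_iter=10000)
    clf.fit(np.vstack(features), np.asarray(labels))
    return clf

# Toy usage: 30 training videos, a vocabulary of 40 e-SVMs, 5 action classes
rng = np.random.default_rng(0)
feats = [video_feature([rng.normal(size=100) for _ in range(40)]) for _ in range(30)]
labels = rng.integers(0, 5, size=30)
clf = train_action_classifier(feats, labels)
test_feat = video_feature([rng.normal(size=100) for _ in range(40)])
print(clf.predict(test_feat[None, :]))
```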


3. Analyzing Videos

  • Context-dependent patch selection

Vocabulary size: N

Candidate detections: {D1, D2, …, DN}

x_i: whether or not the detection of e-SVM i is selected

Appearance term (A_i): the e-SVM score for patch i

Class-consistency term (Cl_i): promotes selection of certain e-SVMs over others given the action class. For example, for the weightlifting class it prefers selection of the patches containing a man and a bar with vertical motion. Cl is learned from the training data by counting the number of times each e-SVM fires for each class (see the sketch below).
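A minimal sketch of learning Cl by counting, assuming that for every training video we know its action class and which e-SVMs fire on it; the row normalization is an assumption added for illustration.

```python
import numpy as np

def learn_class_consistency(firings, labels, n_esvms, n_classes, eps=1e-6):
    """Count how often each e-SVM fires for each action class.

    firings : list per training video of the e-SVM indices that fired on it
    labels  : action class of each training video
    Returns Cl with shape (n_classes, n_esvms); Cl[c, i] is the normalized
    frequency with which e-SVM i fires on videos of class c.
    """
    counts = np.zeros((n_classes, n_esvms))
    for fired, c in zip(firings, labels):
        for i in fired:
            counts[c, i] += 1
    return counts / (counts.sum(axis=1, keepdims=True) + eps)

# Toy usage: 4 videos, 6 e-SVMs, 2 classes
Cl = learn_class_consistency(
    firings=[[0, 2], [0, 1, 2], [3, 5], [4, 5]],
    labels=[0, 0, 1, 1], n_esvms=6, n_classes=2)
print(Cl.round(2))
```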


3. Analyzing Videos

  • Optimization

Penalty term (P_ij): the penalty for selecting a pair of detections together. The penalty is high when
(1) e-SVMs i and j do not fire frequently together in the training data, or
(2) e-SVMs i and j are trained from different action classes.

The resulting integer program is NP-hard, so it is solved approximately with the IPFP algorithm, which typically converges within 5~10 iterations (a sketch of the selection objective follows).
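Putting the slide's terms together, the selection problem is plausibly a quadratic integer program of roughly the form below: maximize the unary reward sum_i x_i (A_i + Cl_i) minus the pairwise penalty sum_ij x_i x_j P_ij over binary x. The exact formulation and the IPFP solver are in the paper; this sketch substitutes a simple coordinate-ascent heuristic and should be read as illustrative only.

```python
import numpy as np

def selection_objective(x, A, Cl, P):
    """Score a binary selection x: unary reward minus pairwise penalty.

    A, Cl : (N,) appearance and class-consistency terms
    P     : (N, N) symmetric pairwise penalty matrix (zero diagonal)
    """
    return float(x @ (A + Cl) - x @ P @ x)

def greedy_select(A, Cl, P, n_iter=10):
    """Simple coordinate-ascent stand-in for IPFP: flip one x_i at a time
    whenever doing so improves the objective."""
    N = len(A)
    x = np.zeros(N)
    for _ in range(n_iter):
        improved = False
        for i in range(N):
            x_try = x.copy()
            x_try[i] = 1 - x_try[i]
            if selection_objective(x_try, A, Cl, P) > selection_objective(x, A, Cl, P):
                x, improved = x_try, True
        if not improved:
            break
    return x

# Toy usage
rng = np.random.default_rng(1)
N = 8
A, Cl = rng.random(N), rng.random(N)
P = rng.random((N, N)) * 0.2
P = (P + P.T) / 2
np.fill_diagonal(P, 0)
print(greedy_select(A, Cl, P))
```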


4. Experimental Evaluation

  • Datasets: UCF-50, Olympic Sports

  • Implementation details:
    ※ The current implementation considers only cuboid patches.
    ※ Patches are represented with HOG3D features (4x4x5 cells with 20 discrete orientations, i.e., a 1600-dimensional descriptor).

  • Classification results



4. Experimental Evaluation

  • Correspondence and Label Transfer



4. Experimental Evaluation

  • Conclusion

1. A new representation for videos based on discriminative spatio-temporal patches.

2. These patches are mined automatically using an exemplar-based clustering approach.

3. They establish strong correspondences that align the videos for transferring annotations.

4. Used as a vocabulary, they achieve state-of-the-art results for action classification.

