Using Cross-Media Correlation for Scene Detection in Travel Videos

Wei-Ta Chu, Che-Cheng Lin, Jen-Yu Yu


Outline

  • Introduction

  • Approach

  • Experiments

  • Conclusion


Introduction

Why Use Cross-Media Correlation for Scene Detection in Travel Videos?

What is the correlation between photos and videos?

More and more people record their daily lives and travel experiences with both digital cameras and camcorders.

(Cameras and camcorders are now much cheaper.)



Why Use Cross-Media Correlation for Scene Detection in Travel Videos?

What is the correlation between photos and videos?

People often capture travel experiences with both still cameras and camcorders.

Massive amounts of home video are captured in uncontrolled environments and suffer from problems such as overexposure/underexposure and hand shaking.

The content stored in photos and videos of the same trip contains similar information, such as landmarks and human faces.


Why Use Cross-Media Correlation for Scene Detection in Travel Videos?

  • Direct scene detection in videos is hard.

  • There is a high correlation between photos and videos.

  • Photos provide high-quality data, so scene detection is easier on photos.


Approach

  • Why do people use photos and videos for different purposes, even when they capture the same things?

  • Photo

    To obtain high-quality data, capturing famous landmarks or human faces

  • Video

    To capture the evolution of an event

We utilize this correlation so that work that is hard to conduct on videos, but easy to do on photos, can still be accomplished.


Framework

  • To perform scene detection in photos:

    First, we cluster photos by checking their time information.

  • To perform scene detection in videos:

    First, we extract several keyframes for each video shot, and then find the optimal matching between the photo and keyframe sequences.



The Proposed Cross-Media Scene Detection Framework

Photos: time-based clustering → visual word representation

Videos: shot change detection → keyframe extraction → filtering (e.g., motion blur) → visual word representation

Both visual-word sequences are then aligned by DP-based matching, which yields the scene boundaries.
This process not only reduces the time of cross-media matching, but also eliminates the influence of bad-quality images.


Preprocessing

  • Scene Detection for Photos

    Use differences in shooting time to cluster photos.

    Denote the time difference between the i-th photo and the (i+1)-th photo as gᵢ:

    gᵢ = tᵢ₊₁ − tᵢ

K is an empirical threshold.

d is the size of the sliding window.

A scene change is claimed to occur between the n-th and (n+1)-th photos. We set K to 17 and d to 10 in this work.
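The boundary test itself is not in the transcript (it appears as an equation image on the slide), so the Python sketch below assumes a common form of time-based photo clustering: declare a scene change when a time gap is at least K times the average gap in a sliding window of size d around it. The function name and the exact inequality are illustrative assumptions, not the paper's formula.

```python
def time_based_clustering(timestamps, K=17, d=10):
    """Cluster photos into scenes from their shooting times (in seconds).

    The slide defines g_i = t_{i+1} - t_i, an empirical threshold K = 17,
    and a sliding window of size d = 10, but not the boundary inequality;
    the test below (gap >= K times the local average gap) is an assumption
    of this sketch.
    """
    if not timestamps:
        return []
    gaps = [timestamps[i + 1] - timestamps[i] for i in range(len(timestamps) - 1)]

    boundaries = []
    for n, g in enumerate(gaps):
        lo, hi = max(0, n - d), min(len(gaps), n + d + 1)
        local_avg = sum(gaps[lo:hi]) / (hi - lo)
        if g >= K * local_avg:       # assumed boundary criterion
            boundaries.append(n)     # scene change after photo n

    # Convert boundary positions into (first_photo, last_photo) index ranges.
    scenes, start = [], 0
    for b in boundaries:
        scenes.append((start, b))
        start = b + 1
    scenes.append((start, len(timestamps) - 1))
    return scenes
```

With this kind of test, photos taken within minutes of each other stay in one scene, while a long gap between scene spots starts a new one.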


Preprocessing

  • Use the global k-means algorithm to extract keyframes.

  • Detect and filter blurred keyframes. This not only reduces the time of cross-media matching, but also eliminates the influence of bad-quality images.
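A rough sketch of this step, assuming OpenCV and scikit-learn. The paper uses the global k-means algorithm; ordinary k-means on color histograms stands in for it here, and a variance-of-Laplacian test with a hypothetical threshold stands in for the paper's blur detector.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(frames, n_keyframes=3):
    """Pick representative frames for one shot.

    Stand-in for the paper's global k-means: ordinary k-means on 8x8x8
    color histograms, keeping the frame closest to each cluster center.
    """
    hists = []
    for f in frames:
        h = cv2.calcHist([f], [0, 1, 2], None, [8, 8, 8],
                         [0, 256, 0, 256, 0, 256]).flatten()
        hists.append(h / (h.sum() + 1e-9))
    hists = np.array(hists)

    k = min(n_keyframes, len(frames))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(hists)
    picks = []
    for c in range(k):
        idx = int(np.argmin(np.linalg.norm(hists - km.cluster_centers_[c], axis=1)))
        picks.append(frames[idx])
    return picks

def is_blurry(frame, threshold=100.0):
    """Flag a motion-blurred keyframe via the variance of the Laplacian;
    the threshold value is hypothetical, not taken from the paper."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold
```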


Visual Word Representation

  • Apply the difference-of-Gaussian (DoG) detector to detect feature points in keyframes and photos.

  • Use SIFT (Scale-Invariant Feature Transform) to describe each point as a 128-dimensional feature vector.

  • SIFT-based feature vectors are clustered by a k-means algorithm, and feature points in the same cluster are claimed to belong to the same visual word.
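A minimal sketch of this step, assuming OpenCV's SIFT (whose detector is DoG-based, as the slide describes) and scikit-learn's k-means. The vocabulary size and helper names are illustrative.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(images, n_words=500):
    """Cluster SIFT descriptors from all keyframes and photos into a
    visual-word vocabulary; n_words is an illustrative vocabulary size."""
    sift = cv2.SIFT_create()                    # DoG detector + SIFT descriptor
    descriptors = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, desc = sift.detectAndCompute(gray, None)
        if desc is not None:
            descriptors.append(desc)
    return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(
        np.vstack(descriptors))

def visual_word_histogram(image, vocab):
    """Represent one keyframe or photo as a normalized visual-word histogram."""
    sift = cv2.SIFT_create()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, desc = sift.detectAndCompute(gray, None)
    hist = np.zeros(vocab.n_clusters)
    if desc is not None:
        for word in vocab.predict(desc):
            hist[word] += 1
    return hist / (hist.sum() + 1e-9)
```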


Visual Word Representation

Keyframes and photos → SIFT feature points (feature vectors) → k-means → visual words


Visual Word Histogram Matching

Let Xᵢ denote the i-th prefix of X, i.e., Xᵢ = ⟨x₁, x₂, …, xᵢ⟩.

LCS(Xᵢ, Yⱼ) denotes the length of the longest common subsequence between Xᵢ and Yⱼ.
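The slide gives only the standard LCS prefix notation; below is a sketch of the corresponding dynamic-programming table over the two visual-word histogram sequences. Treating two histograms as a "common" element when their cosine similarity exceeds a threshold is an assumption of this sketch, since the transcript does not state the similarity measure.

```python
import numpy as np

def lcs_match(photo_hists, keyframe_hists, sim_threshold=0.5):
    """Fill the LCS dynamic-programming table between the photo sequence X
    and the keyframe sequence Y, where table[i, j] = LCS(X_i, Y_j).

    Two histograms count as a 'common' element when their cosine similarity
    exceeds sim_threshold (an assumption of this sketch).
    """
    def similar(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-9
        return float(np.dot(a, b)) / denom >= sim_threshold

    m, n = len(photo_hists), len(keyframe_hists)
    table = np.zeros((m + 1, n + 1), dtype=int)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if similar(photo_hists[i - 1], keyframe_hists[j - 1]):
                table[i, j] = table[i - 1, j - 1] + 1
            else:
                table[i, j] = max(table[i - 1, j], table[i, j - 1])
    return table  # backtracking this table yields the optimal matching
```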


Evaluation Data


Evaluation Metric

The first term indicates the fraction of the currently evaluated scene, and the second term indicates how much a given scene is split into smaller scenes.

The purity value ranges from 0 to 1; a larger purity value means that the result is closer to the ground truth.

τ(sᵢ, sⱼ*) is the length of the overlap between scene sᵢ and scene sⱼ*.

τ(sᵢ) is the length of scene sᵢ.

T is the total length of all scenes.
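The purity equation itself is not in the transcript, so the sketch below assumes the standard overlap-based form that matches the verbal description above: each detected scene contributes its fraction of the total length, weighted by a squared-overlap term that measures how much it is split across ground-truth scenes. Treat it as an approximation, not the paper's exact formula.

```python
def purity(detected_scenes, ground_truth_scenes, total_length):
    """Overlap-based purity (assumed standard form).

    Scenes are (start, end) intervals on a common timeline;
    total_length is T, the total length of all scenes.
    """
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    value = 0.0
    for s in detected_scenes:
        length = s[1] - s[0]
        if length <= 0:
            continue
        split_term = sum(overlap(s, g) ** 2 for g in ground_truth_scenes) / length ** 2
        value += (length / total_length) * split_term
    return value  # in [0, 1]; closer to 1 means closer to the ground truth
```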


Performance in terms of purity based on different numbers of visual words, with different similarity thresholds



Conclusion

For videos, keyframes are extracted with the global k-means algorithm. (Scene spots can be easily determined from the time information of photos.)

The keyframes and the photo set are each represented as a sequence of visual words.

Scene detection is thereby transformed into a sequence matching problem.


Conclusion

  • Using a dynamic programming approach, we find the optimal matching between the two sequences and determine video scene boundaries with the help of photo scene boundaries.

Experiments on different travel videos with different parameter settings show that exploiting the correlation between different modalities is effective.
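As a recap, here is a hedged end-to-end sketch that ties together the helper functions sketched under the earlier slides. Shot change detection and the final mapping from matched positions to video scene boundaries are omitted, since the transcript does not detail them; all names and signatures are illustrative.

```python
def detect_scenes(photos, photo_times, shots, vocab):
    """End-to-end sketch of the proposed framework (illustrative only).

    photos:      list of photo images (BGR arrays)
    photo_times: shooting times of the photos, in seconds
    shots:       list of shots, each a list of video frames
    vocab:       visual-word vocabulary from build_vocabulary()
    """
    # Photo branch: time-based clustering, then visual-word histograms.
    photo_scenes = time_based_clustering(photo_times)
    photo_hists = [visual_word_histogram(p, vocab) for p in photos]

    # Video branch: keyframes per shot, blur filtering, then histograms.
    keyframes = []
    for shot_frames in shots:
        keyframes += [kf for kf in extract_keyframes(shot_frames)
                      if not is_blurry(kf)]
    keyframe_hists = [visual_word_histogram(kf, vocab) for kf in keyframes]

    # DP-based matching aligns the two sequences; photo scene boundaries
    # would then be propagated to the matched keyframes and their shots.
    lcs_table = lcs_match(photo_hists, keyframe_hists)
    return photo_scenes, lcs_table
```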

