Content-based Video Indexing, Classification & Retrieval

Presented by HOI, Chu Hong

Nov. 27, 2002



Outline

  • Motivation

  • Introduction

  • Two approaches for semantic analysis

    • A probabilistic framework (Naphade, Huang ’01)

    • Object-based abstraction and modeling [Lee, Kim, Hwang ’01]

  • A multimodal framework for video content interpretation

  • Conclusion



Motivation

  • The amount of digital video data has grown rapidly in recent years.

  • Tools for classifying and retrieving video content are lacking.

  • There is a gap between low-level features and high-level semantic content.

  • Enabling machines to understand video is important and challenging.



Introduction

  • Content-based Video indexing

    • the process of attaching content based labels to video shots

    • essential for content-based classification and retrieval

    • Using automatic analysis techniques

      - shot detection, video segmentation

      - key frame selection

      - object segmentation and recognition

      - visual/audio feature extraction

      - speech recognition, video text, VOCR
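
Shot detection, the first technique listed above, is often approximated by comparing histograms of consecutive frames. The sketch below is only an illustration under simplifying assumptions: frames are flat lists of 0-255 intensities, and the function names, bin count, and threshold are hypothetical choices, not taken from the presentation.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    hist = [0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5):
    """Return each index i where frames[i] is assumed to start a new shot."""
    boundaries = []
    previous = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i])
        # A large jump between consecutive histograms is treated as a cut.
        if histogram_distance(previous, current) > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy video (assumed data): three dark frames followed by three bright frames.
dark = [20] * 16
bright = [230] * 16
video = [dark, dark, dark, bright, bright, bright]
print(detect_shot_boundaries(video))  # prints [3]
```

A fixed threshold handles hard cuts only; gradual transitions such as fades would need a different test.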



Introduction

  • Content-based Video Classification

    • Segment and classify videos into meaningful categories

    • Classify videos based on predefined topics

    • Useful for browsing and searching by topic

    • Multimodal method

      • Visual features

      • Audio features

      • Motion features

      • Textual features

    • Domain-specific knowledge



Introduction

  • Content-based Video Retrieval

    • Simple visual feature query

      • Retrieve video with key-frame: Color-R(80%),G(10%),B(10%)

    • Feature combination query

      • Retrieve video with high motion upward(70%), Blue(30%)

    • Query by example (QBE)

      • Retrieve video that is similar to a given example

    • Localized feature query

      • Retrieve video with a car moving toward the right

    • Object relationship query

      • Retrieve video with a girl watching the sunset

    • Concept query (query by keyword)

      • Retrieve explosion, White Christmas
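
A simple visual feature query or a query by example can be sketched as ranking key frames by color similarity. The example below assumes each key frame is summarized by its (R, G, B) proportions and uses histogram intersection as the similarity measure; the clip names and feature values are invented for illustration.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(query, database):
    """Return clip names ordered from most to least similar to the query."""
    scored = [(histogram_intersection(query, features), name)
              for name, features in database.items()]
    # Sort by descending score; ties are broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical key-frame color proportions (R, G, B) for three clips.
database = {
    "sunset_clip": (0.8, 0.1, 0.1),
    "forest_clip": (0.2, 0.7, 0.1),
    "ocean_clip": (0.1, 0.2, 0.7),
}

# Query corresponding to "Color-R(80%), G(10%), B(10%)".
query = (0.8, 0.1, 0.1)
ranking = rank_by_similarity(query, database)
print(ranking)
```

The same ranking function serves query by example if the query vector is extracted from the example clip instead of being typed in.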



Introduction

  • Feature Extraction

    • Color features

    • Texture features

    • Shape features

    • Sketch features

    • Audio features

    • Camera motion features

    • Object motion features



Semantic Indexing & Querying

  • Limitation of QBE

    • Measuring similarity using only low-level features

    • Does not reflect the user’s perception

    • Difficult annotation of high level features

  • Syntactic to Semantic

    • Bridge the gap between low-level features and semantic content

    • Semantic indexing, Query By Keyword (QBK)

  • Semantic description scheme – MPEG-7

    • Semantic interaction between concepts

    • No scheme for learning the models of individual concepts



Semantic Modeling & Indexing

  • Two approaches

    • Probabilistic framework, ‘Multiject’ (Naphade, Huang ’01)

    • Object-based abstraction and indexing [Lee, Kim, Hwang ’01]



A probabilistic approach (‘Multiject’ & ‘Multinet’) (Naphade, Huang ’01)

  • Multiject: a probabilistic multimedia object

  • Three categories of semantic concepts

    • Objects

      • Face, car, animal, building

    • Sites

      • Sky, mountain, outdoor, cityscape

    • Events

      • Explosion, waterfall, gunshot, dancing



Multiject for semantic concept

P(Outdoor = Present | features, other multijects) = 0.7

[Figure: the "Outdoor" multiject with inputs from visual features, audio features, text features, and other multijects]



How to create a Multiject

  • Shot-boundary detection

  • Spatio-temporal segmentation of within-shot frames

  • Feature extraction (color, texture, edge direction, etc.)

  • Modeling

    • Sites: mixture of Gaussians

    • Events: hidden Markov models (HMMs) with Gaussian mixtures as observation densities

    • All audio events: modeled using HMMs

    • Each segment is tested for each concept and the information is then composed at frame level
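
To make the "mixture of Gaussians" step for site concepts concrete, the sketch below scores a single region feature under a "present" and an "absent" mixture and turns the two likelihoods into a posterior. Everything here is an assumption for illustration: the feature is one-dimensional (say, the fraction of blue pixels in a region), the mixture parameters are invented rather than trained, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def concept_posterior(x, present_model, absent_model, prior_present=0.5):
    """P(concept = Present | x) by Bayes' rule over the two mixtures."""
    p = gmm_likelihood(x, present_model) * prior_present
    a = gmm_likelihood(x, absent_model) * (1.0 - prior_present)
    return p / (p + a)


# Invented parameters for a "sky" site concept (not learned from data).
sky_present = [(0.6, 0.8, 0.01), (0.4, 0.6, 0.02)]
sky_absent = [(1.0, 0.2, 0.02)]

blue_region = concept_posterior(0.75, sky_present, sky_absent)
gray_region = concept_posterior(0.20, sky_present, sky_absent)
print(round(blue_region, 3), round(gray_region, 3))
```

In practice the mixture parameters would be estimated from labeled segments, for example with expectation-maximization, and the feature vector would have many dimensions.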



Multiject : Hierarchical HMM

ss_1 … ss_m : state sequence of the supervisor HMM

sa_1 … sa_m : state sequence of the audio HMM

xa_1 … xa_m : audio observations

sv_1 … sv_m : state sequence of the video HMM

xv_1 … xv_m : video observations



Multinet: Concept Building based on Multiject

  • A network of multijects modeling interaction between them

  • + / - : positive/negative interaction between multijects



Bayesian Multinet

  • Nodes : binary random variables (presence/absence of multiject)

  • Layer 0 : frame-level multiject-based semantic features

  • Layer 1 : inference from layer 0

  • Layer 2 : higher level for performance improvement
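
The idea of binary multiject nodes influencing one another can be illustrated with a very small Bayesian update. The sketch below is not the network from Naphade and Huang; it assumes the evidence nodes are conditionally independent given the target concept, and all probabilities and concept pairings are made up for the example.

```python
def posterior_present(prior, evidence):
    """P(target = Present | evidence) for a binary target node.

    evidence is a list of (observed, p_true_given_present, p_true_given_absent)
    tuples, one per related multiject, assumed conditionally independent.
    """
    like_present = prior
    like_absent = 1.0 - prior
    for observed, p_given_present, p_given_absent in evidence:
        like_present *= p_given_present if observed else 1.0 - p_given_present
        like_absent *= p_given_absent if observed else 1.0 - p_given_absent
    return like_present / (like_present + like_absent)


# Target concept: "outdoor". Related multijects (invented numbers):
#   sky detected      -> supports outdoor (positive interaction)
#   mountain missing  -> weakly lowers the belief
evidence = [
    (True, 0.60, 0.05),   # sky
    (False, 0.30, 0.01),  # mountain
]
p_outdoor = posterior_present(0.5, evidence)
print(round(p_outdoor, 4))  # prints 0.8946
```

With no evidence the function returns the prior unchanged, which is a quick sanity check on the update.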


Object-based Semantic Video Modeling

[Figure: block diagram with the stages Video Sequence, VO Extraction, Object-based Video Abstraction, Object-based Low-Level Feature Extraction, Semantic Features Modeling, and Indexing/Retrieving]


Object Extraction based on Object Tracking [Kim, Hwang ’00]

[Figure: block diagram with inputs I_n, I_{n-1}, and VO_{n-1}; blocks Motion Projection, Model Update (Histogram Backprojection), delay, and Object Post-processing; output VO_n]


Semantic Feature Modeling

  • Modeling based on temporal variation of object features

  • Boundary shape and motion statistics of object area

[Figure: blocks Abstracted frame sequence, Pre-processing, Object Features, and HMM Training]


HMM Modeling

1. Observation sequence O_1 … O_T of object features

2. Left-right 1-D HMM modeling

[Figure: chain of states S_1, S_2, …, S_T]
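
As a rough illustration of scoring an observation sequence with a left-right HMM, the sketch below runs the standard forward algorithm on a three-state model in which each state can only stay or advance. The discrete observation symbols, the state count, and all probabilities are assumptions made for the example; the presentation does not specify them.

```python
def forward_likelihood(observations, start, transition, emission):
    """P(observations | model) computed with the forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Left-right topology: a state may only stay put or move to the next state.
start = [1.0, 0.0, 0.0]
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
# Hypothetical quantized object feature: 0 = small shape change, 1 = large change.
emission = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.9, 0.1],
]

matching = forward_likelihood([0, 1, 0], start, transition, emission)
mismatched = forward_likelihood([1, 0, 1], start, transition, emission)
print(matching > mismatched)  # prints True
```

For long sequences the forward variables underflow, so a real implementation would rescale them at each step or work in log space.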


Video Modeling: Three Layer Structure

Three-layer structure of video modeling, compared to NLP

[Figure: two columns of three layers each. Video Understanding: Content Interpretation; Semantic Video Modeling (Frame-based Structural Modeling, Object-based Structural Modeling); Audio-Visual Feature Extraction. Natural Language Processing: Interpretation; Sentence Structure & Grammar; Word Recognition.]



A Multimodal Framework for Video Content Interpretation

  • Long-term goal

  • Application: an automatic TV program scout

  • Allow users to request topic-level programs

  • Integrate multiple modalities: visual, audio, and text information

  • Multi-level concepts

    • Low: low-level feature

    • Mid: object detection, event modeling

    • High: classification result of semantic content

  • Probabilistic model: a Bayesian network for classification (causal relationships, domain knowledge)



How to work with the framework?

  • Preprocessing

    • Story segmentation (shot detection)

    • VOCR, Speech Recognition

    • Key frame selection

  • Feature Extraction

    • Visual features based on key-frame

      • Color, texture, shape, sketch, etc.

    • Audio features

      • average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc.

    • Textual features (Transcript)

      • Knowledge tree with many keyword categories: politics, entertainment, stock, art, war, etc.

      • Word spotting, vote histogram

    • Motion features

      • Camera operation: Panning, Tilting, Zooming, Tracking, Booming, Dollying

      • Motion trajectories (moving objects)

      • Object abstraction, recognition

  • Building and training the Bayesian network
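
The textual-feature step (word spotting followed by a vote histogram) can be sketched as counting transcript words that fall into each keyword category. The category names come from the list above, but the keyword sets, the function name, and the sample transcript are hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists for three of the categories named in the slides.
KEYWORD_CATEGORIES = {
    "politics": {"election", "senate", "minister"},
    "stock": {"shares", "market", "index"},
    "war": {"troops", "ceasefire"},
}


def vote_histogram(transcript, categories=KEYWORD_CATEGORIES):
    """Count one vote per spotted keyword for the category it belongs to."""
    votes = Counter()
    for word in transcript.lower().split():
        word = word.strip(".,!?;:")
        for category, keywords in categories.items():
            if word in keywords:
                votes[category] += 1
    return votes


transcript = "The market index fell as shares dropped after the election."
votes = vote_histogram(transcript)
top_category = votes.most_common(1)[0][0]
print(dict(votes), top_category)
```

The resulting histogram could serve as one evidence input to the Bayesian network, alongside the visual, audio, and motion features.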



Challenging points

  • Preprocessing is significant in the framework.

    • Accuracy of key-frame selection

    • Accuracy of speech recognition & VOCR

  • Good feature extraction is important for the performance of classification.

  • Modeling semantic video objects and events

  • How to integrate multiple modalities still needs careful consideration.



Conclusion

  • Introduction of several basic concepts

  • Semantic video modeling and indexing

  • Proposed a multimodal framework for topic classification of video

  • Discussion of challenging problems



Q & A

Thank you!

