Content-based Video Indexing, Classification & Retrieval

Presented by HOI, Chu Hong

Nov. 27, 2002

Outline

  • Motivation

  • Introduction

  • Two approaches for semantic analysis

    • A probabilistic framework (Naphade, Huang ’01)

    • Object-based abstraction and modeling [Lee, Kim, Hwang ’01]

  • A multimodal framework for video content interpretation

  • Conclusion

Motivation

  • The amount of digital video data has grown rapidly in recent years.

  • Tools for classifying and retrieving video content are lacking.

  • There exists a gap between low-level features and high-level semantic content.

  • Enabling machines to understand video is important and challenging.

Introduction

  • Content-based Video indexing

    • The process of attaching content-based labels to video shots

    • essential for content-based classification and retrieval

    • Using automatic analysis techniques

      - shot detection, video segmentation

      - key frame selection

      - object segmentation and recognition

      - visual/audio feature extraction

      - speech recognition, video text, VOCR
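
As a rough illustration of the first two techniques listed above, the following sketch detects shot boundaries by thresholding the difference between consecutive frame histograms and then picks the middle frame of each shot as its key frame. The function names, the L1 distance, the threshold of 0.5, and the middle-frame heuristic are all assumptions made for this example; the slides do not specify a particular method.

```python
from typing import List, Sequence


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(histograms: List[Sequence[float]],
                           threshold: float = 0.5) -> List[int]:
    """Return every index i where a cut is declared between frame i-1 and frame i."""
    return [
        i for i in range(1, len(histograms))
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold
    ]


def select_key_frames(num_frames: int, boundaries: List[int]) -> List[int]:
    """Pick the middle frame of each shot as its key frame (a simple heuristic)."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]


# Hypothetical two-bin histograms for five frames; the content changes at frame 2.
frame_histograms = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.0, 1.0]]
cuts = detect_shot_boundaries(frame_histograms)
key_frames = select_key_frames(len(frame_histograms), cuts)
```

A fixed threshold only catches abrupt cuts; gradual transitions such as fades and dissolves would need a different test.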

Introduction (continued)

  • Content-based Video Classification

    • Segment & classify videos into meaningful categories

    • Classify videos based on predefined topics

    • Useful for browsing and searching by topic

    • Multimodal method

      • Visual features

      • Audio features

      • Motion features

      • Textual features

    • Domain-specific knowledge

Introduction (continued)

  • Content-based Video Retrieval

    • Simple visual feature query

      • Retrieve video whose key frame has color R (80%), G (10%), B (10%)

    • Feature combination query

      • Retrieve video with high upward motion (70%) and blue color (30%)

    • Query by example (QBE)

      • Retrieve video that is similar to a given example

    • Localized feature query

      • Retrieve video with a car running toward the right

    • Object relationship query

      • Retrieve video with a girl watching the sunset

    • Concept query (query by keyword)

      • Retrieve "explosion", "White Christmas"
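
One possible reading of the feature combination query above is a weighted sum of per-feature scores, with the percentages acting as weights. The sketch below ranks clips that way. The clip names, the feature names `upward_motion` and `blue`, and the scores are invented for illustration, and the slides do not say that a weighted sum is the intended scoring rule.

```python
from typing import Dict, List


def combined_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores; a feature missing from a clip counts as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank_videos(database: Dict[str, Dict[str, float]],
                weights: Dict[str, float],
                top_k: int = 3) -> List[str]:
    """Return the names of the top_k clips, best match first."""
    ranked = sorted(database,
                    key=lambda name: combined_score(database[name], weights),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical per-clip feature scores in [0, 1].
clips = {
    "clip_a": {"upward_motion": 0.9, "blue": 0.2},
    "clip_b": {"upward_motion": 0.1, "blue": 0.9},
    "clip_c": {"upward_motion": 0.5, "blue": 0.5},
}
# "High upward motion (70%), blue (30%)" expressed as weights.
query = {"upward_motion": 0.7, "blue": 0.3}
results = rank_videos(clips, query)
```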

Introduction (continued)

  • Feature Extraction

    • Color features

    • Texture features

    • Shape features

    • Sketch features

    • Audio features

    • Camera motion features

    • Object motion features
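
To make the color feature concrete, here is a minimal sketch of a quantized RGB histogram together with histogram intersection as a similarity measure, which could serve a query by example. The bin count, the function names, and the choice of histogram intersection are assumptions for this example and are not taken from the slides.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized RGB histogram with bins_per_channel**3 bins (channel values 0..255)."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Clamp so that the value 255 always falls into the last bin.
        ri = min(int(r / step), bins_per_channel - 1)
        gi = min(int(g / step), bins_per_channel - 1)
        bi = min(int(b / step), bins_per_channel - 1)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical key frames reduced to a handful of pixels.
red_frame = [(255, 0, 0)] * 4
blue_frame = [(0, 0, 255)] * 4
similarity = histogram_intersection(color_histogram(red_frame),
                                    color_histogram(blue_frame))
```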

Semantic Indexing & Querying

  • Limitations of QBE

    • Similarity is measured using only low-level features

    • Does not reflect the user’s perception

    • High-level features are difficult to annotate

  • Syntactic to Semantic

    • Bridge the gap between low-level features and semantic content

    • Semantic indexing, Query By Keyword (QBK)

  • Semantic description scheme – MPEG-7

    • Semantic interaction between concepts

    • No scheme for learning the models of individual concepts

Semantic Modeling & Indexing

  • Two approaches

    • Probabilistic framework, ‘Multiject’ (Naphade, Huang ’01)

    • Object-based abstraction and indexing [Lee, Kim, Hwang ’01]

A probabilistic approach (‘Multiject’ & ‘Multinet’) (Naphade, Huang ’01)

  • Multiject: a probabilistic multimedia object

  • Three categories of semantic concepts

    • Objects

      • Face, car, animal, building

    • Sites

      • Sky, mountain, outdoor, cityscape

    • Events

      • Explosion, waterfall, gunshot, dancing

Multiject for semantic concept

P( Outdoor = Present | features, other multijects) = 0.7

[Figure: the inputs to the multiject are visual features, audio features, text features, and other multijects.]

How to create a Multiject

  • Shot-boundary detection

  • Spatio-temporal segmentation of within-shot frames

  • Feature extraction (color, texture, edge direction, etc.)

  • Modeling

    • Sites: mixture of Gaussians

    • Events: hidden Markov models (HMMs) with Gaussian mixtures as observation densities

    • All audio events: modeled using HMMs

    • Each segment is tested for each concept and the information is then composed at frame level
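
The site-modeling step can be sketched as follows: one Gaussian mixture describes segments where the concept is present, another describes segments where it is absent, and Bayes' rule turns the two likelihoods into a posterior such as P(Outdoor = Present | feature). To stay short, this sketch uses a single scalar feature; real features would be multi-dimensional. The mixture parameters, the prior, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# One mixture component: (weight, mean, variance).
Component = Tuple[float, float, float]


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x: float, components: List[Component]) -> float:
    """Density of a mixture of Gaussians; the weights are expected to sum to 1."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def concept_posterior(x: float,
                      present_model: List[Component],
                      absent_model: List[Component],
                      prior_present: float = 0.5) -> float:
    """P(concept present | x) by Bayes' rule over the two class-conditional mixtures."""
    p_present = mixture_pdf(x, present_model) * prior_present
    p_absent = mixture_pdf(x, absent_model) * (1.0 - prior_present)
    return p_present / (p_present + p_absent)


# Hypothetical models for a scalar "sky blueness" feature of a segment.
outdoor_model = [(0.6, 0.8, 0.02), (0.4, 0.5, 0.05)]
not_outdoor_model = [(1.0, 0.2, 0.05)]
p_outdoor = concept_posterior(0.7, outdoor_model, not_outdoor_model)
```

In practice the mixture parameters would be estimated from labeled training segments, for example with the EM algorithm.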

Multiject : Hierarchical HMM

ss1 - ssm : state sequence for supervisor HMM

sa1 - sam : state sequence for audio HMM

xa1 - xam : audio observations

sv1 - svm : state sequence for video HMM

xv1 - xvm : video observations

Multinet: Concept Building based on Multiject

  • A network of multijects modeling interaction between them

  • + / - : positive/negative interaction between multijects

Bayesian Multinet

  • Nodes : binary random variables (presence/absence of multiject)

  • Layer 0 : frame-level multiject-based semantic features

  • Layer 1 : inference from layer 0

  • Layer 2 : higher level for performance improvement
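
The effect of a positive or negative interaction between multijects can be illustrated with a very small Bayesian network: one binary parent node (here "outdoor") and binary child nodes whose detection results serve as evidence. The sketch computes the parent's posterior by enumerating its two states. The network shape, the node names, and every probability are hypothetical and are not taken from Naphade and Huang's model.

```python
from typing import Dict, Tuple


def parent_posterior(prior: float,
                     cpt: Dict[str, Tuple[float, float]],
                     evidence: Dict[str, int]) -> float:
    """P(parent = 1 | evidence) for binary children that depend only on the parent.

    cpt[name] = (P(child = 1 | parent = 1), P(child = 1 | parent = 0))
    evidence[name] = 1 if the child multiject was detected, else 0
    """
    joint = {}
    for parent_state in (1, 0):
        p = prior if parent_state == 1 else 1.0 - prior
        for name, observed in evidence.items():
            p_child = cpt[name][0] if parent_state == 1 else cpt[name][1]
            p *= p_child if observed == 1 else 1.0 - p_child
        joint[parent_state] = p
    return joint[1] / (joint[1] + joint[0])


# Hypothetical numbers: "sky" is far more likely to be detected in outdoor frames.
cpt = {"sky": (0.8, 0.1)}
with_sky = parent_posterior(0.5, cpt, {"sky": 1})      # belief in "outdoor" rises
without_sky = parent_posterior(0.5, cpt, {"sky": 0})   # belief in "outdoor" falls
```

With a prior of 0.5, detecting sky raises the belief in "outdoor" and a missing detection lowers it, which is the kind of interaction the +/- links express.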

Object-based Semantic Video Modeling

[Figure: block diagram with the stages Video Sequence, VO Extraction, Video Abstraction, Object-based Low-Level Feature Extraction, and Semantic Feature Modeling.]

Object Extraction based on Object Tracking [Kim, Hwang ’00]

[Figure: block diagram with the stages Motion Projection, Model Update (Histogram Backprojection), and Object Post-processing.]

Semantic Feature Modeling

[Figure labels: Object Features; Abstracted frame sequence.]

  • Modeling based on temporal variation of object features

  • Boundary shape and motion statistics of object area

HMM Modeling

1. Observation sequence O1, …, OT (object features)

2. Left-Right 1-D HMM modeling
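
The two steps above can be sketched with the forward algorithm, which computes the likelihood of an observation sequence under an HMM. In a left-right model a state can only stay where it is or move to a later state, so the transition matrix is upper triangular. For brevity this sketch assumes discrete observation symbols (for example, quantized object features); the two-state model and all of its probabilities are hypothetical.

```python
from typing import List


def forward_likelihood(observations: List[int],
                       start: List[float],
                       trans: List[List[float]],
                       emit: List[List[float]]) -> float:
    """P(observations | HMM) by the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    if not observations:
        return 1.0
    n = len(start)
    # Initialization with the first observation.
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    # Induction over the remaining observations.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state left-right model: state 0 may stay or advance, state 1 only stays.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],   # state 0 mostly emits symbol 0
        [0.2, 0.8]]   # state 1 mostly emits symbol 1
likelihood_in_order = forward_likelihood([0, 1], start, trans, emit)
likelihood_reversed = forward_likelihood([1, 0], start, trans, emit)
```

A sequence that follows the model's left-to-right order scores higher than the same symbols reversed. For long sequences the probabilities underflow, so a practical version would rescale `alpha` at each step or work in log space.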

Video Modeling: Three Layer Structure

Three-layer structure of video modeling, compared to NLP

[Figure comparing Video Understanding with Natural Language Processing. Labels: Content Interpretation; Video Modeling; Frame-based Structural; Object-based Structural Modeling; Sentence Structure & Grammar; Word Recognition; Audio-Visual Feature Extraction.]

A Multimodal Framework for Video Content Interpretation

  • Long-term goal

  • Application to an automatic TV program scout

  • Allow users to request topic-level programs

  • Integrate multiple modalities: visual, audio, and text information

  • Multi-level concepts

    • Low: low-level features

    • Mid: object detection, event modeling

    • High: classification result of semantic content

  • Probabilistic model: a Bayesian network for classification (causal relationships, domain knowledge)
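
As a simplified stand-in for the Bayesian network classifier, the sketch below uses naive Bayes, the simplest Bayesian network structure: the topic node is the only parent of every mid-level detection. It estimates the probabilities by counting over labeled clips and then picks the most probable topic. The topics, the feature names, and the training clips are invented for illustration; the framework in the slides may use a richer network structure.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, int]]  # (topic, {mid-level detection: 0 or 1})


def train_naive_bayes(samples: List[Sample], smoothing: float = 1.0):
    """Estimate P(topic) and P(feature = 1 | topic) with Laplace smoothing."""
    topic_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    feature_names = set()
    for topic, features in samples:
        topic_counts[topic] += 1
        for name, value in features.items():
            feature_names.add(name)
            if value:
                feature_counts[topic][name] += 1
    priors = {t: c / len(samples) for t, c in topic_counts.items()}
    likelihoods = {
        t: {name: (feature_counts[t][name] + smoothing) / (topic_counts[t] + 2 * smoothing)
            for name in feature_names}
        for t in topic_counts
    }
    return priors, likelihoods


def classify(features: Dict[str, int], priors, likelihoods) -> str:
    """Return the topic with the highest log-posterior for the given detections."""
    def log_score(topic: str) -> float:
        score = math.log(priors[topic])
        for name, p in likelihoods[topic].items():
            score += math.log(p if features.get(name, 0) else 1.0 - p)
        return score
    return max(priors, key=log_score)


# Hypothetical labeled clips with two mid-level detections each.
training = [
    ("sports", {"crowd_noise": 1, "anchor_face": 0}),
    ("sports", {"crowd_noise": 1, "anchor_face": 0}),
    ("news", {"crowd_noise": 0, "anchor_face": 1}),
    ("news", {"crowd_noise": 0, "anchor_face": 1}),
]
priors, likelihoods = train_naive_bayes(training)
topic = classify({"crowd_noise": 1, "anchor_face": 0}, priors, likelihoods)
```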

How to work with the framework?

  • Preprocessing

    • Story segmentation (shot detection)

    • VOCR, Speech Recognition

    • Key frame selection

  • Feature Extraction

    • Visual features based on key-frame

      • Color, texture, shape, sketch, etc.

    • Audio features

      • average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc.

    • Textual features (Transcript)

      • Knowledge tree with many keyword categories: politics, entertainment, stock, art, war, etc.

      • Word spotting, vote histogram

    • Motion features

      • Camera operation: Panning, Tilting, Zooming, Tracking, Booming, Dollying

      • Motion trajectories (moving objects)

      • Object abstraction, recognition

  • Building and training the Bayesian network
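
The textual-feature step (word spotting and vote histogram) listed above could look roughly like this: each transcript word that appears in a category's keyword list casts one vote for that category. The keyword lists, the tokenization rule, and the sample transcript are assumptions made for this example; a real knowledge tree would be much larger and hierarchical.

```python
import re
from collections import Counter
from typing import Dict, Set


def vote_histogram(transcript: str, categories: Dict[str, Set[str]]) -> Counter:
    """Count, per category, how many transcript words match one of its keywords."""
    words = re.findall(r"[a-z']+", transcript.lower())
    votes = Counter()
    for word in words:
        for category, keywords in categories.items():
            if word in keywords:
                votes[category] += 1
    return votes


# Hypothetical keyword categories, flattened from a knowledge tree.
keyword_categories = {
    "politics": {"election", "senate", "president"},
    "stock": {"market", "shares", "nasdaq"},
    "war": {"troops", "ceasefire", "battle"},
}
transcript = "The market fell as shares dropped ahead of the election."
votes = vote_histogram(transcript, keyword_categories)
top_category = votes.most_common(1)[0][0] if votes else None
```

The resulting histogram can then be used as the textual feature vector fed into the classifier, alongside the visual, audio, and motion features.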

Challenging points

  • Preprocessing is significant in the framework.

    • Accuracy of key-frame selection

    • Accuracy of speech recognition & VOCR

  • Good feature extraction is important for the performance of classification.

  • Modeling semantic video objects and events

  • How to integrate multiple modalities still needs careful consideration.

Conclusion

  • Introduction of several basic concepts

  • Semantic video modeling and indexing

  • Proposal of a multimodal framework for topic classification of video

  • Discussion of challenging problems

Q & A

Thank you!