Content-Based Video Analysis based on Audiovisual Features for Knowledge Discovery

Chia-Hung Yeh

Signal and Image Processing Institute

Department of Electrical Engineering

University of Southern California


Vision

Parsing or Segmentation


Guidelines

  • Motivation

  • Introduction

  • Overview of visual and audio content

  • Video abstraction

  • Multimodal information concept

  • Knowledge discovery via video mining

  • Our previous work

  • Conclusion and future work


Motivation

  • Amazing growth in the amount of digital video data in recent years.

  • Develop tools to classify, retrieve, and abstract video content

  • Develop tools for summarization and abstraction

  • Bridge the gap between low-level features and high-level semantic content

  • Enabling machines to understand video is important and challenging


Why, What and How

  • Why video content analysis?

    • Modern multimedia technologies have led to huge digital video collections, but efficient access to video content is still in its infancy because of the bulky data volume and unstructured data format.

  • What is video content analysis?

    • Video content analysis analyzes the video content and attempts to automatically understand the embedded video semantics as humans do

  • How to do video content analysis?


Overview of Visual Content

  • Structured analysis

    • Extract hierarchical video structure

[Figure: text-document analogy for hierarchical video structure. Labels: Document, Text, segmented into Words, grouped into Sentences, Key sentences]


Overview of Audio Content

  • Continuous in the time domain, unlike visual content

  • Multiple sound sources exist in a soundtrack, like many objects in a single frame

  • It is hard to separate audio content and give a suitable description

  • MPEG-7 framework: silence, timbre, waveform, spectral, harmonic, and fundamental frequency descriptors

  • Some special features for music and speech


Content-Based Video Indexing

  • Process of attaching content-based labels to video shots

  • Essential for content-based classification and retrieval

  • Some required techniques

    • Shot detection

    • Key frame selection

    • Object segmentation and recognition

    • Visual/audio feature extraction

    • Speech recognition, video text, VOCR


Content-Based Video Classification

  • Segment and classify videos into meaningful categories

  • Classify videos based on predefined topics

  • Multimodal concept

    • Visual features

    • Audio features

    • Metadata features

  • Domain-specific knowledge


Query (Retrieval Methods)

  • Simple visual feature query

  • Feature combination query

  • Query by example (QBE)

    • Retrieve videos that are similar to the example

  • Localized feature query

    • Example: retrieve videos showing a car moving toward the right

  • Object relationship query

  • Concept query (query by keyword)

  • Metadata

    • Time, date, etc.
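
Query by example can be illustrated with a small ranking sketch. It assumes each clip is already summarized by a normalized feature histogram and uses histogram intersection as the similarity measure; the function names and the choice of measure are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(min(x, y) for x, y in zip(a, b))


def query_by_example(example: Sequence[float],
                     database: Dict[str, Sequence[float]],
                     top_k: int = 3) -> List[str]:
    """Return the names of the top_k clips most similar to the example."""
    ranked = sorted(
        database,
        key=lambda name: histogram_intersection(example, database[name]),
        reverse=True,
    )
    return ranked[:top_k]
```

Any other feature vector and distance could be substituted; the point is only that the query is itself a clip, compared in feature space.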


The Ways to Browse a Video

  • Faster playback

    • Audio time scale modification – time saving factor 1.5 to 2.5

    • 15% - 20% time reduction by removing and shortening pauses

  • Storyboard

    • Composed of representative still frames (Keyframes)

  • Moving storyboard

    • Display keyframes while synchronized with the original audio track

  • Highlight

    • Pre-defined special event (example: sport and news)

  • Skimming

    • Extract short video clips to build a much shorter video



Image Retrieval and Video Browsing

  • Query by Image Content (QBIC), IBM, 1995

    • Complex multi-feature and multi-object queries

  • Video browsing

    • Discover information quickly and efficiently

    • Browsing and searching usually complement each other

    • Browsing visual content is easier than browsing audio content

    • Achieved by static storyboard, dynamic video clips, fast forward

  • Representative work

    • Gary Marchionini, University of Maryland

    • S.-F. Chang, Columbia University


Video Abstraction

  • Video summarization and video skimming

    • Both belong to video abstraction and differ from video browsing

    • Automatically retrieve a collection of the most significant and most representative segments

  • Required techniques

    • Shot detection, scene generation

    • Motion analysis

    • Face recognition

    • Audio segmentation

    • Text detection

    • Music detection


Video Abstraction

  • A video abstract

    • A sequence of still or moving images that preserves the essential content of the original video while being much shorter

  • Applications

    • Automated authoring of web content

      • Web news

      • Web seminar

    • Consumer domain applications

      • Analyzing, filtering, and browsing


Video Summarization (I)

  • A collection of salient frames that represent the underlying content

  • Most related work focuses on ways to extract still frames

  • Categorized into three classes

    • Frame-based

      • Random or uniform selection

    • Shot-based

      • Keyframe

    • Feature-based

      • Motion, color and so on
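
The frame-based and shot-based classes can be contrasted with a short sketch. It assumes frames are addressed by index and that shot boundaries are already known; picking the middle frame of each shot as its keyframe is an illustrative assumption, and both function names are hypothetical.

```python
from typing import List, Sequence


def uniform_frames(num_frames: int, count: int) -> List[int]:
    """Frame-based: pick `count` evenly spaced frame indices."""
    if num_frames <= 0 or count <= 0:
        return []
    step = num_frames / count
    return [int(step * i + step / 2) for i in range(count)]


def shot_keyframes(shot_starts: Sequence[int], num_frames: int) -> List[int]:
    """Shot-based: pick the middle frame of every shot.

    `shot_starts` lists the first frame of each shot after the first one.
    """
    starts = [0] + list(shot_starts)
    ends = list(shot_starts) + [num_frames]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]
```

Uniform selection ignores shot lengths, so a long shot may contribute several frames while a short one contributes none; the shot-based variant returns exactly one frame per shot.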


Video Summarization (II)

  • Representative work

    • Y. Taniguchi, (1995)

      • Frame-based scheme

      • Simple, but may not be representative because shot lengths are not uniform

    • H.-J. Zhang, Microsoft Research China (1997)

      • Keyframe based on color histogram

    • Gong and Liu, NEC Laboratories America (2003)

      • SVD (Singular Value Decomposition)

      • Capture temporal and spatial characteristics

    • Tseng, Lin and J. R. Smith, IBM T. J. Watson Research Center (2002)

      • Video summarization scheme for pervasive mobile device


Video Skimming

  • A good skim is much like a movie trailer

  • A synopsis of the entire video

  • Representative work

    • M. Smith and T. Kanade, Carnegie Mellon University (1995)

      • Audio and image characterization

    • S. Pfeiffer, University of Mannheim (1996)

      • VAbstract system

      • Detection of special events such as dialogs, explosions and text occurrences

    • H. Sundaram and S.-F. Chang, Columbia University (2001)

      • A semantic skimming system

      • Visual complexity for human understanding

      • Film syntax


Video Skimming – Application

  • Video content transcoding

    • Content-based live sport video filtering


Video Shot Structure

  • Shot, a cinematic term, is the smallest addressable video unit (the building block). A shot contains a set of continuously recorded frames

  • Two types of video shots:

    • Camera break abrupt content change between neighboring frames. Usually corresponds to an editing cut

    • Gradual transition  smooth content change over a set of consecutive frames. Usually caused by special effects

  • Shot detection is usually the first step towards video content analysis
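
As a rough illustration of camera-break detection, the sketch below compares gray-level histograms of neighboring frames and reports a cut when they differ strongly. It assumes each frame is a flat list of 8-bit gray values; the bin count and threshold are hypothetical, and gradual transitions would need a different test that this sketch does not cover.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit gray levels for one frame."""
    hist = [0.0] * bins
    for p in frame:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    n = float(len(frame)) or 1.0
    return [h / n for h in hist]


def detect_cuts(frames: Sequence[Sequence[int]],
                threshold: float = 0.5) -> List[int]:
    """Return indices i where frame i starts a new shot (abrupt change)."""
    if not frames:
        return []
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```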


Scene Characteristics

  • A scene is a semantic concept that refers to a relatively complete video paragraph with coherent semantic meaning. It is subjectively defined

  • Shots within a movie scene have the following three features

    • Visual similarity

      • Since a scene could only be developed within certain spatial and temporal localities, the directors have to repeat some essential shots to convey parallelism and continuity of activities due to the sequential nature of film making

    • Audio similarity

      • Similar background noises

      • Speeches from the same person have similar acoustic characteristics

    • Time locality

      • Visually similar shots should also be temporally close to each other if they do belong to the same scene


Basic Audio Features

  • Energy

    • Silence or pause detection

  • Zero crossing rate (ZCR)

    • The frequency of the audio signal amplitude passing through the zero value in a given time

  • Energy centroid

    • Speech range: 100 Hz to 7 kHz

    • Music range: 16 Hz to 16 kHz

  • Band periodicity

    • Harmonic sounds

    • Music: High frequency components are integer multiples of the lowest one

    • Speech: Pitch

  • MFCC (Mel-Frequency Cepstral Coefficients)

    • 13 linearly-spaced filters
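
The first three features can be computed per analysis frame as sketched below. The sketch assumes a frame is a short list of samples and reads "energy centroid" as the magnitude-weighted mean frequency of the spectrum, which is an interpretation rather than something the slides define; a naive DFT is used only to keep the example dependency-free.

```python
import math
from typing import Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude; values near zero suggest silence or a pause."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of neighboring sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency in Hz, using a naive DFT."""
    n = len(frame)
    num = den = 0.0
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        num += (k * sample_rate / n) * mag
        den += mag
    return num / den if den > 0 else 0.0
```

For real audio, an FFT routine would replace the quadratic-time loop.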


Multimodal Information Concept


Multimodal Framework for Video Content Interpretation

  • Application to automatic abstraction of TV programs

  • Allows users to request topic-level programs

  • Integrate multiple modalities: visual, audio and text information

  • Multi-level concepts

    • Low: low-level feature

    • Mid: object detection, event modeling

    • High: classification result of semantic content

  • Probabilistic model: using Bayesian network for classification (causal relationship, domain-knowledge)
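
To make the probabilistic fusion step concrete, here is a minimal sketch that uses a naive Bayes model, the simplest Bayesian network, in which mid-level detections are assumed conditionally independent given the topic. The topic names, feature names, and probabilities are hypothetical; the network structure actually used in the framework is not given in the slides.

```python
import math
from typing import Dict


def topic_posterior(prior: Dict[str, float],
                    likelihoods: Dict[str, Dict[str, float]],
                    evidence: Dict[str, bool]) -> Dict[str, float]:
    """Posterior P(topic | evidence) under a naive Bayes assumption.

    likelihoods[topic][feature] is P(feature detected | topic).
    evidence[feature] says whether the detector fired.
    """
    scores = {}
    for topic, p in prior.items():
        logp = math.log(p)
        for feature, present in evidence.items():
            q = likelihoods[topic][feature]
            logp += math.log(q if present else 1.0 - q)
        scores[topic] = logp
    # Normalize in log space for numerical stability.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {t: math.exp(s - top) / z for t, s in scores.items()}
```

Each modality (visual, audio, text) contributes its detections as evidence, so adding a modality only means adding entries to the likelihood table.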


Probabilistic Model – Data Fusion


How to Work with the Framework

  • Preprocessing

    • Video segmentation (shot detection) and key frame selection

    • VOCR, speech recognition

  • Feature Extraction

    • Visual features based on key-frame

      • Color, texture, shape, sketch, etc.

    • Motion features

      • Camera operation: Panning, Tilting, Zooming, Tracking, Booming, Dollying

      • Motion trajectories (moving objects)

      • Object abstraction, recognition

    • Audio features

      • average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc.

    • Textual features (Transcript)

      • Knowledge tree with many keyword categories: politics, entertainment, stock, art, war, etc.

      • Word spotting, vote histogram

  • Building and training the Bayesian network


Challenging Points

  • Preprocessing is significant in the framework.

    • Accuracy of key-frame selection

    • Accuracy of speech recognition & VOCR

  • Good feature extraction is important for the performance of classification.

  • Modeling semantic video objects and events

  • How to integrate multiple modalities still needs careful consideration


Knowledge Discovery via Video Mining

  • Objectives

    • Find the hidden links between isolated news, events, etc.

    • Find the general trend of an event development

    • Predict the possible future event

    • Discover abnormal events

  • Required Technologies

    • Domain-specific knowledge model

    • Mining association rules, sequential patterns and correlations

    • Effective and fast classification and clustering

  • Challenges

    • Model build-up in special knowledge domain

    • Integration of semantic mining and feature-based mining

    • Effective and scalable classification and clustering algorithms
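
Mining association rules rests on two measures, support and confidence, sketched below. The sketch assumes each video segment has already been reduced to a set of semantic labels; the label names in the usage example are hypothetical, and a real miner would enumerate candidate itemsets (for example with Apriori) rather than score hand-picked ones.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Set[str],
               consequent: Set[str]) -> float:
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base
```

For example, with segments labeled `{"anchor", "stock"}`, `{"anchor", "stock", "war"}`, `{"anchor", "war"}`, and `{"sports"}`, the rule anchor → stock has support 0.5 and confidence 2/3.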


Video Mining Issues

  • Frequent/Sequential Pattern Discovery

    • Fast and scalable algorithms for mining frequent, sequential and structured patterns and for correlation analysis

    • Similarity of rule/event search/measurement

  • Efficient and fast classification and clustering algorithms

    • Constraint-based classification and clustering algorithms

    • Spatiotemporal data mining algorithms

    • Stream data mining (classification and clustering) algorithms

  • Surprise/outlier discovery and measurement

    • Detection of outliers based on similarity and trend analysis

    • Detection of outliers and surprising events based on stream data mining algorithms

  • Multidimensional data mining for trend prediction


Framework of Video Mining


Our Previous Work

  • TV Commercial Detection

    • Visual/audio information processing

  • Cinema rules

    • Intensity mapping

  • Tempo analysis in digital video (Professional video)

    • Audio tempo

    • Motion tempo

  • Home video processing (Non-professional)

    • Quality enhancement (Bad shot detection)

    • Music and video matching


Commercial Detection

  • First step in any TV program content management

  • Monitor broadcast

    • Government

    • Advertising companies

  • Commercial features

    • Delimiting black frame (not available in some countries)

    • High cut frequency and short shot interval (important feature)

    • Still images

    • Special editing styles and effects

    • Text and logo


Commercial Detection

  • Visual information processing

    • Black frame detection

    • Shot detection and statistical analysis of shots

    • Still image detection

    • Text-region detection

    • Edge change rate detection

  • Audio information processing

    • Volume control

    • Silence
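
Two of these cues, black frames and high cut frequency, are simple enough to sketch. The code assumes frames are flat lists of 8-bit luminance values and that cut times (in seconds) come from a separate shot detector; all thresholds and function names are hypothetical and would need tuning on real broadcasts.

```python
from typing import List, Sequence


def is_black_frame(frame: Sequence[int],
                   max_mean: float = 16.0,
                   max_range: int = 8) -> bool:
    """A frame counts as black if it is both dark and nearly uniform."""
    mean = sum(frame) / len(frame)
    return mean <= max_mean and (max(frame) - min(frame)) <= max_range


def cut_rate(cut_times: Sequence[float], start: float, window: float) -> float:
    """Number of cuts per second inside [start, start + window)."""
    return sum(1 for t in cut_times if start <= t < start + window) / window


def commercial_candidates(cut_times: Sequence[float],
                          duration: float,
                          window: float = 30.0,
                          min_rate: float = 0.25) -> List[float]:
    """Start times of consecutive windows whose cut rate reaches min_rate."""
    starts = []
    t = 0.0
    while t < duration:
        if cut_rate(cut_times, t, window) >= min_rate:
            starts.append(t)
        t += window
    return starts
```

In practice these visual cues would be combined with the audio cues (volume and silence) before a segment is labeled as a commercial.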


Commercial Detection

  • Structure of TV program

[Figure: structure of a TV program. Labels: normal program, spot, spot, normal program with station logo, normal program, black frame]


Shot Detection & Its Statistical Analysis

[Figure: shot detection statistics, with the commercial start point marked]


Still Image Detection

  • Still Image

    • A video clip is composed of a sequence of images

    • Find a set of consecutive images that change little over a period of time

  • Difficulty

    • Even when a video clip looks still, the difference between two consecutive images is seldom zero

    • It is hard to measure the moving part (human eyes are sensitive to motion)

  • Main idea

    • Quantify motion in each image to detect still image
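
One way to quantify motion is the mean absolute difference between consecutive frames, with a small tolerance because the difference is seldom exactly zero. The sketch below assumes frames are flat lists of gray values of equal length; the motion measure, threshold, and minimum run length are illustrative assumptions rather than the method used in the presentation.

```python
from typing import List, Sequence, Tuple


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-pixel absolute difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def still_runs(frames: Sequence[Sequence[float]],
               motion_threshold: float = 2.0,
               min_length: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of nearly static frames."""
    runs = []
    start = 0
    for i in range(1, len(frames) + 1):
        # A run ends at the last frame or when motion exceeds the tolerance.
        ends_here = (i == len(frames)
                     or mean_abs_diff(frames[i - 1], frames[i]) > motion_threshold)
        if ends_here:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs
```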


Still Image Detection

[Figure: still image detection results. Labels: error detection, really still images]


Tempo Analysis and Cinema Rules

  • Bruce Block, The Visual Story: Seeing the Structure of Film, TV, and New Media

    • Relationship between story structure and visual structure

      • Their intensity maps are correlated

    • Principle of contrast and affinity

      • The greater the contrast in a visual component, the more the visual intensity or dynamic increases


Cinema Rules

[Figure: story intensity (0 to 100) plotted against the time length of the story in minutes (0 to 120), with segments labeled EX, CO, CX, and R]

  • Every feature film has a well-designed story structure, which contains the beginning (exposition), the middle (conflict), and the end (resolution)

    • EX: exposition, gives the facts needed to begin the story

    • CO: conflict, contains rising actions or conflict

    • CX: climax

    • R: resolution, ends the story


Cinema Rules

  • Scene:

    • A simple theme in a scene

    • Each scene is composed of a setup part, a progressing part, and a resolution part

    • The final film is just a way to present this theme

      • Dialog

      • Close-up view

  • A story unit

    • An example of a scene

      • The main actors drove the main actress home from the train station

    • A simple action

      • Met at the train station -> On the road -> Another main actor joined them -> Arrived home


Audio Tempo

  • Music tempo

  • Definition in music

    • Note

    • Meter: a longer period that contains several beats. For example, we can count ONE-two-three, ONE-two-three

    • Tempo (pace/beat period)

      • It is often indicated at the beginning. For example, the rate should be 100 quarter notes per minute (100 claps per minute)


Audio Tempo

  • Speech tempo

    • Emotion detection

    • Segmental durations

      • Syllable or phoneme

  • Audio tempo

    • Short time pace

      • Short-term memory

    • The number of sound events per unit of time

      • The more events, the faster it seems to go

    • Onset

      • A new note or a new syllable


Audio Tempo

  • Diagram of audio tempo analysis


Audio Tempo

  • Frequency filterbank

    • Perceptual frequency

    • Critical bands

      • Wavelet-packet

      • Multirate system

  • Envelope extractor

    • Rectify

    • Filtering: 50 ms half-Hamming window

  • Differentiator

    • First-order difference

    • Half-wave rectified

[Figure: input signal and detected onsets]
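
The envelope extractor and differentiator stages can be sketched for a single band as follows. The sketch skips the frequency filterbank, so it treats the whole signal as one band, which is a simplifying assumption; the 50 ms half-Hamming window follows the slide, while the function names and normalization are hypothetical.

```python
import math
from typing import List, Sequence


def half_hamming(length: int) -> List[float]:
    """Decaying half of a Hamming window, normalized to sum to 1."""
    if length <= 1:
        return [1.0]
    denom = 2 * length - 2
    w = [0.54 - 0.46 * math.cos(2 * math.pi * (length - 1 + n) / denom)
         for n in range(length)]
    total = sum(w)
    return [x / total for x in w]


def envelope(signal: Sequence[float], window: Sequence[float]) -> List[float]:
    """Full-wave rectify, then smooth with a causal convolution."""
    rectified = [abs(x) for x in signal]
    out = []
    for t in range(len(rectified)):
        acc = 0.0
        for k, w in enumerate(window):
            if t - k >= 0:
                acc += w * rectified[t - k]
        out.append(acc)
    return out


def onset_strength(signal: Sequence[float], sample_rate: float,
                   window_ms: float = 50.0) -> List[float]:
    """First-order difference of the envelope, half-wave rectified."""
    length = max(1, int(round(sample_rate * window_ms / 1000.0)))
    env = envelope(signal, half_hamming(length))
    diff = [0.0] + [env[t] - env[t - 1] for t in range(1, len(env))]
    return [max(0.0, d) for d in diff]
```

Peaks in the returned sequence mark candidate onsets; counting them per unit of time gives a tempo estimate in the sense of sound events per unit of time.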


Audio Tempo

  • Boundary of story units

    • Local minima of audio tempo

  • Post signal processing

    • Help to get local minima

    • Three steps

      • Lowpass filtering

      • Morphological operation

        • Minmax

        • Close operation

      • Detect local minima

        • Detected valleys

[Figure: post-processing for audio tempo analysis]
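
The three post-processing steps can be sketched on a one-dimensional tempo curve. The sketch assumes a moving average as the lowpass filter and a flat structuring element for the close operation (dilation followed by erosion); window widths and function names are hypothetical.

```python
from typing import List, Sequence


def _windows(x: Sequence[float], width: int) -> List[Sequence[float]]:
    """Centered windows, truncated at the ends of the sequence."""
    radius = width // 2
    return [x[max(0, i - radius): i + radius + 1] for i in range(len(x))]


def moving_average(x: Sequence[float], width: int) -> List[float]:
    """Simple lowpass filter."""
    return [sum(w) / len(w) for w in _windows(x, width)]


def close(x: Sequence[float], width: int) -> List[float]:
    """Morphological close: sliding max, then sliding min."""
    dilated = [max(w) for w in _windows(x, width)]
    return [min(w) for w in _windows(dilated, width)]


def local_minima(x: Sequence[float]) -> List[int]:
    """Indices of valleys; a flat valley is reported at its center."""
    minima = []
    i, n = 1, len(x)
    while i < n - 1:
        if x[i] < x[i - 1]:
            j = i
            while j + 1 < n and x[j + 1] == x[i]:
                j += 1
            if j + 1 < n and x[j + 1] > x[i]:
                minima.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return minima


def story_boundaries(tempo: Sequence[float], smooth_width: int = 3,
                     close_width: int = 3) -> List[int]:
    """Lowpass, close, then detect valleys as candidate boundaries."""
    return local_minima(close(moving_average(tempo, smooth_width), close_width))
```

The close operation removes valleys narrower than the structuring element, so only sustained drops in tempo survive as candidate story-unit boundaries.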


Motion Analysis

  • The variance of motion vector

    • Computed over a window of shots, from the average length of the motion vectors in each shot, indexed by shot
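
The exact formula is not recoverable from this transcript. As a minimal sketch, assuming the measure is the population variance of the per-shot average motion-vector length inside a window centered on each shot, it could be computed as below; the function name and the centered, edge-truncated window are assumptions.

```python
from typing import List, Sequence


def windowed_motion_variance(avg_motion: Sequence[float],
                             window: int) -> List[float]:
    """Variance of per-shot average motion length around each shot index."""
    radius = window // 2
    out = []
    for i in range(len(avg_motion)):
        seg = avg_motion[max(0, i - radius): i + radius + 1]
        mean = sum(seg) / len(seg)
        out.append(sum((v - mean) ** 2 for v in seg) / len(seg))
    return out
```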


Motion Analysis

  • Boundary of story units

    • Transition Edges

  • Post processing

    • Morphological operation

      • Median

      • Maxmin

      • Minmax

    • Gradient

    • Detect edges

[Figure: post-processing for visual tempo]


Skimming Video

  • Test data

    • Legends of The Fall

      • First 26 minutes

      • MPEG format

        • 352 × 240 pixels

        • 44.1 kHz


Home Video Processing

Shooting tips

1. Shoot lots of short scenes (5 ~ 10 seconds)

2. Use zoom in/out to take exposition shots or emphasize something

3. Zoom or pan slowly

4. Get a lot of face shots

5. Keep a steady hand

6. Make sure your subject is well lit

  • Home video characteristics

    • Fragmentary

    • Sound may not be very important

    • Bad shots

      • Stabilization

      • Focus

      • Lighting


Bad Shots

  • Shaky

    • Drive

    • Walk

  • Vibration in the camera motion across successive frames


Bad Shots

  • Ill-lit

    • Too dark/bright

    • Too much variance

      • Diaphragm

  • Lighting Problem

    • Average of luminance

      • Highest 1/3 pixels and lowest 1/3 pixels

      • Negative feedback
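
One possible reading of the luminance test is sketched below: sort the pixel luminances, average the lowest third and the highest third, and flag the shot when even the brightest third is dark or even the darkest third is bright. The decision rule and both limits are assumptions made for illustration; the slides do not state how the two averages are used, and the negative-feedback step is not modeled.

```python
from typing import Sequence, Tuple


def luminance_extremes(pixels: Sequence[float]) -> Tuple[float, float]:
    """Mean luminance of the lowest third and of the highest third."""
    ordered = sorted(pixels)
    k = max(1, len(ordered) // 3)
    low = sum(ordered[:k]) / k
    high = sum(ordered[-k:]) / k
    return low, high


def lighting_problem(pixels: Sequence[float],
                     dark_limit: float = 50.0,
                     bright_limit: float = 205.0) -> str:
    """Classify a frame as 'too dark', 'too bright', or 'ok'."""
    low, high = luminance_extremes(pixels)
    if high < dark_limit:
        return "too dark"
    if low > bright_limit:
        return "too bright"
    return "ok"
```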


Bad Shots

  • Blur

    • Motion blur

    • Out-of-focus blur

    • Foggy blur


Music and Video Matching

  • Shot detection

  • Remove bad shots

  • Match music tempo

    • Shot length

    • Motion activity


Authoring Scheme

  • Match music tempo

    • High tempo

      • Small segment length

        • Transition time

      • High motion activity


Experimental Results

  • Test data

    • Input music: Canon, 5.5 minutes long

    • Input video clips:

      • Activities of babies aged 0 to 3 years

      • Artificially created bad shots

      • Average clip length is about 20 seconds

      • Total length is 50 minutes


Well-Known Research in Video Content Analysis Field

  • Well-known universities

    • Digital Video Multimedia laboratory (DVMM), Columbia University

    • MIT Media laboratory

    • Informedia Digital Video Understanding, Carnegie Mellon University

    • Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

    • Signal and Image Processing Institute, University of Southern California

    • Department of Electrical Engineering, Princeton University

    • Language and media processing laboratory, University of Maryland


Well-Known Research in Video Content Analysis Field

  • Well-known R&D laboratories

    • IBM T. J. Watson research center

    • IBM Almaden research center

    • Intel corporation

    • Sharp Laboratories of America (SLA)

    • Microsoft research laboratory

    • Microsoft research China

    • Hewlett-Packard research laboratory

    • AT&T Bell Laboratories

    • InterVideo

    • Pinnacle


Conclusion

  • Introduction of several basic concepts

  • Basic processing and low-level feature extraction

  • Semantic video modeling and indexing

  • Multimodal framework for topic classification of video

  • Knowledge discovery via video mining

  • Our research results

  • Discussion of challenging problems


Questions

Thank You

