Video Retrieval. Topics. Shot detection algorithm Video indexing key frame-based video indexing Adaptive video indexing technique Automatic relevance feedback network for video retrieval Experiment on iARM: video search engine. Video Data.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Topics • Shot detection algorithm • Video indexing • key frame-based video indexing • Adaptive video indexing technique • Automatic relevance feedback network for video retrieval • Experiment on iARM: video search engine
Video Data • Video is a continuous media but for database storage and manipulation such as random access it is important to be able to deal with portions of video object. • Video Segmentation―cutting long video into portions: shot, scene, and clip • Shot define a low level syntactic building blocks of video sequence. • Scene is the logical grouping of shots into semantic unit. • Clip is not clearly defined so it can last from a few seconds to several hours.
Shot Boundaries • Shot boundary detection can be easy, or difficult, depending • cut: hard boundary complete change of shot between consecutive frames • fade: fade-out or fade-in, a gradual fade to/from completely back (white?) frame • dissolve: simultaneous fade-out and fade-in • …..others • Each of these post-production technique make the detection of shot boundaries more difficult.
Cut Fade in Fade out Dissolve
Frame-to-frame comparison is the color histogram of the j-th frame
Key-Frame for Shot Representation key-frame key-frame
Content Representation • Content of video shot is describe by a low-level feature (e.g., color histogram) of the corresponding key-frame. • The m-th video shot is indexed by Key-Frame Content Descriptor Video Shot
Querying Video Database Video Shot 1 Video Shot 2 Video Shot 3 Query Shot Matching Video Shot J Database
GUI for Key Frame-Based Video Retrieval Query Shot Play shot
Problems • Compared to an image, video data contains both spatial and temporal information • Key frame-based video indexing (KFVI) method can deal with spatial content but does not take into account temporal information. • Furthermore, KFVI is not well adapted for representing video at scene and story levels
Adaptive Video Indexing (AVI) Technique • A better technique in capturing temporal content as well as the spatial content for effective video indexing • AVI provide multiple access to video database at three levels: • shot • group-of-shot • Story
Video shot database group of shots story Database Organization based on AVI Multiple-level access to video database Where is the descriptor of the video interval
Fundamental of AVI • Video sequence is a collection of visual templates (i.e., image frame) • Similar video contains use similar visual templates V 1 V 2 V 3 V 4 V 5 V 6 V 7 V 8 V 9
Fundamental of AVI V 1 V 2 V 3 V 4 V 5 V 6 V 7 V 8 V 9 Descriptor of shot 1: [0 0 2 0 3 0 0 2 0 ...] Descriptor of shot 2: [0 0 0 0 3 2 0 0 0 ...] V 1 V 2 V 3 V 4 V 5 V 6 V 7 V 8 V 9
Template Generation • Given a set of initial visual templates, and training vectors • The templates are optimized through the following steps: • Randomly choose the input vector • If is the closest node to such that • Then,
Template-frequency modeling (TFM) • Let be a set of descriptors for the video interval I, where is the histogram corresponding to the video frame • Each is mapped to a Voronoi space through where and is the label of the n-th cell neighboring to the best match cell,
TFM Cont. • The resulting of all frames from the mapping of the entire video interval are used as a representation of the video through a weight scheme: where is the number of times the template is mentioned in the content of the video , N denotes the total number of videos in the system, and denotes the number of videos in which the index template appears.
Test Data Description of sequences in the database: CNN broadcast news (at 352 resolution and 30 frames/sec.)
Retrieval Results (b) (a) A comparison of the retrieval performance at the shot level; (a) obtained by KFVI; and (b) obtained by the AVI
Performance Comparison Precision results averaged over 25 queries, compared between adaptive video indexing (AVI) and key-frame based video indexing (KFVI), using video database containing 844 video shots
Query-by-Video-Clip (b) (a) • Precision and recall rates obtained by retrieval of: • video groups, employing two links: shot-to-group (STG) and group-to-group (GTG) • video story, employing two links: shot-to-story (STS) and group-to-story (GTS)
Query-by-Video-Clip Cont.. (a) Query clip, <1.8 sec> (b) Rank 1, <1.8 sec> (c) Rank 2, <2.4 sec> (d) Rank 3, <1.9 sec> (e) Rank 4, <2.7 sec> (f) Rank 5, <3.3 sec>
Relevance Feedback for Video Retrieval: A client-server architecture Search Engine with Relevance Feedback
Problem with Relevance Feedback (RF) • user have to play each retrieved video in a feedback cycle • compared to an image, video files are usually very large • time consuming • high bandwidth in RF training process
Automatic and Semi-Automatic RFs Search Engine with Automatic Relevance Feedback Network
Automatic Relevance Feedback Network (ARFN) • Goal: implementation of adaptive system to improve retrieval accuracy • Strategy: incorporate self-learning neural network in the relevance feedback module in order to avoid user’s interaction during the retrieval process
ARFN Architecture number of nodes in the second layer = number of visual templates number of nodes in the third layer = number of video in the database
Signal Propagation (a) (b) (c) (a) Forward propagation; (b) Backward propagation; (c) New video template nodes in (b) introduce a new video node. This process results in the activation of new video nodes by expanding the original query templates, analogous to the traditional relevance feedback technique
Signal Propagation Cont.. • Activation level at the video template nodes, can be calculate according to two criterion: • Positive feedback • Positive and negative feedback where is the activation of the j-th video node, Pos is the set of positive video nodes, Neg is the set of negative video nodes
Results Average Precision Rate, APR (%) obtained by retrieving 25 video shot queries. ARFN results are quoted relative to the APR observed with simple retrieval.
Goals • Setting video search engine at the shot level, using JSP and J2EE server • Implementing video indexing using AVI and compared it with KFVI • Implementing a simple user-controlled interactive retrieval method within the search engine
GUI in iARM search engine Query Shot Selected method
Step I • Copy all the files in the folder “Experiments” to drive C: • Feature Database • key-frame Database Jsp file and Java Beans Video Shot Database
Step II: load feature vectors to database >> java COM.cloudscape.tools.cview Open the video feature database “C:\Experiments\database\video
Step III: deploy application deploytool New Application
Add Web components • to the Application • index.jsp • autoFeedback.class • CompType.class • MyDateJose.class • MyLocalRbf.class • userData.class
Open the search engine: “http://localhost:8000/iARM/index.jsp”