Video Google – A Google Approach to Video Retrieval
Introduction • Problem: • Retrieve key frames and shots of a video that contain a particular object or scene, with the ease and accuracy of Google. • Approach: • Effectively precompute matches • Textual analogy
Architecture Visual Word User-End Storage Indexing
Dhruvan Dileep Nishant Pradeep Pramod Sunil Video Google – Visual Words
MSER Maximally Stable Extremal Regions A Maximally Stable Extremal Region (MSER) is a connected component of an appropriately thresholded image
SA The Shape Adapted regions are invariant to affine transformations. The SA regions tend to be centered on corner like features.
SIFT Scale Invariant Feature Transform Invariant to image scaling and rotation Partially invariant to changes in illumination and viewpoint 128-dimensional descriptor
Clustering • Aim: To vector quantize descriptors into clusters to be used as visual words • Clustering Techniques • Agglomerative • O(n^2) space. • K-means • O(n+k) space, O(n*k*e) time complexity • Fast K-means • Triangle inequality used. • O(n*k) space. • Distance calculations reduced to roughly n rather than n*k*e
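The vector quantization step above can be sketched with plain (Lloyd's) k-means. This is a minimal illustration, not the accelerated triangle-inequality variant named on the slide. The function names, the 2-D toy descriptors (real SIFT descriptors are 128-dimensional), and the "first k points" initialization are all assumptions made to keep the example small and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(point, centers):
    """Index of the closest cluster center, i.e. the visual word id."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))

def kmeans(points, k, iterations=10):
    # Assumption: initialize with the first k points (a real system
    # would use a better seeding strategy).
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: n * k distance computations per iteration.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(groups):
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# Toy 2-D "descriptors" forming two well-separated groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers = kmeans(descriptors, k=2)
words = [nearest_center(d, centers) for d in descriptors]
print(words)
```

Each descriptor is replaced by the index of its nearest center, and that index plays the role of the visual word in the later indexing stages.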
Statistics • 19 half-hour videos: • Classification of 1,060,842 points: 9 hours
DB and API Indexing/Retrieval Visual Words UI
Future Work • Vocabulary Tree for interest point classification • Increase the visual vocabulary through efficient clustering.
Indexing and Retrieval in Vgoogle D Pavan Kumar B Rakesh Babu B Naveen Kumar Ankur Jaiswal V Sreekanth P Kowshik J Shashank
Overview Visual Words Indexing Results Query
Input format Pre-processing Query Set of visual words in the query rectangle
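As a rough sketch of the query step, the visual words inside a user-drawn rectangle could be collected as follows. The tuple layout `(x, y, word_id)` and the function name `words_in_rectangle` are hypothetical and chosen only for illustration.

```python
def words_in_rectangle(features, rect):
    """Return the set of visual words whose region centers fall in rect.

    features: list of (x, y, word_id) tuples for one frame (assumed layout)
    rect:     (x_min, y_min, x_max, y_max) of the query rectangle
    """
    x_min, y_min, x_max, y_max = rect
    return {word for (x, y, word) in features
            if x_min <= x <= x_max and y_min <= y <= y_max}

# Toy frame with five detected regions.
frame_features = [(10, 10, "V1"), (50, 40, "V2"), (60, 45, "V3"),
                  (200, 150, "V4"), (55, 42, "V2")]
query_words = words_in_rectangle(frame_features, (40, 30, 100, 80))
print(sorted(query_words))
```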
Output format Retrieved Results
Objectives Efficient Indexing Fast Retrieval Time Good Recall
Approach … Removing the common words Reverse Indexing Ranking of results
Indexing and Retrieval in Document Retrieval Stop list Used to remove the common words. Inverse File Structure An entry for each word in the corpus followed by a list of all the documents in which it appears. Spatial Consistency Ranking Use the ordering and separation of words to calculate the relevance of a document.
Stop list In the textual context Words are extracted from the text. Words are filtered by how useful they are. For instance, words that are independent of the subject or event being described are filtered out. Removing such words has no effect on the results. E.g.: "The way to the school is long and hard when walking in the rain." Removing `the` has no effect on the result.
Stop list (contd.) In the current context Stop list - a list of visual words that occur either very often or very rarely. Stop list boundaries are determined empirically. Advantages Reduces the number of mismatches Reduces the size of the inverted file Gives a more meaningful visual vocabulary
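One way to build such a stop list is sketched below. It assumes that "how often a word occurs" is measured as the fraction of frames containing the word. The two thresholds are placeholder values, since the slides only say the boundaries are found empirically, and `build_stop_list` is a hypothetical name.

```python
from collections import Counter

def build_stop_list(frames, low_fraction, high_fraction):
    """Visual words appearing in too few or too many frames."""
    n_frames = len(frames)
    frame_freq = Counter()
    for words in frames:
        frame_freq.update(set(words))  # count each word once per frame
    return {word for word, count in frame_freq.items()
            if count / n_frames < low_fraction
            or count / n_frames > high_fraction}

# Toy database: each frame is a list of visual word ids.
frames = [
    ["v1", "v2", "v3"],
    ["v1", "v2"],
    ["v1", "v4"],
    ["v1", "v2", "v5"],
]
# Placeholder thresholds, to be tuned empirically.
stop_list = build_stop_list(frames, low_fraction=0.3, high_fraction=0.9)
filtered = [[w for w in words if w not in stop_list] for words in frames]
print(sorted(stop_list), filtered)
```

In this toy data `v1` appears in every frame and `v3`, `v4`, `v5` appear in only one, so only `v2` survives filtering.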
Inverse File Structure Inverted file structure for indexing Popular data structure in document retrieval Mapping from words to documents Lower query time than forward indexing Forward indexing – sequential access Inverted indexing – random access
[Figure: inverted file example. Each word (Movie, Spain, Table, …, Song) points to the list of documents (D1, D3, D8, …) in which it appears.]
Visual Analogy Words ~ Visual Words Documents ~ Frames Query vector ~ visual words in Sub-Part of frame
[Figure: inverted file for visual words. Each visual word (V1, V2, V3, …, Vn) points to the list of documents, here frames (D1, D3, D8, …), in which it appears.]
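The mapping from visual words to frames can be sketched as a dictionary of posting lists. The frame ids and word ids below are toy data, and `build_inverted_file` and `candidate_frames` are hypothetical names, not part of the project's code.

```python
from collections import defaultdict

def build_inverted_file(frames):
    """Map each visual word to the list of frames in which it appears."""
    index = defaultdict(list)
    for frame_id, words in frames.items():
        for word in set(words):
            index[word].append(frame_id)
    return index

def candidate_frames(index, query_words):
    """Frames sharing at least one word with the query, most shared first."""
    hits = defaultdict(int)
    for word in set(query_words):
        # Direct lookup per word: no scan over all frames is needed.
        for frame_id in index.get(word, []):
            hits[frame_id] += 1
    return sorted(hits, key=lambda f: (-hits[f], f))

# Toy forward index: frame id -> visual words found in that frame.
frames = {
    "D1": ["V2", "V7"],
    "D2": ["V3"],
    "D3": ["V1", "V2"],
    "D8": ["V2", "V3"],
}
index = build_inverted_file(frames)
print(index["V2"])
print(candidate_frames(index, ["V2", "V3"]))
```

A forward index would require visiting every frame to answer the same query, which is the sequential versus random access contrast drawn on the earlier slide.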
Ranking the results - tf-idf Document – vector of word frequencies Each component of the vector is given some weight Standard Weighting Method TF-IDF
Ranking the results - tf-idf Each document is represented as a vector < t1, t2, t3, …, ti, …, tk-1, tk >, where ti = (nid / nd) log(N / ni) is the standard tf-idf weight. nid - number of occurrences of the ith word in document d. nd - total number of words in document d. ni - number of occurrences of the ith visual word in the whole database. N - number of documents in the whole database. IDF down-weights the most frequent words. Results are ranked by the cosine of the angle between the query vector and each document vector.
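A small sketch of this ranking follows, using the weight (nid / nd) log(N / ni) with the symbols defined on the slide. As on the slide, ni counts occurrences in the whole database. Many implementations count the documents containing the word instead, and the two coincide in this toy data because no word repeats within a frame. All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one sparse tf-idf vector per document, plus the n_i counts."""
    n_docs = len(documents)                                      # N
    n_i = Counter(w for words in documents.values() for w in words)
    vectors = {}
    for doc_id, words in documents.items():
        n_d = len(words)                                         # n_d
        vectors[doc_id] = {w: (n_id / n_d) * math.log(n_docs / n_i[w])
                           for w, n_id in Counter(words).items()}
    return vectors, n_i

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(documents, query_words):
    vectors, n_i = tfidf_vectors(documents)
    n_docs = len(documents)
    counts = Counter(w for w in query_words if w in n_i)  # drop unknown words
    n_q = sum(counts.values())
    query = {w: (c / n_q) * math.log(n_docs / n_i[w]) for w, c in counts.items()}
    scores = {d: cosine(query, vec) for d, vec in vectors.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))

# Toy database of three frames.
documents = {
    "F1": ["V1", "V2", "V3"],
    "F2": ["V1", "V4"],
    "F3": ["V1", "V2"],
}
ranking = rank(documents, ["V2", "V3"])
print(ranking)
```

Here `V1` occurs in every frame, so its weight is log(3/3) = 0 and it contributes nothing to the ranking, which is the down-weighting effect of IDF.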
Ranking the results – Spatial Consistency "Google increases the probability of documents having all the search words close to one another" There it is. That's what I ….. been …. have …. I have been there once, while ……..
Spatial Consistency Ranking Spatial arrangement of objects in images. Spatial consistency measure - Re-rank the results Neighboring matches in the query region lie in a surrounding area in the retrieved image.
Spatial Consistency Ranking Search area is defined by 15 nearest neighbors. A neighbor in the surrounding area in the retrieved image counts as a vote. Match with no support / hits is rejected. Repeat this for every match. Total number of votes decides the rank.
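The voting scheme above can be sketched as follows, assuming each image is given as a list of region centers and each match as a pair of indices into those lists. The brute-force neighbor search, the function names, and the toy coordinates are illustrative assumptions. The toy call uses k=2 instead of 15 only because the example has so few regions.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def k_nearest(index, points, k):
    """Indices of the k regions closest to points[index] (brute force)."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: squared_distance(points[index], points[j]))
    return set(others[:k])

def spatial_votes(query_points, frame_points, matches, k=15):
    """For each match, count the other matches that fall inside its search
    area in both the query region and the retrieved frame."""
    votes = []
    for q_idx, f_idx in matches:
        near_q = k_nearest(q_idx, query_points, k)
        near_f = k_nearest(f_idx, frame_points, k)
        votes.append(sum(1 for qj, fj in matches
                         if qj in near_q and fj in near_f))
    return votes

# Toy data: three matches move together, the fourth is an outlier.
query_points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
frame_points = [(5, 5), (6, 5), (5, 6), (30, 30), (31, 30), (30, 31),
                (50, 50), (51, 50), (50, 51)]
matches = [(0, 0), (1, 1), (2, 2), (3, 6)]

votes = spatial_votes(query_points, frame_points, matches, k=2)
kept = [m for m, v in zip(matches, votes) if v > 0]  # reject unsupported matches
score = sum(votes)                                   # total votes decide the rank
print(votes, kept, score)
```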
[Figure: a matched visual word V in the query region and in the retrieved frame. Number of votes = 3]
[Table: counts for visual words V1, V2, V3, …, Vn (rows) in Frame 1, Frame 2, Frame 3, …, Frame N (columns).]
Initial Match After Stoplist After Spatial Consistency
Future Work More efficient implementation of spatial consistency. Improve the retrieval time.
USER INTERFACE Chetan Chhaya Nishant Revanth Sandeep Sheetal
Objective Build a web interface for retrieving shots from a news video database that match a given image query Display the ranked list of shots, e.g. by date, channel, maximum match, month
About The Interface… The interface consists of the following three parts. Database Schema Data Directories Source Code Files
Database Schema All the videos and the metadata corresponding to the videos are stored in an SQL database that can be queried using MySQL. The following two tables are used: Table 1 Table 2
Data Directories The data is stored in the following five directories: Thumbnails Keyframes Stories Shots Videos
Source Files The interface part consists of 8 files. index.cgi server.cgi shots.cgi keyframes.cgi SelectRect.js display.cgi play.cgi conf.py Each file is a module.
index.cgi Home page of the interface. This page lists today's videos, each shown as the thumbnail of the first keyframe of the first shot of the video. It also lets the user select specific videos by date and channel through combo boxes.
server.cgi The user can be directed to this page from any of the other pages, since all of them offer the combo boxes. This page lists the results of the user's selection from the combo boxes (by date and channel). Each result is shown as the thumbnail of the first keyframe of the video.
shots.cgi Page used to display the shots of the video selected on the previous page. The stories that make up the video are displayed on the screen one after another. For each story, we display the keyframe thumbnails of all the shots in that story.