
Video Google – A Google Approach to Video Retrieval




Presentation Transcript

  1. Video Google – A Google Approach to Video Retrieval

  2. Introduction • Problem: Retrieve key frames and shots of a video containing a particular object or scene with the ease and accuracy of Google. • Approach: Effectively precompute matches using a textual analogy.

  3. Architecture: Visual Words, User End, Storage, Indexing

  4. Video Google – Visual Words. Presented by: Dhruvan, Dileep, Nishant, Pradeep, Pramod, Sunil

  5. MSER – Maximally Stable Extremal Regions. A Maximally Stable Extremal Region (MSER) is a connected component of an appropriately thresholded image.

  6. SA – Shape Adapted regions. Shape Adapted (SA) regions are invariant to affine transformations and tend to be centered on corner-like features.

  7. SIFT – Scale-Invariant Feature Transform. Invariant to image scaling and rotation; partially invariant to changes in illumination and viewpoint; 128-dimensional descriptor.

  8. Clustering • Aim: To vector-quantize descriptors into clusters to be used as visual words. • Clustering techniques: • Agglomerative – O(n²) space. • K-means – O(n + k) space, O(n·k·e) time (e = number of iterations). • Fast k-means – uses the triangle inequality; O(n·k) space; distance calculations reduced to roughly n rather than n·k·e.
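The k-means step above can be sketched in a few lines. This is a minimal plain-Python version with toy 2D points (real descriptors would be 128-dimensional SIFT vectors); the slide's fast k-means would additionally prune distance computations with the triangle inequality:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: quantize descriptor vectors into k clusters;
    the cluster centres play the role of visual words."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)           # k distinct points as initial centres
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            groups[j].append(p)
        # Update step: move each centre to the mean of its assigned points.
        for j, g in enumerate(groups):
            if g:
                centres[j] = tuple(sum(x) / len(g) for x in zip(*g))
    return centres

# Two well-separated 2D blobs collapse onto two centres near (0,0) and (10,10).
pts = [(0.1, 0.2), (0.2, 0.0), (-0.1, 0.1),
       (10.1, 9.9), (9.8, 10.2), (10.0, 10.1)]
centres = sorted(kmeans(pts, 2))
```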

  9. Statistics • 19 half-hour videos. • Classification of 1,060,842 points took 9 hours.

  10. Clustering Evaluation

  11. DB and API – Indexing/Retrieval, Visual Words, UI

  12. Results

  13. Future Work • Vocabulary Tree for interest point classification • Increase the visual vocabulary through efficient clustering.

  14. Indexing and Retrieval in Vgoogle. D Pavan Kumar, B Rakesh Babu, B Naveen Kumar, Ankur Jaiswal, V Sreekanth, P Kowshik, J Shashank

  15. Overview: Visual Words, Indexing, Results, Query

  16. Input format – Pre-processing: the query is converted into the set of visual words in the query rectangle.

  17. Output format Retrieved Results

  18. Objectives Efficient Indexing Fast Retrieval Time Good Recall

  19. Approach: removing the common words (stop list), inverted indexing, ranking of results.

  20. Indexing and Retrieval in Document Retrieval • Stop list – used to remove the common words. • Inverted file structure – an entry for each word in the corpus, followed by a list of all the documents in which it appears. • Spatial consistency ranking – use the ordering and separation of words to calculate the relevance of a document.

  21. Stop list – in the textual context. Words are extracted from text and filtered based on their level of usefulness; for instance, words which are independent of the subject or event being described are filtered out. Removing such words has no effect on the results. E.g.: "The way to the school is long and hard when walking in the rain." Removing `the` will have no effect on the result.

  22. Stop list (contd.) – in the current context. The stop list is a list of visual words that occur very often or very rarely; the stop-list boundaries are determined empirically. Advantages: reduces the number of mismatches, reduces the size of the inverted file, and yields a more meaningful visual vocabulary.
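Building a frequency-based stop list as described above can be sketched directly; note that the cutoff values here are purely illustrative, since the slide says the real boundaries are chosen empirically:

```python
from collections import Counter

def build_stop_list(word_occurrences, low=2, high=4):
    """Return the set of visual words to stop: those whose corpus
    frequency is above `high` (too common) or below `low` (too rare).
    The thresholds are illustrative; in practice they are tuned empirically."""
    freq = Counter(word_occurrences)
    return {w for w, n in freq.items() if n > high or n < low}

# v1 occurs very often, v3 very rarely; both land on the stop list, v2 survives.
words = ["v1"] * 6 + ["v2"] * 3 + ["v3"]
stop = build_stop_list(words)
```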

  23. Stop list (contd…)

  24. Inverse File Structure • Inverted file structure for indexing – a popular data structure in document retrieval. • Mapping from words to documents. • Lower query time compared to forward indexing: forward indexing scans sequentially, inverted indexing gives random access by word.
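The mapping above can be sketched as a dictionary of posting lists; the frame and word identifiers below are made up for illustration:

```python
from collections import defaultdict

def build_inverted_index(frames):
    """frames: {frame_id: iterable of word ids occurring in that frame}.
    Returns an inverted file: word -> sorted list of frames containing it."""
    index = defaultdict(set)
    for frame_id, words in frames.items():
        for w in words:
            index[w].add(frame_id)
    return {w: sorted(ids) for w, ids in index.items()}

frames = {"D1": ["v1", "v2"], "D3": ["v1", "v2"], "D8": ["v2", "v3"]}
index = build_inverted_index(frames)

# A query touches only the posting lists of its words,
# instead of scanning every document as forward indexing would.
hits = index["v2"]
```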

  25. Example inverted file over words: Movie → D1, D3, D23, D25, …, D1051; Spain → D1, D3, D8, …, D2029; Table → D2, D8, D100, …; Song → D12, D25, D102, …, D1078

  26. Visual Analogy: words ~ visual words; documents ~ frames; query vector ~ visual words in a sub-part of a frame.
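The analogy rests on quantizing each descriptor in a frame to its nearest cluster centre, so that a frame becomes a bag of visual-word ids. A minimal sketch, with a toy 2D vocabulary standing in for 128-dimensional SIFT cluster centres:

```python
def assign_visual_word(descriptor, centres):
    """Quantize a descriptor to the index of its nearest cluster centre
    (that index is the visual-word id), by squared Euclidean distance."""
    return min(range(len(centres)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(descriptor, centres[i])))

centres = [(0.0, 0.0), (10.0, 10.0)]   # toy two-word vocabulary
word = assign_visual_word((9.0, 9.5), centres)
```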

  27. Inverted file over visual words: V1 → D1, D3, D23, D25, …, D1051; V2 → D1, D3, D8, …, D2029; V3 → D2, D8, D100, …; Vn → D12, D25, D102, …, D1078

  28. Ranking the results – tf-idf • A document is a vector of word frequencies. • Each component of the vector is given some weight. • Standard weighting method: tf-idf.

  29. Ranking the results – tf-idf. Each document is represented as a vector < t1, t2, t3, …, ti, …, tk-1, tk > with ti = (nid / nd) · log(N / ni), where: nid – number of occurrences of the ith word in document d; nd – total number of words in document d; ni – number of occurrences of the ith visual word in the whole database; N – number of documents in the whole database. The idf factor down-weights the most frequent words. Documents are ranked by the cosine of the angle between the query vector and all document vectors.
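The weighting and ranking described above can be sketched as follows; the tiny corpus of word lists is invented for illustration:

```python
import math
from collections import Counter

def tfidf_vector(doc_words, corpus):
    """Weight t_i = (n_id / n_d) * log(N / n_i), following the slide:
    n_id = occurrences of word i in this document, n_d = total words in
    the document, n_i = occurrences of word i in the whole database,
    N = number of documents in the database."""
    N = len(corpus)
    n_i = Counter()
    for d in corpus:
        n_i.update(d)                     # occurrences over the whole database
    counts = Counter(doc_words)
    n_d = len(doc_words)
    return {w: (c / n_d) * math.log(N / n_i[w]) for w, c in counts.items()}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [["v1", "v2"], ["v2", "v3"], ["v2"], ["v1", "v3"]]
query = ["v1"]
q = tfidf_vector(query, corpus)
scores = [cosine(q, tfidf_vector(d, corpus)) for d in corpus]
```

Document 0 shares v1 with the query and is half about it, so it ranks above document 3, while documents without v1 score zero.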

  30. Ranking the results – Spatial Consistency • "Google increases the probability of documents having all the search words close to one another." • There it is. That's what I….. been.... have …. I have been there once, while ……..

  31. Spatial Consistency Ranking • Exploits the spatial arrangement of objects in images. • A spatial consistency measure is used to re-rank the results. • Neighboring matches in the query region should lie in a surrounding area in the retrieved image.

  32. Spatial Consistency Ranking • The search area is defined by the 15 nearest neighbors. • A neighbor in the surrounding area in the retrieved image counts as a vote. • A match with no support/hits is rejected. • This is repeated for every match; the total number of votes decides the rank.
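The voting scheme above can be sketched on toy matches. The slides define the search area by the 15 nearest neighbours; for simplicity this sketch substitutes a fixed radius, and all coordinates are invented:

```python
def spatial_votes(matches, radius=2.0):
    """matches: list of ((qx, qy), (rx, ry)) putative word matches between
    the query region and a retrieved frame. Each match earns one vote for
    every other match whose retrieved-frame point falls inside its search
    area (a fixed radius here stands in for the slides' 15-nearest-neighbour
    area). A match with no support is rejected; total votes decide the rank."""
    votes = 0
    kept = []
    for i, (_, r1) in enumerate(matches):
        support = sum(
            1 for j, (_, r2) in enumerate(matches)
            if i != j and (r1[0] - r2[0]) ** 2 + (r1[1] - r2[1]) ** 2 <= radius ** 2
        )
        if support:                      # matches with no support/hits are rejected
            kept.append(matches[i])
            votes += support
    return votes, kept

# Three matches land close together in the retrieved frame; one is an outlier.
matches = [((0, 0), (5.0, 5.0)),
           ((1, 0), (5.5, 5.2)),
           ((0, 1), (5.1, 5.8)),
           ((2, 2), (40.0, 1.0))]       # mismatch far from the others
votes, kept = spatial_votes(matches)
```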

  33. [Figure: matched regions and their search areas in a retrieved frame; number of votes = 3]

  34. Word-frequency table (visual word counts in the query region and in each frame): V1 – query 4; Frame 1: 10, Frame 2: 7, Frame 3: 0, … V2 – query 8; Frame 1: 4, Frame 2: 3, Frame 3: 0, … V3 – query 14; Frame 1: 9, Frame 2: 0, Frame 3: 2, … Vn – query 8; Frame 1: 0, Frame 2: 2, Frame 3: 0. Frame scores: 57, 36, 23, …, 4.

  35. [Figure: results – initial match, after stop list, after spatial consistency]

  36. Future Work More efficient implementation of spatial consistency. Improve the retrieval time.

  37. User Interface. Chetan, Chhaya, Nishant, Revanth, Sandeep, Sheetal

  38. Objective • Build a web interface for retrieving shots from the news video database that match a given image query. • Display the ranked list of shots with metadata, e.g. date, channel, maximum match, month.

  39. Input & Output

  40. About the Interface • The interface consists of the following three parts: database schema, data directories, source code files.

  41. Database Schema • All videos and the metadata corresponding to them are stored in a MySQL database. • The following two tables are used: Table 1 and Table 2.

  42. Data Directories • Data is stored in the following five directories: Thumbnails, Keyframes, Stories, Shots, Videos.

  43. Source Files • The interface consists of 8 files: index.cgi, server.cgi, shots.cgi, keyframes.cgi, SelectRect.js, display.cgi, play.cgi, conf.py. • Each file is a module.

  44. index.cgi • Home page of the interface. • This page lists today's videos, each shown as a thumbnail of the first keyframe of the video's first shot. • It also lets the user select specific videos by date and channel through combo boxes.

  45. server.cgi • The user can reach this page from any of the pages, since all of them offer the combo-box selection. • This page lists the results of the user's selection from the combo boxes (based on the criteria of date and channel). • The displayed result shows the thumbnail of the first keyframe of each video.

  46. shots.cgi • Page used to display the shots of the video selected on the previous page. • The constituent stories of the video are displayed on the screen one after another. • For each story, we display the thumbnails of the keyframes of all the shots in that particular story.
