
Presentation Transcript


  1. Video Repeat Recognition and Mining by Visual Features. Hyeonsoo Kang

  2. ▫ Introduction ▫ Structure of the algorithm: Known Video Repeat Recognition / Unknown Video Repeat Recognition ▫ Results: Known Video Repeat Recognition / Unknown Video Repeat Recognition

  3. “repeat recognition?”

  4. “repeat recognition?” Video repeats, that is, copies of a video clip, exist ubiquitously in broadcast and web videos: TV commercials, station logos, program logos, etc.

  5. Benefit: important for video content analysis and retrieval. Applications: video syntactic segmentation, commercial monitoring, video copy detection, web video multiplicity estimation, video content summary, personalization, video compression, … Challenge: distortions (partial repeats, caption overlay), plus robust detection, searching efficiency, and learning issues.

  6. So what exactly are we going to do?

  7. Observations? CNN news shots

  8. Observations?

  9. Observations?

  10. Observations?

  11. Observations?

  12. Observations?

  13. TIME AXIS

  14. Approaches to video repeat recognition are chiefly twofold: (a) known video repeat recognition; (b) unknown video repeat recognition.

  15-16. (a) Known video repeat recognition: prior knowledge about the video repeats is available; construct a feature vector set and use a nearest-neighbor (NN) classifier to recognize copies of prototype videos. (b) Unknown video repeat recognition: prior knowledge about the video repeats is unavailable; detection, search, and learning issues arise.

  17-20. (a) Known video repeat recognition. Video feature extraction: 1. Frames are sampled every half second. For each sample frame, an RGB color histogram and a texture feature are calculated: the R, G, B channels are divided into 8 bins each, and the texture is computed as 13 components. 2. Cluster the color space and the texture space separately, hence we get a set of cluster centers in each space. Wait, but how big is the computation then? We have to consider both 8 × 8 × 8 × S color features and 13 × U texture features … We don't want to match videos in this massive space. Rather, we want to project it onto a smaller subspace. A sketch of steps 1 and 2 follows.
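A minimal Python sketch of steps 1 and 2, assuming frames arrive as H x W x 3 uint8 RGB arrays sampled every half second. The slide does not name the 13-component texture descriptor or the clustering method, so a placeholder statistic and scikit-learn's KMeans stand in for them.

```python
import numpy as np
from sklearn.cluster import KMeans

def rgb_histogram(frame, bins=8):
    """8 bins per R, G, B channel -> an 8 x 8 x 8 = 512-bin joint histogram."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / hist.sum()

def texture_feature(frame):
    """Placeholder for the slide's unspecified 13-component texture feature."""
    gray = frame.mean(axis=2)
    gy, gx = np.gradient(gray)          # gradients along rows and columns
    stats = [gray.mean(), gray.std(), gx.std(), gy.std()]
    return np.array(stats + list(np.percentile(gray, np.linspace(10, 90, 9))))

def cluster_space(features, n_clusters):
    """Cluster one feature space (color or texture) on its own."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(np.asarray(features))
```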

  21. (a) Known video repeat recognition. 3. Use OPCA: maximize the Rayleigh quotient, which, under the slide's assumptions, reduces to a generalized eigenvector problem. The slide's equations were images; a standard reconstruction follows.
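Below is the standard OPCA (oriented PCA) formulation, given as a hedged reconstruction since the original equations did not survive transcription. The symbols C1, C2, u, and q are assumed names: C1 is the covariance of the clean feature vectors x_i, and C2 the covariance of the distortion vectors z_k.

```latex
\[
  q(u) = \frac{u^{\top} C_1 u}{u^{\top} C_2 u},
  \qquad
  C_1 = \tfrac{1}{m}\sum_{i=1}^{m}(x_i-\bar{x})(x_i-\bar{x})^{\top},
  \qquad
  C_2 = \tfrac{1}{n}\sum_{k=1}^{n} z_k z_k^{\top}
\]
% Maximizing the Rayleigh quotient q(u) yields the generalized
% eigenvector problem whose top eigenvectors span the projection:
\[
  C_1 u = q\, C_2 u
\]
```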

  22-25. (a) Known video repeat recognition. 4. Once we have obtained a smaller subspace, we want to recognize video copies in it. We use an NN classifier for efficient matching: a test video is recognized as the closest prototype video if the distance (difference) is below a threshold; otherwise the test video does not belong to any prototype video. To determine the threshold value we need to analyze the video database. And remember, this is a known video repeat recognition problem, hence we know the statistical data about the database! We first define three types of distance, introduced on the next slide. A sketch of the NN rule follows.
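A minimal sketch of the NN-with-rejection rule in step 4; the function names and the threshold parameter are illustrative assumptions.

```python
import numpy as np

def recognize(test_vec, prototypes, threshold):
    """prototypes: (n, d) array of projected prototype feature vectors."""
    dists = np.linalg.norm(prototypes - test_vec, axis=1)
    nearest = int(np.argmin(dists))
    if dists[nearest] < threshold:
        return nearest      # recognized as a copy of this prototype video
    return None             # does not belong to any prototype video
```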

  26. (a) Known video repeat recognition. Call the three distances d_w, d_b, and d_n (the slide's own symbols were images): d_w is the within-class distance between distorted prototype videos and the model database; d_b is the minimum between-class distance between a distorted prototype video and the database; d_n is the minimum distance between a non-prototype video and the database.

  27-28. (a) Known video repeat recognition. These distance types are useful because the only cases of recognition error are: (1) a distorted prototype video is classified as a non-prototype video; (2) a distorted copy of one prototype video is recognized as another prototype video; (3) a non-prototype video is recognized as a prototype video. Therefore the probability that a video q is wrongly classified is the sum of the probabilities of these three events.

  29. (a) Known video repeat recognition. The density functions of d_w, d_b, and d_n are p_w, p_b, and p_n respectively (shown as plots on the slide).

  30. [Continued] (a) Known video repeat recognition. Basic calculus says that, since the error probability is a C1 function, we differentiate it once and set the derivative to zero to find its minimum. A reconstruction of the error probability and the resulting threshold condition follows.
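The error probability and the threshold condition were slide images; the following is a hedged reconstruction, where p_w, p_b, p_n are the densities of d_w, d_b, d_n and alpha, beta, gamma are assumed prior weights of the three error cases.

```latex
\[
  P_e(T) = \alpha \int_{T}^{\infty} p_w(x)\,dx
         + \beta  \int_{0}^{T} p_b(x)\,dx
         + \gamma \int_{0}^{T} p_n(x)\,dx
\]
% Differentiating once and setting the derivative to zero gives the
% condition satisfied by the optimal threshold T*:
\[
  -\alpha\, p_w(T^{*}) + \beta\, p_b(T^{*}) + \gamma\, p_n(T^{*}) = 0
\]
```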

  31. Experiments and Results. The prototype video database consists of 1000 short video clips with lengths from 15 to 90 s, most of which are commercials and film trailers. [Video format] - Frame size 720x576 at 25 fps. - Distorted copies are obtained by downsizing to 352x288 and reducing the frame rate from 25 fps to 15 fps (a common distortion between broadcast video and web video copies). [Video length] - Set to 10 s when computing the feature vectors. [Number of clusters] - Texture feature clusters: 5. - Color feature clusters: 1. Then OPCA is adopted to compute the 64 subspace projections.

  32. Experiments and Results Statistical Analysis of the subspace …

  33-34. [Continued] Experiments and Results. Histograms of d_w, d_b, and d_n, and their fitted density functions (shown as figures on the slide). So we set the threshold to 0.05 (a smaller value is also possible according to the equation).

  35. [Continued] Experiments and Results Minimum training error rates

  36-37. Recap. (a) Known video repeat recognition: prior knowledge about the video repeats is available; construct a feature vector set and use an NN classifier to recognize copies of prototype videos. (b) Unknown video repeat recognition: prior knowledge about the video repeats is unavailable; detection, search, and learning issues arise.

  38. (b) Unknown video repeat recognition • Big picture: we employ two cascade detectors. This is an unknown video repeat recognition problem, so we need to give the machine an algorithm for recognizing repeats. • Again we use visual properties; here we employ a color fingerprint (Yang et al. [ ]). • The first detector discovers potential repeated clips, and the second one improves accuracy.

  39. [Continued] (b) Unknown video repeat recognition. Diagram of the cascade: FIRST stage, then SECOND stage.

  40-42. (b) Unknown video repeat recognition. The stream is split into keyframes and the video units between them: … KF1, VU1, KF2, VU2, KF3, VU3, KF4, VU4, … Wait, but how do we find keyframes? Keyframe selection is based on the color histogram difference. Suppose H1 and H0 are the color histograms of the current frame and the last keyframe respectively; then the current frame is selected as a new keyframe if their difference is large enough (the slide's condition was an image; a sketch follows).
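A minimal sketch of the keyframe test, assuming an L1 histogram difference against the last keyframe; the slide's exact condition and threshold value are not in the transcript.

```python
import numpy as np

def select_keyframes(histograms, tau=0.5):
    """histograms: sequence of normalized color histograms, one per frame."""
    keyframes = [0]
    h0 = histograms[0]                   # histogram of the last keyframe
    for t in range(1, len(histograms)):
        if np.abs(histograms[t] - h0).sum() > tau:   # large color change
            keyframes.append(t)
            h0 = histograms[t]
    return keyframes
```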

  43-45. [Continued] (b) Unknown video repeat recognition. We average the K blending images, and the color fingerprint is the ordered concatenation of these block features. Let R, G, B be the average color values of a block, and (V1, V2, V3) their descending order; then the major color and the minor color are determined by rules on (V1, V2, V3) (given on the slide; a hedged sketch follows).
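The actual major/minor decision rules were on the slide image; the dominance-ratio rules below are assumptions, shown only to make the block-symbol idea concrete.

```python
import numpy as np

def block_symbols(block, ratio=1.2):
    """block: h x w x 3 RGB region of a blending image."""
    avg = block.reshape(-1, 3).mean(axis=0)      # average R, G, B of the block
    order = np.argsort(avg)[::-1]                # channel indices, descending
    v1, v2, v3 = avg[order]                      # (V1, V2, V3)
    major = "RGB"[order[0]] if v1 > ratio * v2 else "X"  # 'X' = no dominant color
    minor = "RGB"[order[1]] if v2 > ratio * v3 else "X"
    return major, minor
```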

  46-47. [Continued] (b) Unknown video repeat recognition. If we divide the blending images into 8 × 8 blocks (M = N = 8), the color feature is a 128-dimensional symbol vector! To decrease the complexity of searching, we transform the data into a string representation using LSH (Locality-Sensitive Hashing) and use unit-length filtering. Now, for the actual algorithm by which the machine recognizes the repeats, we devise a similarity measure.
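One concrete way to get the string representation is position-sampling LSH, a standard locality-sensitive family for Hamming distance over symbol vectors; whether the paper uses exactly this family is an assumption.

```python
import random

def make_tables(num_tables=8, keys_per_table=16, dim=128, seed=0):
    """Each table samples a fixed set of positions of the 128-symbol vector."""
    rng = random.Random(seed)
    return [[rng.randrange(dim) for _ in range(keys_per_table)]
            for _ in range(num_tables)]

def lsh_strings(fingerprint, tables):
    """Similar fingerprints collide in at least one table with high probability."""
    return ["".join(str(fingerprint[p]) for p in positions)
            for positions in tables]
```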

  48. [Continued] (b) Unknown video repeat recognition. Given two video units vu_i and vu_j, their difference is defined in terms of d(F_i, F_j) and their lengths, where F_i and F_j are the color fingerprint vectors of vu_i and vu_j, d(F_i, F_j) is the color fingerprint distance function, and len(·) is the length feature. The slide's formula was an image; a hedged sketch follows.
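A hedged sketch of the unit-difference measure: the combination below (symbol mismatch rate plus a weighted length penalty) is an assumed form, since the slide's formula was an image.

```python
def unit_difference(F_i, F_j, len_i, len_j, alpha=1.0):
    """F_i, F_j: equal-length symbol vectors; len_i, len_j: unit lengths."""
    d = sum(a != b for a, b in zip(F_i, F_j)) / len(F_i)   # d(F_i, F_j)
    return d + alpha * abs(len_i - len_j) / max(len_i, len_j)
```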

  49. [Continued] (b) Unknown video repeat recognition. Second-stage matching is then conducted: we decide whether a repeat pair from the first stage is true or not by checking whether Score, the similarity value, is large enough relative to L, the minimum length of the two clips in seconds, against a threshold. Once a repeat pair is verified, its boundaries are extended until a dissimilar unit is encountered. Here we use a soft threshold! A hedged sketch follows.
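A hedged sketch of the second-stage decision and boundary extension; the per-second form Score / L >= theta and all constants are assumptions standing in for the slide's condition.

```python
def is_true_repeat(score, L, theta=0.7):
    """score: similarity value; L: minimum clip length in seconds."""
    return score / L >= theta

def extend_repeat(units_a, units_b, end_a, end_b, diff, tau=0.3):
    """Grow a verified pair forward until a dissimilar unit is encountered."""
    while (end_a + 1 < len(units_a) and end_b + 1 < len(units_b)
           and diff(units_a[end_a + 1], units_b[end_b + 1]) < tau):
        end_a += 1
        end_b += 1
    return end_a, end_b
```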
