Improved Video Categorization from Text Metadata and User Comments

ACM SIGIR 2011: Research and Development in Information Retrieval

- Katja Filippova

- Keith B. Hall

  • Presenter: Viraja Sameera Bandhakavi


Contributions

  • Analyze sources of text information such as the title, description, and comments, and show that they provide valuable indications of a video's topic

  • Show that a text-based classifier trained on the imperfect predictions of a weakly supervised, video content-based classifier is not redundant

  • Demonstrate that a simple model combining the predictions of the two classifiers outperforms each of them taken independently


Research Questions Not Answered by Related Work

  • Can a classifier learn from the imperfect predictions of a weakly supervised classifier? Is its accuracy comparable to the original's? Can a combination of the two classifiers outperform either one?

  • Do the video and text based classifiers capture different semantics?

  • How useful is user provided text metadata? Which source is the most helpful?

  • Can reliable predictions be made from user comments? Can it improve the performance of the classifier?


Methodology

  • Builds on top of the predictions of Video2Text

  • Uses Video2Text:

    • Requires no labeled data other than video metadata

    • Clusters similar videos and generates a text label for each cluster

    • The resulting label set is larger and better suited for categorization of video content on YouTube


Video2Text

  • Starts from a set of weak labels based on the video metadata

  • Creates a vocabulary of concepts (unigrams or bigrams from the video metadata)

  • Every concept is associated with a binary classifier trained from a large set of audio and video signals

  • Positive instances: videos that mention the concept in their metadata

  • Negative instances: videos that do not mention the concept in their metadata
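The weak-labeling split above can be sketched in a few lines. The metadata field names and the simple substring match are illustrative assumptions, not the paper's actual matching rule:

```python
def weak_labels(videos, concept):
    # Split videos into positive and negative instances for one concept,
    # based solely on whether the concept appears in the video's metadata.
    # Field names ("title", "description") are illustrative assumptions.
    positives, negatives = [], []
    for video in videos:
        metadata = (video.get("title", "") + " " + video.get("description", "")).lower()
        if concept in metadata:
            positives.append(video)
        else:
            negatives.append(video)
    return positives, negatives

videos = [
    {"title": "Xbox gameplay", "description": "shooter demo"},
    {"title": "Cooking pasta", "description": "an easy recipe"},
]
positives, negatives = weak_labels(videos, "xbox")
```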


Procedure

  • Binary classifier is trained for every concept in the vocabulary

    • Accuracy is assessed on a portion of a validation dataset

    • Each iteration uses a subset of unseen videos from the validation set

    • The classifier and concept are retained if precision and recall are above a threshold (0.7 in this paper)

  • The remaining classifiers are used to update the feature vectors of all videos

  • Repeated until the vocabulary size doesn’t change much or the maximum number of iterations is reached

  • Finer grained concepts are learned from concepts added in the previous iteration

  • Labels related to news, sports, film, etc. are grouped together, resulting in the final set of 75 two-level categories
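The iteration described above can be summarized as a control-flow sketch. The `train`, `evaluate`, and `update_features` hooks are hypothetical stand-ins for the real audio/video pipeline, and the convergence check is simplified to "vocabulary size stopped changing":

```python
def video2text_iterations(vocabulary, train, evaluate, update_features,
                          threshold=0.7, max_iterations=10):
    # Control-flow sketch of the procedure above: train one binary
    # classifier per concept, keep it only if precision and recall both
    # reach the threshold, update features, and repeat until stable.
    kept = {}
    for _ in range(max_iterations):
        kept = {}
        for concept in vocabulary:
            classifier = train(concept)
            precision, recall = evaluate(classifier, concept)
            if precision >= threshold and recall >= threshold:
                kept[concept] = classifier
        update_features(kept)  # retained classifiers extend every video's features
        if len(kept) == len(vocabulary):  # vocabulary size no longer changes
            break
        vocabulary = list(kept)
    return kept

# Hypothetical hooks for demonstration: accept concepts longer than 3 chars.
kept = video2text_iterations(
    ["xbox", "cooking recipe", "hd"],
    train=lambda concept: "classifier-for-" + concept,
    evaluate=lambda clf, concept: (0.9, 0.9) if len(concept) > 3 else (0.5, 0.5),
    update_features=lambda kept: None,
)
```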


Categorization with Video2Text

  • Use Video2Text to assign two-level categories to videos

  • Total number of binary classifiers (hence labels) limited to 75

  • Output of Video2Text is represented as a list of triples (vi, cj, sij): video, assigned category, and the category's score
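Assuming each triple carries a video id, a category, and a score, reducing the list to one top category per video might look like this (a sketch, not the system's actual API):

```python
def top_category(triples):
    # Reduce (video_id, category, score) triples to the single
    # best-scoring category per video; the real output keeps the
    # full scored list of up to 75 categories.
    best = {}
    for video_id, category, score in triples:
        if video_id not in best or score > best[video_id][1]:
            best[video_id] = (category, score)
    return best

predictions = [("v1", "Gaming", 0.9), ("v1", "Technology", 0.4), ("v2", "Music", 0.7)]
result = top_category(predictions)
```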


Distributed MaxEnt

  • The approach automatically generates training examples for the category classifier

  • Uses the conditional maximum entropy optimization criterion to train the classifiers

  • Results in a conditional probability model over the classes given the YouTube videos
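A minimal illustration of such a conditional model: a softmax over linear feature scores, which is the standard MaxEnt formulation. This toy version is single-machine and its weights are made up, unlike the paper's distributed trainer:

```python
import math

def maxent_probabilities(features, weights):
    # p(class | video) as a softmax over linear scores of the video's
    # features. `weights` maps class -> {feature: weight}; values here
    # are illustrative, not learned.
    scores = {cls: sum(w.get(f, 0.0) for f in features)
              for cls, w in weights.items()}
    normalizer = sum(math.exp(s) for s in scores.values())
    return {cls: math.exp(s) / normalizer for cls, s in scores.items()}

weights = {"Gaming": {"T:xbox": 2.0}, "Music": {"T:song": 2.0}}
probabilities = maxent_probabilities(["T:xbox"], weights)
```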


Data and Models

  • Text models differ regarding the text sources from which the features are extracted: title, description, comments, etc

  • Features used are all token based

  • Infrequent tokens are filtered out to reduce feature space

  • Token frequencies are calculated over 150K videos

  • Every unique token is counted once per video

  • Threshold token frequency of 10 is used

  • Tokens are prefixed with the first letter of the source in which they were found

  • e.g., T:xbox, D:xbox, U:xbox, C:xbox
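The prefixing and frequency filtering above can be sketched as follows. The field names and the restriction to three sources are simplifying assumptions, and the tiny threshold in the demo stands in for the paper's threshold of 10:

```python
from collections import Counter

def prefixed_tokens(video):
    # Prefix every token with the first letter of the field it came from,
    # collecting each unique token once per video. Field names and the
    # source set are illustrative; the paper uses more sources.
    prefixes = {"title": "T", "description": "D", "comments": "C"}
    tokens = set()
    for field, prefix in prefixes.items():
        for token in video.get(field, "").lower().split():
            tokens.add(prefix + ":" + token)
    return tokens

def build_vocabulary(videos, min_frequency=10):
    # Keep only tokens whose once-per-video frequency reaches the threshold.
    frequency = Counter(t for v in videos for t in prefixed_tokens(v))
    return {t for t, n in frequency.items() if n >= min_frequency}

tokens = prefixed_tokens({"title": "xbox review", "comments": "great xbox"})
vocabulary = build_vocabulary(
    [{"title": "xbox"}, {"title": "xbox"}, {"title": "rare"}], min_frequency=2)
```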


Combined Classifier

  • Used to test whether combining the two views (video-based and text-based) is beneficial

  • A simple meta classifier is used, which ranks the video categories based on predictions of the two classifiers

  • Video based predictions are converted to a probability distribution

  • The distribution from the video-based prediction and the one from MaxEnt (the maximum entropy classifier) are multiplied

  • This approach proved to be effective

  • Idea: each classifier has veto power

  • The final prediction for each video is the one with the highest product score
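The multiply-and-rank scheme above can be sketched directly; the category names and probabilities are made up for illustration:

```python
def combine_predictions(video_probs, text_probs):
    # Meta-classifier sketch: multiply the two distributions, so a
    # near-zero probability from either model vetoes a category, then
    # pick the category with the highest product score.
    categories = set(video_probs) | set(text_probs)
    product = {c: video_probs.get(c, 0.0) * text_probs.get(c, 0.0)
               for c in categories}
    return max(product, key=product.get), product

video_probs = {"Gaming": 0.6, "Music": 0.4}
text_probs = {"Gaming": 0.2, "Music": 0.8}
prediction, scores = combine_predictions(video_probs, text_probs)
```

Note how the veto works: a category one model rules out entirely gets a product score of zero no matter how confident the other model is.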


Experiments- Evaluation of Text Models

  • Training dataset contains 100K videos that receive a high-scoring prediction

  • Correct prediction – score of at least 0.85 from Video2Text

  • Text based prediction must be in the set of video-assigned categories

  • Evaluation was done on two sets of videos:

    • Videos with at least one comment

    • Videos with at least 10 comments


Experiments- Evaluation of Text Models Contd…

  • The best model is TDU+YT+C for both sets

  • This model is used for comparison against Video2Text model with human raters

  • This model is also used in the Combination model


Experiments with Human Raters

  • Total of 750 videos are extracted equally from the 15 YouTube categories

  • Human raters rate each (video, category) pair as fully correct (3), partially correct (2), somewhat related (1), or off topic (0)

  • Every pair is rated by 3 human raters

  • The three ratings are summed, normalized (by dividing by 9), and rounded to get the resulting score
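A worked example of the scoring rule above; the rounding step is approximated here by the 0.5 correctness threshold mentioned on the next slide:

```python
def rating_score(ratings):
    # Sum the three per-rater ratings (each 0-3) and normalize by the
    # maximum possible total of 9.
    assert len(ratings) == 3
    return sum(ratings) / 9

def is_correct(ratings, threshold=0.5):
    # A category counts as correct when the normalized score reaches 0.5.
    return rating_score(ratings) >= threshold

score = rating_score([3, 2, 3])  # 8 / 9
```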


Experiments with Human Raters Contd…

  • Score of at least 0.5 – correct category

  • The text-based model performs significantly better than the video model

  • The combination model further improves accuracy

  • Accuracy of all models increases with the number of comments


Conclusion

  • A text-based approach for assigning categories to videos

  • A competitive classifier can be trained on high-scoring predictions made by a weakly supervised classifier (video features)

  • Text and video models provide complementary views on the data

  • Simple combination model outperforms each model on its own

  • Accurate predictions can be made from user comments

  • Reasons for the impact of comments:

    • Substitute for a proper title

    • Disambiguate the category

    • Help correct wrong predictions

  • Future work: Investigate usefulness of user comments for other tasks
