
Classifying Movie Scripts by Genre

Alex Blackstock, Matt Spitz. 6/9/08. Overview. Motivation: classifying movie scripts may identify box-office flops and successes before they're even produced. Data: freely available movie scripts (DailyScripts.com, etc.).


Presentation Transcript


  1. Classifying Movie Scripts by Genre
     - Alex Blackstock, Matt Spitz
     - 6/9/08

  2. Overview
     - Motivation: classifying movie scripts may identify box-office flops and successes before they're even produced!
     - Data
       - freely available movie scripts (DailyScripts.com, etc.)
       - IMDB genres (several labels per movie)
     - Tools
       - Lucene
       - MEMM from PA3
       - jBNC (naïve Bayes classifier)
       - Stanford Named Entity Recognizer
       - Stanford Part-Of-Speech Tagger

  3. Processing Scripts

  4. Features
     - Non-NLP
       - dialogue shape
       - character information
     - NLP
       - POS ratios
       - Named Entity appearances
     - Character-Based NLP
       - analyze individual characters
       - exclamations
       - main vs. secondary

  5. Evaluation Metrics
     - Example output: Blade II
       - gold labels: Action, Thriller, Horror
       - guessed labels: Action, Adventure, Horror, Thriller, ...
     - F1 Score
       - computed per genre
       - weighted average over all genres
       - # of guesses allowed = # of gold labels
     - Partial Credit Score
       - allows for some error
       - # of guesses allowed = # of gold labels * 1.5
       - guesses beyond the # of gold labels are penalized, but still earn points

  6. Conclusions
     - Success!
       - best feature set: basic NLP & POS tagging
       - PC Score: 0.601
       - F1 Score: 0.551
     - Classifier comparison (jBNC)
     - N-way classification problem
       - 22 genres
       - average of 3.02 genres/datum
     - Dataset Issues
       - consistency
       - diversity
       - size
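
The "POS ratios" feature on the Features slide is not defined in the transcript. One plausible reading is the fraction of a script's tokens that carry each part-of-speech tag. The sketch below assumes that reading and takes already-tagged tokens as input; the function name `pos_ratios` and the hand-tagged sample line are hypothetical, and the authors' actual pipeline (which used the Stanford Part-Of-Speech Tagger) may have computed the feature differently.

```python
from collections import Counter


def pos_ratios(tagged_tokens):
    """Return the fraction of tokens carrying each POS tag.

    tagged_tokens: iterable of (token, tag) pairs, e.g. tagger output.
    Assumption: a "POS ratio" is tag count divided by total token count.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical, hand-tagged line of dialogue (Penn Treebank style tags).
tagged = [
    ("Run", "VB"), ("!", "."),
    ("The", "DT"), ("vampires", "NNS"), ("are", "VBP"),
    ("coming", "VBG"), ("!", "."),
]
ratios = pos_ratios(tagged)
```

Each ratio can then be used as one real-valued feature per script; because the ratios sum to 1, they are comparable across scripts of different lengths.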

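The two metrics on the Evaluation Metrics slide can be sketched as follows, using the Blade II example from that slide. Several details are assumptions, since the transcript does not give them: the per-genre F1 scores are averaged with weights equal to each genre's gold-label count, the allowance of 1.5 times the gold-label count is rounded down, and a correct guess beyond the gold-label count earns half credit (the slide only says such guesses are penalized but still earn points). The function names are hypothetical.

```python
from collections import Counter


def weighted_f1(examples):
    """examples: list of (ranked_guesses, gold_set) pairs.

    Each datum is allowed as many guesses as it has gold labels.
    Returns (per-genre F1 dict, support-weighted average F1).
    Assumption: weights are the number of gold occurrences per genre.
    """
    tp, fp, fn, support = Counter(), Counter(), Counter(), Counter()
    for ranked, gold in examples:
        guesses = set(ranked[:len(gold)])
        for g in guesses & gold:
            tp[g] += 1
        for g in guesses - gold:
            fp[g] += 1
        for g in gold - guesses:
            fn[g] += 1
        for g in gold:
            support[g] += 1
    f1 = {}
    for g in support:
        denom = 2 * tp[g] + fp[g] + fn[g]
        f1[g] = 2 * tp[g] / denom if denom else 0.0
    total = sum(support.values())
    avg = sum(f1[g] * support[g] for g in support) / total if total else 0.0
    return f1, avg


def partial_credit(ranked, gold, extra_credit=0.5):
    """Hypothetical partial-credit score for one datum.

    Allows int(1.5 * len(gold)) guesses. A correct guess among the first
    len(gold) positions earns 1.0; a correct guess in the extra positions
    earns extra_credit (assumed value). Normalized by len(gold).
    """
    if not gold:
        return 0.0
    allowed = int(len(gold) * 1.5)
    score = 0.0
    for position, genre in enumerate(ranked[:allowed]):
        if genre in gold:
            score += 1.0 if position < len(gold) else extra_credit
    return score / len(gold)


# Blade II example from the slide (only the four guesses shown there).
gold = {"Action", "Thriller", "Horror"}
ranked = ["Action", "Adventure", "Horror", "Thriller"]

f1_by_genre, f1_avg = weighted_f1([(ranked, gold)])
pc = partial_credit(ranked, gold)
```

Under these assumptions, the F1 metric sees only the first three guesses (so Thriller counts as a miss), while the partial-credit metric sees four guesses and awards reduced credit for Thriller in the fourth position.
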