MOVIE QUOTES SEARCH ENGINE

Industrial Project – Final Presentation. Students: Meytal Bialik, Zvi Cahana. Technion – Israel Institute of Technology, Computer Science Department. Supervisors: Hayim Makabee, Oren Somekh. MQSE 3. 19.6.12.


  1. MOVIE QUOTES SEARCH ENGINE Industrial Project – Final Presentation Students: Meytal Bialik, Zvi Cahana Technion – Israel Institute of Technology Computer Science Department Supervisors: Hayim Makabee, Oren Somekh MQSE 3 19.6.12

  2. Introduction The Movie Quotes Search Engine project focuses on the creation of a search engine allowing a user to search for terms that appear in the dialogues of a movie. The project consists of two main components: • A web application used as a user interface to the search engine. • A crawling engine used to maintain a searchable index and a content database. • Introduction • Goals • Methodology • System Diagram • Achievements • Testing • Screenshots • Conclusions

  3. Goals • Relevant search results • Modern UI design • Rich search options • Video play option • Browser agnostic website • Large-scale movies database • Incremental, priority-based crawling

  4. Methodology • IMDb & OpenSubtitles.org dump files • SRT subtitle files • OpenSubtitles.org XML-RPC API • SQLite database • Apache Lucene • Java Servlets / JSP • HTML5 / CSS / JavaScript

  5. System Diagram

  6. Achievements • Crawling • Command-line tool • Dump files parsing • OpenSubtitles.org API based • Subtitles downloading & indexing • Cover art downloading • Multithreaded pipelined execution • Priority based • Index recovery
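The "priority based" and incremental crawling listed on this slide can be illustrated with a small priority queue. This is only a sketch in Python, not the project's Java command-line tool: the `CrawlQueue` class, the `crawl_order` function, and the use of a popularity number as the priority are all assumptions made for illustration.

```python
import heapq
import itertools


class CrawlQueue:
    """Max-priority queue of movies awaiting crawling (hypothetical helper)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps insertion order

    def add(self, movie_id, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), movie_id))

    def pop(self):
        _, _, movie_id = heapq.heappop(self._heap)
        return movie_id

    def __len__(self):
        return len(self._heap)


def crawl_order(movies, already_indexed=()):
    """Return movie ids in the order a priority-based crawler would visit them.

    `movies` is an iterable of (movie_id, popularity) pairs; popularity as the
    priority is an assumption. Ids in `already_indexed` are skipped, which is
    one simple way to make repeated runs incremental.
    """
    done = set(already_indexed)
    queue = CrawlQueue()
    for movie_id, popularity in movies:
        if movie_id not in done:
            queue.add(movie_id, popularity)
    order = []
    while len(queue):
        order.append(queue.pop())
    return order
```

A multithreaded pipeline, as the slide describes, would have worker threads pop from such a queue and hand results to downloading and indexing stages; that part is left out here to keep the sketch deterministic.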

  7. Achievements • Storage • SQLite-based database • Movie metadata (popularity, rating, IMDb link...) • Cover art • ~20,000 subtitles downloaded & indexed • Local videos repository

  8. Achievements • Indexing • SRT files parsing & validating • SRT files filtering • Translator comments • Hearing impaired comments • Format tags • Partitioning into overlapping search units • Indexing using Lucene core • Stemming • Stop words removal • Actual indexing of the search units • ~250ms per average SRT file
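The parsing, filtering, and partitioning steps on this slide can be sketched roughly as follows. This is a Python illustration, not the project's Java code: the regular expressions, the credit-line heuristic for translator comments, and the window `size` and `overlap` defaults are all assumptions, and the Lucene stemming and stop-word steps are not reproduced.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")
TAG_RE = re.compile(r"</?[^>]+>|\{[^}]*\}")        # format tags such as <i>..</i> or {\an8}
HEARING_RE = re.compile(r"\[[^\]]*\]|\([^)]*\)")    # hearing-impaired notes: [door slams], (sighs)
CREDIT_RE = re.compile(r"subtitles? by|translated by|synced by", re.IGNORECASE)  # assumed heuristic


def to_seconds(stamp):
    """Convert 'HH:MM:SS,mmm' to seconds, or return None if malformed."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        return None
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Return a list of (start, end, cleaned_text) for each valid, non-empty cue."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed cues instead of rejecting the whole file
        start_raw, end_raw = lines[1].split("-->", 1)
        start, end = to_seconds(start_raw), to_seconds(end_raw)
        if start is None or end is None or end < start:
            continue
        body = " ".join(lines[2:])
        if CREDIT_RE.search(body):
            continue  # drop translator credits
        body = HEARING_RE.sub(" ", TAG_RE.sub(" ", body))
        body = " ".join(body.split())  # collapse whitespace left by the removals
        if body:
            cues.append((start, end, body))
    return cues


def search_units(cues, size=4, overlap=2):
    """Group cues into windows of `size` cues, each sharing `overlap` cues with the next."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than size")
    if not cues:
        return []
    step = size - overlap
    units = []
    for i in range(0, max(len(cues) - overlap, 1), step):
        window = cues[i:i + size]
        units.append((window[0][0], window[-1][1], " ".join(c[2] for c in window)))
    return units
```

Overlapping windows mean a quote that straddles two subtitle cues still falls entirely inside at least one search unit, at the cost of indexing some text more than once.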

  9. Achievements • Searching • Searching using Lucene core • Query parsing • Search operators support • Stemming • Stop words removal • Relevant buckets retrieval & ranking • Aggregating buckets to movies • Merging of overlapping buckets • Highlighting search words using Lucene core • Buckets trimming to most relevant text • Configurable weighted movie ranking • Lucene rank • Popularity • Rating • Year
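Two of the steps on this slide, merging overlapping buckets and the configurable weighted movie ranking, can be sketched as below. This is a Python illustration under stated assumptions rather than the project's implementation: the tuple layout of a bucket, the weight values in `DEFAULT_WEIGHTS`, and the requirement that every feature is already normalised to [0, 1] are all hypothetical.

```python
def merge_buckets(buckets):
    """Merge hits whose time ranges overlap, keeping the best score of each group.

    Each bucket is a (start_seconds, end_seconds, score) tuple (assumed layout).
    """
    merged = []
    for start, end, score in sorted(buckets):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, prev_score = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((start, end, score))
    return merged


# Hypothetical weights; the slide only says the ranking is configurable.
DEFAULT_WEIGHTS = {"lucene": 0.6, "popularity": 0.2, "rating": 0.1, "year": 0.1}


def movie_score(features, weights=DEFAULT_WEIGHTS):
    """Weighted sum of features, each assumed to be normalised to [0, 1]."""
    return sum(weights[name] * features[name] for name in weights)


def rank_movies(movies, weights=DEFAULT_WEIGHTS):
    """Return movie titles ordered from highest to lowest weighted score."""
    return sorted(movies, key=lambda title: movie_score(movies[title], weights), reverse=True)
```

Changing the weights changes the order: a configuration that favours popularity can push a well-known movie above one with a better text match, which is the kind of trade-off a configurable ranking exposes.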

  10. Achievements • Web Application • JSP/HTML5/CSS/JavaScript based • Full support for IE9 • Modern UI design • Search results snippets • Multiple hits per movie • Paging • Video play option • Per result snippet • Relevant scene • Captions

  11. Testing A testing platform enables comparing the “quality” of search results across different system configurations. • In each test, the search engine is queried with a famous quote • A test passes if the relevant movie is found in the top-K results
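The pass criterion described on this slide can be written down in a few lines. The sketch below is in Python and assumes a `search` callable that returns a ranked list of movie titles; `fake_search`, `CANNED_RESULTS`, and `CASES` are made-up stand-ins so the example runs without a real index, not data from the project.

```python
def passes(search, quote, expected_movie, k):
    """A test passes if the expected movie is among the top-k results."""
    return expected_movie in search(quote)[:k]


def pass_rate(search, cases, k):
    """Fraction of (quote, expected_movie) cases that pass for a given k."""
    if not cases:
        return 0.0
    hits = sum(passes(search, quote, movie, k) for quote, movie in cases)
    return hits / len(cases)


# Stand-in search engine with invented results, for illustration only.
CANNED_RESULTS = {
    "quote one": ["Movie A", "Movie B"],
    "quote two": ["Movie C", "Movie D"],
}


def fake_search(quote):
    return CANNED_RESULTS.get(quote.lower(), [])


CASES = [("Quote one", "Movie A"), ("Quote two", "Movie D")]
```

With these stand-ins, `pass_rate(fake_search, CASES, k=1)` counts only the first case as a pass, while `k=2` counts both. Running the same cases against differently configured engines gives one number per configuration to compare.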

  12. Testing We tested the system with a set of ~100 famous movie quotes. With a biased system configuration and K=9, we achieved a ~90% pass rate.

  13. Screenshots

  14. Screenshots

  15. Conclusions • Lucene is a powerful search platform • Optimal search results are difficult to define • Subtitle files from public sources should be further validated • HTML5 video support is still limited & browser dependent • Source control systems make life easier