
Multimedia search engine

Michal Krsek, UISK Charles University at Prague & CESNET; Ivan Doležal, CESNET; Michal Illich, Jyxo


Presentation Transcript


  1. Multimedia search engine Michal Krsek, UISK Charles University at Prague & CESNET Ivan Doležal, CESNET Michal Illich, Jyxo

  2. Electronic Media • TV & radio • Organized in channels • Zero democracy in programming (decided by channel management) • Centralized production (big players' business)

  3. Internet • Not only the web (audio/video and others) • Remember archie.sura.net? • IPTV / Live / Video on demand • Navigation only via the web => hard to find a specific A/V program

  4. Search options I • Voice recognition • Language identification • Accents • Video recognition • Text interpretation (bush vs. Bush) • Low video quality

  5. Search options II • Indexing of web pages • Yahoo! does this (Google bomb target) • Metadata • “Out-of-band” metadata (as in the librarian world) • Metadata in files (added during editing or encoding)

  6. Project description • Started in 2003 (oh yes, one year before Truveo) • “Google for audio and video on Internet” • No support from content owners • Modular concept • Start with .cz Internet

  7. Technical description I • Crawler • Crawls the web and collects addresses (URLs) • Exports URLs of multimedia files • Software written by Jyxo (Linux console app)
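The crawler's export step amounts to filtering crawled URLs down to those that point at audio/video files. A minimal sketch, assuming the decision is made from the file extension alone; the extension list and the names `is_multimedia_url` and `export_multimedia_urls` are hypothetical, not taken from the Jyxo crawler:

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list: the slides do not say which formats were recognised.
MULTIMEDIA_EXTENSIONS = {
    ".asf", ".avi", ".mov", ".mp3", ".mp4", ".mpeg", ".mpg",
    ".ra", ".ram", ".rm", ".wma", ".wmv",
}

def is_multimedia_url(url: str) -> bool:
    """True if the URL path (query string ignored) ends in a known A/V extension."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower() in MULTIMEDIA_EXTENSIONS

def export_multimedia_urls(urls):
    """Keep only multimedia URLs, dropping duplicates and preserving crawl order."""
    seen = set()
    exported = []
    for url in urls:
        if is_multimedia_url(url) and url not in seen:
            seen.add(url)
            exported.append(url)
    return exported

crawled = [
    "http://www.example.cz/index.html",
    "http://www.example.cz/talk.WMV",
    "http://www.example.cz/song.mp3?session=1",
    "http://www.example.cz/talk.WMV",
]
print(export_multimedia_urls(crawled))
# ['http://www.example.cz/talk.WMV', 'http://www.example.cz/song.mp3?session=1']
```

Extension matching misses streams served without a recognisable file name; checking the HTTP `Content-Type` header is one possible complement.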

  8. Technical description II • Distiller • Imports addresses of multimedia files • Distills metadata (and writes XML files) • Makes screenshots (if the file contains video) • C# software and MPlayer (Windows apps) • Runs in a distributed environment
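The distiller's output is one XML file per multimedia URL. The slides do not give the schema, and the real distiller was C# software, so the Python sketch below only illustrates the idea: it invents a simple layout (`media`, `field`, `screenshots` and `shot` are hypothetical element names) and uses the standard `xml.etree.ElementTree` module to serialise player-reported metadata plus the screenshot offsets:

```python
import xml.etree.ElementTree as ET

def metadata_to_xml(url: str, metadata: dict, screenshot_offsets: list) -> str:
    """Serialise one file's metadata and screenshot offsets (seconds) to an XML string.
    Element and attribute names are hypothetical; the real schema is not documented."""
    root = ET.Element("media", {"url": url})
    for name, value in sorted(metadata.items()):
        field = ET.SubElement(root, "field", {"name": name})
        field.text = str(value)
    shots = ET.SubElement(root, "screenshots")
    for offset in screenshot_offsets:
        ET.SubElement(shots, "shot", {"second": str(offset)})
    # ElementTree escapes characters such as '&' in the URL automatically.
    return ET.tostring(root, encoding="unicode")

print(metadata_to_xml("http://www.example.cz/a.wmv", {"title": "Lecture"}, [1, 30, 50]))
```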

  9. Technical description III • Database • Imports XML metadata files into a full-text DB • Answers back-end queries for web searches • And other full-text tasks (e.g. language)
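The database stage is a full-text index over the imported metadata. As a stand-in for the real two-server database (whose software the slides do not name), here is a toy in-memory inverted index; the class and method names are hypothetical. Note that lower-casing every token, as done here, discards exactly the distinction behind the "bush vs. Bush" example from the search-options slide:

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list:
    """Lower-cased word tokens; \\w is Unicode-aware, so Czech diacritics survive."""
    return re.findall(r"\w+", text.lower())

class FullTextIndex:
    """Toy inverted index mapping each term to the set of URLs whose metadata contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query: str) -> set:
        """Return URLs whose text contains every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = FullTextIndex()
index.add("http://www.example.cz/a.wmv", "Ivan Doležal: GRID lecture")
index.add("http://www.example.cz/b.rm", "Grid computing report")
print(index.search("GRID report"))  # {'http://www.example.cz/b.rm'}
```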

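  10. Architecture (diagram): crawling -> distillation -> indexing -> search • Crawling: crawls web pages, gets addresses, filters A/V addresses • Distillation: gets metadata from multimedia files • Indexing: holds full-text database • Search: provides back end for queries • (diagram label: www.yournamehere.edu)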

  11. Distillation • Process description • Get URL from DB • Get metadata from the file available at the URL • Get screenshots at 1, 30 and 50 s • Save metadata & screenshots
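The four distillation steps can be sketched as one function. This assumes MPlayer is called once per screenshot offset with its standard `-nosound`, `-ss` (seek), `-frames` and `-vo jpeg` options; the slides do not show the actual command line, and `distill_next`, `screenshot_command` and the injected callables are hypothetical names. Passing the steps in as callables keeps the sketch testable without a database or network:

```python
SCREENSHOT_OFFSETS = (1, 30, 50)  # seconds into the file, as on the slide

def screenshot_command(url: str, offset: int) -> list:
    """Assumed MPlayer argument list: seek to `offset` seconds and write one frame as JPEG."""
    return ["mplayer", "-nosound", "-ss", str(offset), "-frames", "1", "-vo", "jpeg", url]

def distill_next(next_url, get_metadata, grab_frame, save):
    """Run the distillation steps for one URL; return None when no work is left.

    next_url()               -> str or None  (hypothetical: fetch a URL from the DB)
    get_metadata(url)        -> dict         (hypothetical: ask a native player)
    grab_frame(cmd)          -> bool         (hypothetical: run MPlayer, report success)
    save(url, meta, offsets)                 (hypothetical: store the results)
    """
    url = next_url()                               # 1. get URL from DB
    if url is None:
        return None
    metadata = get_metadata(url)                   # 2. get metadata from the file
    shots = [t for t in SCREENSHOT_OFFSETS         # 3. screenshots at 1, 30, 50 s
             if grab_frame(screenshot_command(url, t))]
    save(url, metadata, shots)                     # 4. save metadata & screenshots
    return url, metadata, shots

# Usage with stand-in callables: a 40-second clip has no frame at 50 s.
saved = []
result = distill_next(
    next_url=lambda: "http://www.example.cz/clip.avi",
    get_metadata=lambda url: {"title": "Clip", "duration": 40},
    grab_frame=lambda cmd: int(cmd[cmd.index("-ss") + 1]) < 40,
    save=lambda *record: saved.append(record),
)
print(result)  # ('http://www.example.cz/clip.avi', {'title': 'Clip', 'duration': 40}, [1, 30])
```

In practice `grab_frame` might wrap `subprocess.run(cmd, timeout=...)` and return whether MPlayer exited successfully; a timeout matters given the slow servers and streams without fast-forward listed on the next slide.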

  12. Distillation • Use of Win32 applications • Native players (Windows Media Player, RealPlayer, QuickTime) for metadata • MPlayer for screenshots • Takes one minute on average • Slow servers/bandwidth • Streaming without fast-forward

  13. DistillerGRID • Motivation: one machine would need ~16 years to distill 8,500,000 URLs (at ~1 min per URL) • Ideal application for GRID computing • No need for real-time response • Huge amount of computing time needed • Two ways to create a GRID • Build a dedicated system • Use existing capacity
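The 16-year figure follows from the one-minute average on the previous slide: 8,500,000 URLs × 1 min = 8,500,000 minutes, and a year has 525,600 minutes. A quick check (the helper name `distillation_minutes` is illustrative):

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

def distillation_minutes(urls: int, urls_per_minute: float) -> float:
    """Wall-clock minutes to distill `urls` at a steady throughput."""
    return urls / urls_per_minute

serial = distillation_minutes(8_500_000, 1.0)  # one machine, ~1 min per URL
print(f"{serial / MINUTES_PER_YEAR:.1f} years")  # 16.2 years
```

At the peak rate of about 5,000 URLs per minute quoted for the lab configuration, `distillation_minutes(8_500_000, 5000)` gives 1,700 minutes, roughly 28 hours; that is an idealised lower bound, since it assumes the peak rate holds for the whole run.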

  14. Computing machines • PC/Windows based • HW independent • Secure environment • Security of hosting system • Security of distillation process • Well connected • Need not run 24x7 • Easy to manage

  15. Configuration • ~100 PCs in student labs • Running on demand during weekends • Virtual machines (Microsoft Virtual PC 2004) on the host system (Windows XP) • Three different HW configurations • Peak rate about 5,000 URLs per minute • SQL database as back end -> pull-based work distribution
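"SQL as background -> pull distribution of work" means the lab PCs ask a shared database for work instead of having jobs pushed to them. A minimal sketch of that pull pattern, using Python's built-in `sqlite3` as a stand-in for whatever SQL server was actually used; the `work` table, its columns and the function names are hypothetical. With a shared database file, each worker would open its own connection, and `BEGIN IMMEDIATE` is what stops two of them from claiming the same row:

```python
import sqlite3

def make_queue(urls):
    """Create a work table with one 'pending' row per URL (in-memory stand-in for the shared DB)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions managed explicitly
    conn.execute(
        "CREATE TABLE work (id INTEGER PRIMARY KEY, url TEXT NOT NULL, "
        "state TEXT NOT NULL DEFAULT 'pending', worker TEXT)"
    )
    conn.executemany("INSERT INTO work (url) VALUES (?)", [(u,) for u in urls])
    return conn

def claim_next(conn, worker):
    """Pull one pending job and mark it taken; return (id, url), or None if the queue is empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the next job
    try:
        rows = conn.execute(
            "SELECT id, url FROM work WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchall()
        job = rows[0] if rows else None
        if job is not None:
            conn.execute("UPDATE work SET state = 'taken', worker = ? WHERE id = ?",
                         (worker, job[0]))
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    return job

def mark_done(conn, job_id):
    """Record that a claimed job has been distilled."""
    conn.execute("UPDATE work SET state = 'done' WHERE id = ?", (job_id,))

conn = make_queue(["http://www.example.cz/a.wmv", "http://www.example.cz/b.rm"])
job = claim_next(conn, "lab-pc-01")
mark_done(conn, job[0])
print(claim_next(conn, "lab-pc-02"))  # (2, 'http://www.example.cz/b.rm')
```

One plausible advantage of pulling, given that the PCs run only on demand at weekends: a machine that comes up simply starts claiming rows, and one that disappears leaves its rows in the `taken` state, where a timeout could return them to `pending` (not implemented here).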

  16. Actual status I • HW • 20 crawlers • 2 servers for full-text DB (< 1,400 USD) • Distillation stations (X office PCs) • Connected at 1 Gb/s to CESNET2 -> GEANT2

  17. Actual status II • Database • EU + .com, .edu • > 13,000,000 URLs • > 8,000,000 valid • > 2,800,000 with screenshots

  18. Live show?

  19. Want to test? • URLs • http://multimedia.jyxo.cz • http://videoserver.cesnet.cz/videoarchiv_en.php • For the XML interface, send me an e-mail

  20. Questions? Comments? • Michal Krsek, Michal.Krsek@cesnet.cz (academic service, cooperation) • Michal Illich, michal@illich.cz (business service)
