Star Challenge – multimedia search competition 2008

NUS.SIGIR group: Luong Minh Thang & Zhao Jin. WING group meeting, 12 Sep 2008.


Presentation Transcript


  1. Star Challenge – multimedia search competition 2008 • NUS.SIGIR group: Luong Minh Thang & Zhao Jin • WING group meeting – 12 Sep, 2008

  2. Agenda • About StarChallenge • Approaches • Audio system • Video system • Results

  3. Let’s start with a clip on Tai Chi!

  4. The Star Challenge • International Competition organized by Singapore A*STAR • Focus on Multimedia Search by Voice and Video • Prize: • Free Trip to Singapore (blah!) • USD 100,000 (!!!)

  5. The Tasks • Voice Search • AT1: Search by IPA (International Phonetic Alphabet) • AT2: Search by Example • AT3: Search for recurrent voice segments • Video Search • VT1: Search by (single) Query Image • VT2: Search by Video Shot • VT3: Scene/Event Categorization • Note: AT3 and VT3 were replaced by integrated search in the end

  6. Timeline • Mar 31: Registration Deadline • Registered as adMIRer • 5 members from NUS-SIGIR • 56 teams registered in total • June 18: 1st Knockout Round • AT1+AT2 • 8 Teams qualified

  7. Timeline • July 18: 2nd Knockout Round • VT1+VT2 • 7 Teams qualified • September 4: Qualifying Race • All four tasks with Integrated Search • Only 5 Teams would qualify • October 23: Grand Final • On-site evaluation

  8. Audio system – general approach • Use MFCC (mel-frequency cepstral coefficient) features, which represent speech well • Use local alignment to align the two sequences (test audio and query) • Using the spectrogram, cut long audio into small segments for better matching → Short demo
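
A minimal sketch of the local-alignment idea on this slide, assuming a Smith-Waterman-style recurrence over two sequences of feature vectors (for example MFCC frames). The cosine similarity, the `threshold` offset and the `gap` penalty are illustrative assumptions; the slides do not say which scoring scheme the team used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def local_alignment_score(query, segment, threshold=0.8, gap=0.5):
    """Best local alignment score between two sequences of feature vectors.

    threshold and gap are assumed values: frames whose similarity is above
    threshold add to the score, others subtract from it.
    """
    rows, cols = len(query) + 1, len(segment) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            match = cosine(query[i - 1], segment[j - 1]) - threshold
            H[i][j] = max(
                0.0,                      # start a new local alignment
                H[i - 1][j - 1] + match,  # align the two frames
                H[i - 1][j] - gap,        # skip a query frame
                H[i][j - 1] - gap,        # skip a segment frame
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    segment = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    print(local_alignment_score(query, segment))
```

Scoring each short test segment against the query this way and ranking segments by score is one plausible reading of "cut long audio into small segments for better matching".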

  9. Audio system – system overview (flow diagram) • Audio path: Query audio files + Test audio files → Audio feature extractor → Query MFCC vectors + Test MFCC vectors → Query-test similarity matrix → Alignment & matching • Text path: Test audio files → Speech recognizer → Test text → Lucene indexing → Index data; Query text + Index data → Lucene matching • Both paths → Heuristic fusion → Results

  10. Audio system – Handle IPA • IPA query: "i n t r ^ s t r ei t" • Translate to CMU phonemes: IH N T R AH S T R EY T • INTEREST: IH N T R AH S T • RATE: R EY T • Query text: fed directly to the text module, and synthesized to an audio file for the audio module
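
The translation on this slide can be sketched as a symbol lookup followed by a dictionary-based segmentation into words. The tables below are hypothetical and cover only the slide's example; a real system would need a full symbol mapping and a pronunciation dictionary, and the slides do not describe how the team did the segmentation.

```python
# Hypothetical mapping from the slide's ASCII phone symbols to CMU phonemes.
SYMBOL_TO_CMU = {
    "i": "IH", "n": "N", "t": "T", "r": "R", "^": "AH", "s": "S", "ei": "EY",
}

# Toy pronunciation lexicon holding only the two words from the slide.
LEXICON = {
    ("IH", "N", "T", "R", "AH", "S", "T"): "INTEREST",
    ("R", "EY", "T"): "RATE",
}


def to_cmu(query):
    """Translate a space-separated phone string into CMU phonemes."""
    return [SYMBOL_TO_CMU[symbol] for symbol in query.split()]


def segment(phonemes):
    """Split a phoneme sequence into lexicon words; return None if impossible.

    best[i] holds one word sequence that exactly covers phonemes[:i].
    """
    best = [None] * (len(phonemes) + 1)
    best[0] = []
    for i in range(len(phonemes)):
        if best[i] is None:
            continue
        for pron, word in LEXICON.items():
            j = i + len(pron)
            if tuple(phonemes[i:j]) == pron and best[j] is None:
                best[j] = best[i] + [word]
    return best[-1]


if __name__ == "__main__":
    phonemes = to_cmu("i n t r ^ s t r ei t")
    print(" ".join(phonemes))
    print(segment(phonemes))
```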

  11. Audio system – overall performance • No complete statistics yet, but AT2 (query by example) reaches roughly 30-40% MAP and AT1 about 10% • Let’s listen to a few queries …
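
For readers unfamiliar with the metric, here is a small sketch of mean average precision (MAP) in its usual textbook form. The competition's exact evaluation protocol is not given in the slides, so details such as cut-off depth are assumptions.

```python
def average_precision(ranked, relevant):
    """Average of precision values taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked result list, set of relevant items), one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c", "d"], {"a", "c"}),  # relevant items at ranks 1 and 3
        (["x", "y"], {"y"}),                 # relevant item at rank 2
    ]
    print(mean_average_precision(runs))
```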

  12. Video system – VT1 categories • 1. Crowd (>10 people) • 2. Building with sky as backdrop, clearly visible • 3. Mobile devices including handphone/PDA • 4. Flag • 5. Electronic chart, e.g. stock charts, airport departure chart • 6. TV chart overlay, including graphs, text, PowerPoint style • 7. Person using computer, both visible • 8. Track and field, sports • 9. Company trademark, including billboard, logo • 10. Badminton court • 11. Swimming pool, sports • 12. Closeup of hand, e.g. using mouse, writing, etc. • 13. Business meeting (> 2 people), mostly seated, table visible • 14. Natural scene, e.g. mountain, trees, sea, no people • 15. Food on dishes, plates • 16. Face closeup, occupying about 3/4 of screen, frontal or side • 17. Traffic scene, many cars, trucks, road visible • 18. Boat/ship, over sea, lake • 19. PC webpages, screen of PC visible • 20. Airplane

  13. Video system – examples • 16. Face closeup • 9. Company trademark • 2. Building with sky backdrop • 3. Mobile devices

  14. Video system – VT2 categories • 1. People entering/exiting door/car • 2. Talking face with introductory caption • 3. Fingers typing on a keyboard • 4. Inside a moving vehicle, looking outside • 5. Large camera movement, tracking an object, person, car, etc. • 6. Static or minute camera movement, people walking, legs visible • 7. Large camera movement, panning left/right, top/down of a scene • 8. Movie ending credit • 9. Woman monologue • 10. Sports celebratory hug

  15. Video system – general approach (flow diagram) • Test files → Classifiers → Classified category • Classified category + Query category → Category filtering → Filtered test files • Filtered test files + Query file → Matching → Matched test files
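
The filter-then-match flow on this slide can be sketched as follows. Histogram intersection is used as the matching score purely as an assumption, since the slides do not name the matching measure; the item fields (`name`, `category`, `hist`) are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def search(query, test_items, top_k=3):
    """Keep test items in the query's category, then rank them by similarity."""
    candidates = [t for t in test_items if t["category"] == query["category"]]
    ranked = sorted(
        candidates,
        key=lambda t: histogram_intersection(query["hist"], t["hist"]),
        reverse=True,
    )
    return [t["name"] for t in ranked[:top_k]]


if __name__ == "__main__":
    # Made-up data: category 4 is "Flag" in the VT1 list, 7 is "Person using computer".
    query = {"category": 4, "hist": [0.5, 0.5, 0.0]}
    test_items = [
        {"name": "a", "category": 4, "hist": [0.5, 0.4, 0.1]},
        {"name": "b", "category": 4, "hist": [0.1, 0.1, 0.8]},
        {"name": "c", "category": 7, "hist": [0.5, 0.5, 0.0]},
    ]
    print(search(query, test_items))
```

Filtering first shrinks the candidate set, so a misclassified test file is lost before matching; that trade-off is inherent in this design.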

  16. Video system - Training data size Development data statistics • Dev = 10% labelled data, Train = 90% labelled data • Size varies significantly across different categories

  17. Video system – classifier training (flow diagram) • Input: train key frames + categories • Color extractor → color histogram (HSV, RGB) → Color classifier • Edge extractor → edge histogram → Edge classifier • Face detector → number of faces, size, positions → Face classifier • Layout extractor → segmentation info → Layout classifier • All classifiers: multi-class SVM training • Dev key frames → per-category recall for each classifier (color, edge, face, layout) → used as weights

  18. Classifier recall per category • Used as weights when fusing the different classifiers • No error analysis or n-fold testing yet
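
One way to read "recall as weights" is a weighted vote: each classifier votes for its predicted category, and the vote counts as much as that classifier's development-set recall on that category. The sketch below assumes this rule; the slides do not spell out the exact fusion formula, and all numbers are made up.

```python
from collections import defaultdict


def fuse(predictions, recall):
    """Recall-weighted vote over classifier predictions.

    predictions: {classifier name: predicted category}
    recall: {classifier name: {category: recall measured on dev data}}
    Returns the winning category and the per-category scores.
    """
    scores = defaultdict(float)
    for clf, category in predictions.items():
        scores[category] += recall[clf].get(category, 0.0)
    winner = max(scores, key=scores.get)
    return winner, dict(scores)


if __name__ == "__main__":
    # Illustrative values only: category 2 = building, 16 = face closeup.
    predictions = {"color": 2, "edge": 16, "face": 16, "layout": 2}
    recall = {
        "color": {2: 0.6, 16: 0.2},
        "edge": {16: 0.3},
        "face": {16: 0.9},
        "layout": {2: 0.4},
    }
    print(fuse(predictions, recall))
```

With such weights, a face classifier that is reliable on face closeups can outvote color and layout classifiers that are only moderately reliable on buildings.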

  19. Video system – Category filtering & Matching (flow diagram) • Test video → Motion extractor → motion histogram; camera & object motion • Test key frames → Color extractor → color histogram (HSV, RGB) → Color classifier • Test key frames → Edge extractor → edge histogram → Edge classifier • Test key frames → Face detector → number of faces, size, positions → Face classifier • Test key frames → Layout extractor → segmentation info → Layout classifier • Classifier outputs → Classifier merger (weights from dev data) • Heuristic category filtering • Category filtering against the query category → Filtered video, filtered key frames • Query video/frames + filtered video/key frames → Matching → Results

  20. Video system – motion 1 (example frames) • Camera: panning left • Camera: panning up • Object motion: static • Object motion: moving

  21. Video system – motion 2 • Check whether most motion vectors are ≈ 0 → static • Otherwise, filter out all small motion vectors • Categorize the remaining motion vectors into circular (direction) bins → histogram + main motion vector • If the main motion vector dominates → camera motion → panning left, right, up, down • To detect zooming, find a focus block/point • Object motion is derived after removing camera motion
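
The first four steps above can be sketched as below. All thresholds (`min_mag`, `static_ratio`, `dominance`) and the choice of 8 direction bins are assumptions, as is the y-axis-up convention; zoom detection and object-motion extraction are left out. The returned direction is that of the dominant motion vectors, and mapping it to a pan direction depends on the sign convention of the motion estimator.

```python
import math

# Bin centres at 0, 45, 90, ... degrees, assuming the y axis points up.
DIRECTIONS = ["right", "up-right", "up", "up-left",
              "left", "down-left", "down", "down-right"]


def classify_motion(vectors, min_mag=1.0, static_ratio=0.8, dominance=0.6):
    """Return ("static", None), ("camera", direction) or ("mixed", None)."""
    if not vectors:
        return ("static", None)
    mags = [math.hypot(dx, dy) for dx, dy in vectors]
    small = sum(1 for m in mags if m < min_mag)
    if small / len(vectors) >= static_ratio:
        return ("static", None)  # most vectors are close to zero

    bins = len(DIRECTIONS)
    width = 2 * math.pi / bins
    hist = [0] * bins
    for (dx, dy), m in zip(vectors, mags):
        if m < min_mag:
            continue  # drop small vectors before building the histogram
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int((angle + width / 2) // width) % bins] += 1

    main = max(range(bins), key=hist.__getitem__)
    if hist[main] / sum(hist) >= dominance:
        return ("camera", DIRECTIONS[main])  # one direction dominates
    return ("mixed", None)


if __name__ == "__main__":
    print(classify_motion([(0.0, 0.0)] * 10))
    print(classify_motion([(-3.0, 0.1)] * 9 + [(0.0, 2.0)]))
```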

  22. Conclusion • We have built a fully functional system within a short time and in an ad-hoc manner • There is plenty of room for performance improvement and detailed analysis.

  23. Q & A? • Thank you !!!
