
Discovering Recurrent Events in Multi-channel Data Streams using Unsupervised Methods


Presentation Transcript


  1. Discovering Recurrent Events in Multi-channel Data Streams using Unsupervised Methods. Naphade, Li & Huang, NGDM 02.

  2. Organization
  • Mining in Multimodal Data Streams
  • Detecting Structure/Recurring Events
  • Ergodic + Non-ergodic HMMs
  • Experiments with Different Domains
  • Concluding Remarks

  3. Multimedia Semantics
  • The Semantics of Content
    • Objects, Sites and Events of Interest in the Video (ICIP 02)
  • The Semantics of Context
  • The Semantics of Structure/Recurrence
    • Scenes
    • Context Changes
    • Recurring Temporal Patterns
    • Structural Syntax

  4. State of the Art
  • Content Analysis:
    • Image/Video Classification: Naphade (UIUC), Vailaya (Michigan State), Iyengar & Vasconcelos (MIT), Smith (IBM)
    • Semantic Audiovisual Analysis: Naphade (UIUC), Chang (Columbia)
  • Learning and Multimedia:
    • Statistical Media Learning: Naphade (UIUC), Forsyth (Berkeley), Fisher & Jebara (MIT), V. Iyengar (IBM)
    • Learning in Image Retrieval: Chang et al. (UCSB), Zhang et al. (Microsoft Research), Naphade et al. (UIUC), Viola et al. (MIT, MERL)
    • Linking Clusters in Media Features: Barnard & Forsyth (Berkeley), Slaney (IBM)
  • Vision and Speech:
    • Computer Vision in Media Analysis: Bolle (IBM), Malik (Berkeley)
    • Auditory Scene Analysis & Discriminant ASR Models: Ellis (MIT), Nadas et al. (IBM), Gopalkrishnan et al. (IBM), Woodland et al. (Cambridge), Naphade et al. (UIUC), Wang et al. (NYU), Kuo et al. (USC)

  5. Media Learning: A Perspective
  [Diagram: techniques arranged along a supervision vs. semantics axis, from less supervised methods (Query by Example, Relevance Feedback, Unsupervised Segmentation) toward heavily supervised ones (Boosting, SVM, NN, GMM, HMM-based classification, Multijects, Multinet, Supervised Segmentation, ASR, CASA), with the future of multimodal mining in between.]
  • More Supervision → More Semantics
  • Semi-Autonomous Learning: clever techniques for supervision that reduce the amount of user input

  6. Extracting Semantics: What Options?
  [Diagram: a Signals → Features → Semantics pipeline, with approaches laid out from Manual (past: most accurate, most time consuming, expensive, static) through Semi-automatic (today) to Fully Automated (goal: autonomous, user friendly, adaptive, challenging).]
  • Future: for this to be possible and useful we need Autonomous Learning.
  • Challenge: in this realm, use "intelligence" and "learning" to move from left to right without compromising on performance.

  7. Challenges of Multimedia Learning
  • Challenging problems not easily addressed by traditional approaches.

  8. Media Learning: Proposed Architecture
  [Block diagram with components: Multimedia Repository, Segmentation, Audio Features, Visual Features, Audio Models, Speech Models, Visual Models, Annotation, learning models (Active Learning, Active Sample Selection, Multiple Instance Learning, SVM, GMM, HMM), Granularity/Resolution, feature fusion, Graphical Models for Decision Fusion, Knowledge Repository, Retrieval/Summarization, Feedback, and Discovering Structures and Recurring Patterns.]

  9. Detecting the Semantics of Structure
  • Examples
    • News: the Anchor Person
    • Sports, e.g. Baseball: Homerun, Pitch, Strike-Out
    • Talk shows: Monologue, Laughter, Applause, Music
    • Movies, e.g. Action Movies: Explosions, Gunshots
  • Challenges
    • Mapping features to semantics
    • Evaluating a finite set of predefined hypotheses
    • Granularity: structure exists at different granularities
    • Multimodal fusion

  10. Related Literature
  • Early use of HMMs for capturing stationarity and transitions, and their application to clustering: A. B. Poritz; Levenson et al.
  • Scene Segmentation (using HMMs): Wolf; Ferman & Tekalp; Kender & Yeo; Liu, Huang & Wang; Sundaram & Chang; Divakaran & Chang.
  • Multimodal scene similarity: Nakamura & Kanade; Nam, Cetin & Tewfik; Naphade, Wang & Huang; Srinivasan, Ponceleon; Amir & Petkovic; Adams et al.

  11–16. Ergodic HMMs
  [Animation over six slides: a fully connected (ergodic) HMM with states 1, 2 and 3, and a possible state sequence being built up one state at a time, e.g. 1, 1, 2, 3, 1.]
  • Poritz showed how an ergodic model can capture repetitive patterns in speech signals through unsupervised clustering.
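To make the ergodic case concrete, here is a minimal sketch (not the authors' code) of unsupervised clustering with a fully connected Gaussian HMM, assuming the hmmlearn package; the feature matrix X is only a placeholder for the per-frame audiovisual features described later in the deck.

```python
# A minimal sketch of unsupervised clustering with an ergodic Gaussian HMM,
# assuming hmmlearn. X stands in for the per-frame audiovisual feature stream.
import numpy as np
from hmmlearn import hmm

X = np.random.randn(1000, 32)   # placeholder features: 1000 frames x 32 dims

# Ergodic: every state may transition to every other state (hmmlearn's default).
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
model.fit(X)                    # unsupervised EM (Baum-Welch)

states = model.predict(X)       # most likely state per frame (Viterbi decoding)
print(states[:10])
```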

  17–20. Non-Ergodic HMMs
  [Animation over four slides: the same three states, now with a restricted (left-to-right) topology; a possible state sequence is built up, e.g. 1, 1, 2, 3.]
  • Transitions from any state to any other state are not permitted, unlike the ergodic case.
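For the non-ergodic case, a left-to-right topology can be imposed by fixing the zeros of the transition matrix before training; the sketch below again assumes hmmlearn, and the probabilities are illustrative rather than the paper's settings.

```python
# A minimal sketch of a non-ergodic, left-to-right topology: each state may
# only persist or move to the next state.
import numpy as np
from hmmlearn import hmm

n_states = 3
left_right = np.zeros((n_states, n_states))
for i in range(n_states):
    left_right[i, i] = 0.5                           # stay in the current state
    left_right[i, min(i + 1, n_states - 1)] += 0.5   # or advance (last state absorbs)

# init_params="mc": initialize only means/covariances, keep our topology.
model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50, init_params="mc")
model.transmat_ = left_right
model.startprob_ = np.array([1.0, 0.0, 0.0])         # always start in the first state
# model.fit(X) would then re-estimate parameters; zero transitions remain zero under EM.
```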

  21–22. Capturing Short-Term Stationarity and Long-Term Structure
  [Diagram: D parallel branches, each a chain over states 1, 2, 3.]
  • Each branch: non-ergodic
  • All branches embedded in a hierarchical ergodic structure
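One way to read this slide, sketched below under assumed (not stated) parameters, is as a single composite transition matrix: D left-to-right branches, with ergodic jumps allowed only back to the entry state of any branch. Branch count, branch length and probabilities are illustrative.

```python
# A sketch of a hierarchical ergodic structure over non-ergodic branches,
# realized as one stochastic transition matrix. All numbers are illustrative.
import numpy as np

def build_transmat(n_branches=4, states_per_branch=3, p_stay=0.6, p_forward=0.3):
    n = n_branches * states_per_branch
    A = np.zeros((n, n))
    for b in range(n_branches):
        off = b * states_per_branch
        for s in range(states_per_branch):
            i = off + s
            A[i, i] = p_stay                          # short-term stationarity
            if s < states_per_branch - 1:
                A[i, i + 1] = p_forward               # move along the branch
                leak = 1.0 - p_stay - p_forward       # small chance of leaving early
            else:
                leak = 1.0 - p_stay                   # branch exit
            for b2 in range(n_branches):              # ergodic jump between branches
                A[i, b2 * states_per_branch] += leak / n_branches
    return A

A = build_transmat()
assert np.allclose(A.sum(axis=1), 1.0)                # rows are valid distributions
```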

  23. Experimental Setup
  • Domains
    • Action videos (20 clips from "Specialist")
    • Late-night shows (20 min of Dave Letterman)
  • Features
    • Visual (30 frames/sec): Color (HSV histogram), Structure (edge direction histogram)
    • Audio (30 audio frames/sec, to sync with video): 32 Mel Frequency Cepstral Coefficients (10 ms overlap)
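A rough sketch of the listed features, assuming OpenCV and librosa rather than the authors' original tools; bin counts, thresholds, sample rate and file paths are placeholders.

```python
# Per-frame features named on the slide: HSV color histogram, edge direction
# histogram, and MFCCs extracted at roughly the video frame rate.
import cv2
import numpy as np
import librosa

def hsv_histogram(frame_bgr, bins=(8, 8, 8)):
    """Color feature: normalized HSV histogram of one video frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def edge_direction_histogram(frame_bgr, bins=8, mag_thresh=50.0):
    """Structure feature: histogram of edge directions over strong edges."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    hist, _ = np.histogram(ang[mag > mag_thresh], bins=bins, range=(0, 360))
    return hist / max(hist.sum(), 1)

def mfcc_features(wav_path, sr=16000, n_mfcc=32, frames_per_sec=30):
    """Audio feature: MFCC frames at roughly the video frame rate."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                hop_length=sr // frames_per_sec).T
```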

  24. Results: Recurring Patterns in Video
  • Movie: Specialist
  • Discovered recurring pattern: Explosion

  25. Results: Recurring Patterns in Video
  • Late-night show with Dave Letterman
  • Discovered patterns: Applause, Laughter, Speech, Music

  26. Observations
  • Completely UNSUPERVISED.
  • In the case of recurring temporal event patterns, the scheme is capable of discovering them if a sufficient number of these patterns occur in the set.
  • In the case of repetitive anchoring events, such as applause in comedy shows, the scheme is capable of discovering these events.
  • Segmentation and pattern discovery are very helpful in annotation; e.g. to manually annotate Dave Letterman's jokes, just look before the applause.
  • Anyone who has done manual audio annotation knows how useful it is to get the right segment boundaries, especially at the micro and macro levels.

  27. Summary
  • Problem: automatic discovery of recurring temporal patterns without supervision.
  • Approach:
    • Clustering: unsupervised temporal clustering using a hierarchical ergodic model with non-ergodic temporal pattern models.
    • Interaction: the user then needs to analyze only the extracted recurring set to quickly propagate annotation.
  • Results:
    • Automatic extraction of recurring patterns (laughter, explosion, monologue, etc.) and regular structure.
    • Near-complete elimination of manual annotation: annotating clusters takes orders of magnitude less effort than annotating the content itself.
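As a toy illustration of the interaction step, once the user names each discovered cluster, the label can be propagated to every frame or segment the unsupervised model assigned to it; the cluster ids and labels below are hypothetical.

```python
# A toy sketch of annotation propagation: one user-supplied label per
# discovered cluster, applied to every frame in that cluster.
cluster_labels = {0: "applause", 1: "laughter", 2: "speech"}

def propagate(state_sequence, cluster_labels):
    """Map a per-frame cluster/state sequence to per-frame semantic labels."""
    return [cluster_labels.get(s, "unlabeled") for s in state_sequence]

print(propagate([0, 0, 1, 2, 2, 0], cluster_labels))
# ['applause', 'applause', 'laughter', 'speech', 'speech', 'applause']
```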

  28. Future Directions
  • Experiment with different non-ergodic branches as well as across-branch transitions.
  • Use this to bootstrap training of semantic events that can be detected using HMMs/DBNs (ICIP 98, NIPS 2000).
  • Explore visual features extracted regionally to model a richer class of recurring patterns.
  • Experiment with the Sports domain (possible interaction with Prof. Chang and his group).
