
ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10

ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10. Prof. Petros Maragos, National Technical University of Athens, School of Electrical and Computer Engineering. URL: http://cvsp.cs.ntua.gr/projects/muscle. Researchers: P. Maragos, S. Kollias (Faculty members)


Presentation Transcript


  1. ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10
     Prof. Petros Maragos
     National Technical University of Athens, School of Electrical and Computer Engineering
     URL: http://cvsp.cs.ntua.gr/projects/muscle

  2. ICCS-NTUA: E-team Researchers & Directions
     Researchers:
     • P. Maragos, S. Kollias (Faculty members)
     • G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)
     • G. Stamou, I. Avrithis (Post-Doc)
     (WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition
     • Face Detection, Modeling & Tracking
     • AV Feature Extraction, Fusion, Dynamic Models for AV-ASR
     • AV to Articulatory Speech Inversion
     (WP6) E-team 2: Audio-Visual Understanding
     • Audio-Visual Salient Event Detection
     • Integrated Multimedia Content Analysis
     WP6 E-teams: 8-12-2005

  3. AV-ASR Front-End
     [Block diagram; recoverable block labels listed below]
     • Speech input: M-Array Processing, VAD, Speaker Normalization, MFCC
     • Modulations – Energy: Multiband Filtering, Nonlinear Processing, Demodulation
     • Dynamics – Fractals: Embedding, Geometrical Filtering, Fractal Dimensions
     • Visual: Active Appearance Model, Face Detection/Tracking, Mouth R.O.I. Features
     • Fusion, Feature Transform./Selection, Feature Stream
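The "Modulations – Energy" branch lists nonlinear processing and demodulation without naming an operator. As an illustration only, the sketch below assumes the discrete Teager-Kaiser energy operator, a common nonlinear energy measure for AM-FM signals; the slide does not confirm that this operator is the one used. The names `teager_energy`, `omega`, and `amplitude` are hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Assumed operator for illustration; the slide does not name one.
    The output is two samples shorter than the input (no border values).
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Toy input: a pure cosine with hypothetical amplitude and frequency.
omega = 0.5        # radians per sample (assumed)
amplitude = 2.0    # assumed
signal = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_energy(signal)

# For A*cos(omega*n) the operator returns the constant A^2 * sin^2(omega),
# which is what makes it usable for amplitude/frequency demodulation.
expected = (amplitude * math.sin(omega)) ** 2
```

In a multiband setting, an operator like this would be applied to each band-pass filter output separately; that arrangement is likewise an assumption here.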

  4. Audiovisual ASR: Face Modeling
     • A well-studied problem in Computer Vision: Active Appearance Models, Morphable Models, Active Blobs
     • Both Shape & Appearance can enhance lipreading
     • The shape and appearance of human faces “live” in low-dimensional manifolds
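The low-dimensional manifold claim can be illustrated with the usual linear shape model, in which a shape is a mean shape plus a weighted sum of a few modes of variation. This is a minimal sketch under stated assumptions: the mean shape, the two orthonormal modes, and the function names `project` and `reconstruct` are made up for illustration and are not taken from the slides.

```python
def project(shape, mean_shape, modes):
    """Return the low-dimensional parameters of a shape.

    Assumes the modes are orthonormal, so each parameter is a dot product
    of the centered shape with one mode.
    """
    centered = [s - m for s, m in zip(shape, mean_shape)]
    return [sum(c * b for c, b in zip(centered, mode)) for mode in modes]

def reconstruct(params, mean_shape, modes):
    """Rebuild a shape as mean_shape + sum_i params[i] * modes[i]."""
    out = list(mean_shape)
    for p, mode in zip(params, modes):
        for i, b in enumerate(mode):
            out[i] += p * b
    return out

# Hypothetical toy model: two landmarks stored as (x1, y1, x2, y2).
mean_shape = [0.0, 0.0, 1.0, 0.0]
modes = [
    [0.5, 0.5, 0.5, 0.5],     # mode 1 (unit norm)
    [0.5, -0.5, 0.5, -0.5],   # mode 2 (unit norm, orthogonal to mode 1)
]

observed = [0.5, 1.5, 1.5, 1.5]
params = project(observed, mean_shape, modes)      # 4 numbers -> 2 numbers
rebuilt = reconstruct(params, mean_shape, modes)   # 2 numbers -> 4 numbers
```

Here a four-coordinate shape is summarized by two parameters and recovered from them; in a real face model the modes would be learned from training data, for example with PCA.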

  5. Image Fitting Example
     [Figure: model fitting iterations shown at steps 2, 6, 10, 14 and 18]

  6. Example: Face Interpretation Using AAM
     [Figure panels: original video; shape track superimposed on original video; reconstructed face (this is what the visual-only speech recognizer “sees”!)]
     • Generative models like AAM allow us to evaluate the output of the visual front-end

  7. Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm
     • Generative models ‘compete’ for image observations
     • Segmentation translates into the assignment of image observations to one of K models (image labelling)
     • Segmentation labels are treated as hidden data
     • EM algorithm:
       • E-step: use current parameter estimates to assign micro-segments to objects
       • M-step: use assignment probabilities to derive optimal model parameters
     • Active Appearance Models are used as generative models for the object categories of cars and faces
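The E-step/M-step alternation above can be sketched on a toy problem. This is not the slides' method: the Active Appearance Models are swapped for simple 1-D Gaussians over scalar observations, and all names (`em_fit`, `gaussian_pdf`) and data values are hypothetical. It only shows how K generative models compete for observations through soft assignments.

```python
import math

def gaussian_pdf(x, mean, var):
    """Likelihood of x under a 1-D Gaussian (stand-in for an object model)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_fit(obs, means, variances, priors, n_iter=50):
    """Fit K competing Gaussian models to the observations with EM."""
    means, variances, priors = list(means), list(variances), list(priors)
    K = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: soft-assign each observation to the K models
        # using the current parameter estimates.
        resp = []
        for x in obs:
            w = [priors[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(K)]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: re-estimate each model from its assignment probabilities.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, obs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, obs)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the variance positive
            priors[k] = nk / len(obs)
    return means, variances, priors, resp

# Hypothetical observations: dark "background" and bright "object" samples.
obs = [0.10, 0.20, 0.15, 0.12, 0.90, 0.85, 0.95, 0.80]
means, variances, priors, resp = em_fit(obs, [0.0, 1.0], [0.1, 0.1], [0.5, 0.5])

# Hard labels: pick the model with the largest assignment probability.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

The hidden data here are the per-observation labels; `resp` holds their posterior probabilities, and taking the most probable model per observation turns the soft assignment into a hard labelling.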

  8. Top-Down Segmentation Results
     • Thresholding the E-step output gives a hard figure-ground segmentation
     • No ‘shape-prior’ knowledge is necessary for the segmentation: the generative model contains information about shape variation
     • Combination of bottom-up & top-down detection: at false-alarm locations the object model manages to reconstruct the image appearance only by chance, thereby typically getting a small image support for the object

  9. Spatio-Temporal Visual Attention I: Video Analysis
     • Create video volume
     • Feature extraction from spatiotemporal data
     • Fusion & saliency generation
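The three steps above can be sketched on a tiny synthetic video. The slide does not specify the features or the fusion rule, so everything here is an assumption made for illustration: two features (local 3x3x3 contrast and frame-to-frame change), max-normalization, and fusion by simple averaging. All function names are hypothetical.

```python
def temporal_change(volume):
    """Feature 1 (assumed): absolute intensity change from the previous frame."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(1, T):
        for y in range(H):
            for x in range(W):
                out[t][y][x] = abs(volume[t][y][x] - volume[t - 1][y][x])
    return out

def local_contrast(volume):
    """Feature 2 (assumed): distance from the mean of the 3x3x3 neighbourhood."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                total, count = 0.0, 0
                for tt in range(max(0, t - 1), min(T, t + 2)):
                    for yy in range(max(0, y - 1), min(H, y + 2)):
                        for xx in range(max(0, x - 1), min(W, x + 2)):
                            total += volume[tt][yy][xx]
                            count += 1
                out[t][y][x] = abs(volume[t][y][x] - total / count)
    return out

def normalize(vol):
    """Scale a feature volume so that its largest value is 1."""
    peak = max(v for frame in vol for row in frame for v in row)
    if peak == 0:
        return vol
    return [[[v / peak for v in row] for row in frame] for frame in vol]

def fuse(maps):
    """Fusion (assumed): voxel-wise average of the normalized feature volumes."""
    T, H, W = len(maps[0]), len(maps[0][0]), len(maps[0][0][0])
    return [[[sum(m[t][y][x] for m in maps) / len(maps) for x in range(W)]
             for y in range(H)] for t in range(T)]

# Step 1: create the video volume, indexed [t][y][x]. Three dark 5x5 frames
# with one bright pixel that flashes in the middle frame.
video = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
video[1][2][2] = 1.0

# Steps 2 and 3: extract features, then fuse them into a saliency volume.
features = [normalize(local_contrast(video)), normalize(temporal_change(video))]
saliency = fuse(features)
peak_saliency = max(v for frame in saliency for row in frame for v in row)
```

In this toy case the flashing pixel receives the highest saliency, while a corner voxel far from it receives none.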

  10. Spatio-Temporal Visual Attention II: Classification & Segmentation
      • Use spatiotemporal VA for efficient global classification of videos
        • Claim: features extracted only from low or high saliency regions are more representative of the input video
      • Foreground/Background segmentation
        • Claim: most salient regions are related to foreground areas of the video
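The foreground/background claim suggests a simple use of a saliency map: mark the most salient pixels as foreground. The sketch below assumes a plain threshold, with the map's mean value as the default; the slides do not state how the segmentation is actually computed, and `foreground_mask` and the sample values are hypothetical.

```python
def foreground_mask(saliency_map, threshold=None):
    """Label pixels whose saliency exceeds the threshold as foreground.

    The default threshold (mean saliency of the map) is an assumption
    made for this illustration.
    """
    if threshold is None:
        values = [v for row in saliency_map for v in row]
        threshold = sum(values) / len(values)
    return [[v > threshold for v in row] for row in saliency_map]

# Hypothetical 2x2 saliency map for a single frame.
saliency_map = [
    [0.1, 0.9],
    [0.2, 0.8],
]
mask = foreground_mask(saliency_map)
```

The same mask could also select which regions contribute features to a video classifier, in the spirit of the first claim above.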
