
Visible Speech Synthesis



Presentation Transcript


  1. Visible Speech Synthesis • Value of Talking Heads • Enhance Intelligibility • Enhance Realism and Naturalness • Convey Paralanguage and Emotion • State of the Art • Issues • Needs

  2. Types of Synthesis • Physically Based Synthesis • Terminal Analog Synthesis

  3. Types of Synthesis • Physically Based Synthesis • Articulatory Speech Synthesis • Muscle-Based Animation

  4. Physically Based Synthesis • Articulatory Models of Speech Synthesis • Auditory Speech is Goal • Eventual Payoff for Animation • Muscle-Based Simulation • Computationally Intensive • Hard Linking EMG Measures to Synthesis

  5. Physically Based Synthesis • Not Ready for Prime (Real) Time

  6. Types of Synthesis • Terminal Analog Synthesis • Control Wire-Frame Model • Concatenate Stored Images

  7. Terminal Analog Synthesis • Parameter-Based Synthesis (Wire Frame) • Parke; Pearce et al. • Cohen & Massaro • Le Goff & Benoit • Beskow

  8. PSLTalk (Baldi) • Real Time on SGI & PC Platforms • Extension of Parke’s Wireframe Model • Has Tongue • Evolving Hard Palate and 3D Teeth • Many Control Parameters • Rotation, Translation, Interpolation • Phoneme Synthesis • Mechanism for Coarticulation

  9. Control Parameters • Rotation of points • movement around axis, e.g., jaw rotation • Translation • movement of points, e.g., raise upper lip • Interpolation • Between two different subsections of wireframes--e.g., smile • Scaling • constant multiplier
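
As a rough illustration of the four control-parameter types on this slide, the Python sketch below applies each one to a list of 3D wireframe vertices. The function names, the use of the x-axis for jaw rotation, and all coordinates are assumptions made for the example; none of them are taken from Baldi or from Parke's model.

```python
import math

def rotate_about_x(points, angle_deg, pivot=(0.0, 0.0, 0.0)):
    """Rotate points about an axis parallel to x through `pivot` (e.g., jaw rotation)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rotated = []
    for x, y, z in points:
        dy, dz = y - pivot[1], z - pivot[2]
        rotated.append((x,
                        pivot[1] + dy * cos_a - dz * sin_a,
                        pivot[2] + dy * sin_a + dz * cos_a))
    return rotated

def translate(points, offset):
    """Move every point by the same offset (e.g., raise the upper lip)."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in points]

def interpolate(shape_a, shape_b, t):
    """Blend two versions of the same wireframe subsection; t=0 gives a, t=1 gives b."""
    return [tuple(a + t * (b - a) for a, b in zip(pa, pb))
            for pa, pb in zip(shape_a, shape_b)]

def scale(points, factor, center=(0.0, 0.0, 0.0)):
    """Multiply each point's distance from `center` by a constant factor."""
    return [tuple(c + factor * (p - c) for p, c in zip(pt, center))
            for pt in points]

# Hypothetical usage: blend a neutral mouth corner halfway toward a smile pose.
neutral = [(1.0, 0.0, 0.0)]
smile = [(1.2, 0.2, 0.0)]
half_smile = interpolate(neutral, smile, 0.5)
```

Each function returns a new list and leaves its input untouched, so several parameters can be composed in sequence on the same base wireframe.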

  10. Terminal Analog Synthesis • Concatenative Synthesis (Image)

  11. Image Concatenation • Video Rewrite • Bregler, Covell, & Slaney • MikeTalk--MIT Optical Flow • Ezzat & Poggio • Triphone HMM synthesis • Brooke and Scott

  12. Coarticulation • Parameter Synthesis • Dominance Function • Other Techniques? • Image Synthesis • Segment Size • Intermediate Frames
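
The slide names the dominance function without defining it. The sketch below assumes one commonly described form: each segment has a target value for a control parameter and a negative-exponential dominance that falls off with distance from the segment's center, and the parameter value at any time is the dominance-weighted average of the targets. The exact functional form, the parameter names (`alpha`, `theta`, `c`), and all numbers are assumptions for illustration.

```python
import math

def dominance(t, center, alpha=1.0, theta=0.05, c=1.0):
    """Assumed dominance: alpha * exp(-theta * |t - center| ** c)."""
    return alpha * math.exp(-theta * abs(t - center) ** c)

def blended_value(t, segments):
    """Dominance-weighted average of segment targets at time t.

    Each segment is a dict with 'target' and 'center' keys and optional
    'alpha', 'theta', 'c' keys (all hypothetical names).
    """
    weights = [
        dominance(t, s["center"], s.get("alpha", 1.0),
                  s.get("theta", 0.05), s.get("c", 1.0))
        for s in segments
    ]
    total = sum(weights)
    return sum(w * s["target"] for w, s in zip(weights, segments)) / total

# Hypothetical lip-rounding targets for two adjacent segments (times in ms).
segments = [
    {"target": 0.0, "center": 0.0},    # unrounded segment
    {"target": 1.0, "center": 100.0},  # rounded segment
]
midpoint = blended_value(50.0, segments)
```

Because the weights overlap in time, the rounded segment already pulls the parameter toward its target before its own center is reached, which is the anticipatory behavior coarticulation models aim for.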

  13. Paralinguistic (Language-Related) Synthesis • Segmental • Suprasegmental

  14. Paralinguistic Synthesis (Segmental) • Nonspeech Segments • 18 Segments in Worldbet and OGIbet • Breath Noise, Cough, Clear Throat, Laugh, Lip Smack, Sneeze, Tongue Click, Burp, Sniff, Squeak/Voice Crack, and Sigh

  15. Paralinguistic Synthesis (Suprasegmental) • Head Movements (Referential) • Eye Movements • Eye Blinks • Eyebrow Raising with F0 • Eye Widening • Squinting

  16. Synthesis of Emotion • Voice is Informative • Face is More Critical • Basic Universal Emotions • Happiness, Anger, Surprise, Fear, Disgust, and Sadness

  17. Emotion Synthesis Issues • Control Parameters • Polygon Resolution • Interaction with Speech Parameters

  18. Text to Speech/Animation (TtSA) • Output of Text Translation • Representation of Text

  19. Requirements for Output of Text Translation • Phonemes, duration, onset, offset • Stress • Provide Complete Sentence Transcription before Auditory Synthesis Begins
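
One possible way to represent the output listed on this slide is a record per phoneme carrying onset, offset, and stress, plus a check that the whole sentence is transcribed before synthesis starts. The class name, field names, millisecond units, and phoneme labels below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhonemeEvent:
    phoneme: str        # phoneme label (label set is an assumption)
    onset_ms: float     # start time within the sentence
    offset_ms: float    # end time within the sentence
    stress: int = 0     # 0 = unstressed, 1 = stressed (assumed coding)

    @property
    def duration_ms(self):
        return self.offset_ms - self.onset_ms

def is_complete_transcription(events):
    """True if the list is non-empty, every phoneme has positive duration,
    and no phoneme starts before the previous one ends."""
    if not events:
        return False
    prev_offset = events[0].onset_ms
    for event in events:
        if event.duration_ms <= 0 or event.onset_ms < prev_offset:
            return False
        prev_offset = event.offset_ms
    return True

# Hypothetical transcription with made-up timings.
sentence = [
    PhonemeEvent("h", 0.0, 60.0),
    PhonemeEvent("E", 60.0, 180.0, stress=1),
    PhonemeEvent("l", 180.0, 240.0),
]
ready = is_complete_transcription(sentence)
```

Having the full list available up front lets an animation component look ahead at upcoming segments, which coarticulation handling needs.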

  20. Alignment of Auditory and Visible Speech • Issues? • Perception of Asynchrony and Integration • Empirical Results • Theoretical Description • Auditory Phoneme vs. Visual Phoneme • Articulatory Synthesis

  21. Representation of Paralinguistic and Emotion Information in Text • Embedded Text-Markup • SABLE--Starting Time, Intensity, and Dynamics of Emotions

  22. I was <EMO SAD=".8"> sad, but <EMO HAP="0.9" SAD="0.1"> now I’m happy again </EMO>.
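
To show how embedded markup like the example on this slide could drive emotion parameters, the sketch below splits a marked-up sentence into text spans, each paired with the emotion intensities in force. It assumes that an opening `EMO` tag sets the state until the next tag and that `</EMO>` resets to neutral; the slide does not specify these semantics, and the function name is hypothetical. The attribute pattern accepts both straight and typographic quotes.

```python
import re

TAG = re.compile(r'<(/?)EMO([^>]*)>')
ATTR = re.compile(r'(\w+)\s*=\s*["“”]([^"“”]*)["“”]')

def parse_emotion_markup(text):
    """Return a list of (span_text, emotions) pairs.

    Assumed semantics: <EMO ...> replaces the current emotion state,
    </EMO> clears it, and untagged text is neutral (empty dict).
    """
    spans = []
    state = {}
    pos = 0
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            spans.append((chunk, dict(state)))
        if match.group(1):  # closing tag
            state = {}
        else:               # opening tag with attributes
            state = {name: float(value)
                     for name, value in ATTR.findall(match.group(2))}
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        spans.append((tail, dict(state)))
    return spans

example = ('I was <EMO SAD=".8"> sad, but '
           '<EMO HAP="0.9" SAD="0.1"> now I\'m happy again </EMO>.')
spans = parse_emotion_markup(example)
```

The output pairs each stretch of text with a dictionary such as `{"HAP": 0.9, "SAD": 0.1}`, which could then be mapped onto facial control parameters.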

  23. Analysis of Visible Speech • Marked Skin Surfaces • Photogrammetric Measurement • Optotrak System • Unmarked Skin Surfaces • 3D Laser Scans of Static Poses

  24. Analysis of Internal Structures of Visible Speech • Ultrasound • X-Ray Micro-Beam • Magnetic Resonance Imaging (MRI) • Cineradiography • Electropalatography (EPG)

  25. Data Bases for Training • Model After Auditory Data Bases • Syllables, Words, Sentences • Bimodal Recording For Alignment • Bimodal TIMIT?

  26. Multiple Facial Structures • General Control Parameters • Specifying Local Control • Gradient of Movement • Simplify and/or Modify Polygon Structure

  27. Texture Mapping • Maps 2D Surface onto Wireframe • Multiple 2D Surfaces • 3D Cyberware Scan

  28. Assessment of Quality • Intelligibility • Visible Speech • Syllables, Words, Sentences • Confusion Matrices (Viseme Structure) • Combine with Bimodal Speech • Attention, Memory and Robustness • Realism
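
A confusion matrix for speechreading can be tallied from (presented, responded) pairs, and scoring the same responses at the viseme level shows how many errors stay within a visually similar class. The sketch below is a minimal version of that idea; the syllables, the viseme grouping, and the function names are invented for illustration and are not data from the presentation.

```python
from collections import Counter

def confusion_matrix(trials):
    """Count how often each (presented, responded) pair occurred."""
    return Counter(trials)

def proportion_correct(matrix):
    """Share of trials where the response matched the stimulus exactly."""
    total = sum(matrix.values())
    correct = sum(n for (stim, resp), n in matrix.items() if stim == resp)
    return correct / total if total else 0.0

def viseme_proportion_correct(matrix, viseme_of):
    """Share of trials where stimulus and response fall in the same viseme class."""
    total = sum(matrix.values())
    correct = sum(n for (stim, resp), n in matrix.items()
                  if viseme_of[stim] == viseme_of[resp])
    return correct / total if total else 0.0

# Made-up trials and a made-up viseme grouping.
trials = [("ba", "ba"), ("ba", "pa"), ("da", "da"), ("pa", "ba")]
viseme_of = {"ba": "bilabial", "pa": "bilabial", "da": "alveolar"}

matrix = confusion_matrix(trials)
phoneme_score = proportion_correct(matrix)
viseme_score = viseme_proportion_correct(matrix, viseme_of)
```

Running the same scoring on responses to a natural talker and to the synthetic face gives two matrices that can be compared cell by cell.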

  29. Evaluating Speech Synthesis • Speechreading Syllables • Compare to Natural Speech

  30. Speechreading Syllables

  31. Evaluating Speech Synthesis • Word Recognition • Speech Reading • Confusion Matrices • Compare to Natural Speech

  32. Evaluating Speech Synthesis • Sentence Processing • Auditory Alone in Noise • Bimodal • Look at Performance Gain
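
The slide does not say how performance gain is computed. Two simple candidates are the raw difference between bimodal and auditory-alone accuracy, and that difference normalized by the room left for improvement. The sketch below shows both; the function names and the scores are assumptions, not results from the presentation.

```python
def absolute_gain(auditory_alone, bimodal):
    """Raw improvement in proportion correct when the face is added."""
    return bimodal - auditory_alone

def relative_gain(auditory_alone, bimodal):
    """Improvement as a share of the possible improvement above auditory alone."""
    if auditory_alone >= 1.0:
        return 0.0  # no headroom left, so no gain can be measured
    return (bimodal - auditory_alone) / (1.0 - auditory_alone)

# Made-up proportions correct for sentences presented in noise.
auditory_score = 0.4
bimodal_score = 0.7

raw = absolute_gain(auditory_score, bimodal_score)
normalized = relative_gain(auditory_score, bimodal_score)
```

The normalized form makes gains comparable across noise levels, since a listener already near ceiling with audio alone has little room to benefit from the face.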

  33. Evaluating Speech Synthesis • Natural Auditory Speech in Noise • Bimodal with Natural Face • versus • Synthetic Auditory Speech in Noise • Bimodal with Synthetic Face

  34. Evaluating Speech Synthesis • Always Natural Auditory Speech in Noise • Bimodal with Natural Face • versus • Bimodal with Synthetic Face
