Realistic Speech Animation on Synthetic Faces

Barış Uz, Uğur Güdükbay, Bülent Özgüç

Bilkent University

Dept. of Computer Eng. and Information Science

Bilkent 06533 Ankara Turkey


Outline

  • Previous Work

    • Facial Animation

    • Speech Animation

  • Face Model

  • Facial Muscles

    • Linear Muscles

    • Orbicularis Oris

  • Tongue Model

  • Speech Animation

  • Synchronizing Speech with Expressions

  • Future Work


Facial Animation

Keyframing [Parke’72]: every keyframe must be specified completely, which is tedious for 3D facial animation.

Parametric [Parke’82]: the face is controlled through a set of parameters.

  • Expression parameters: apply to individual parts of the face; jaw rotation angle, width of the mouth, eyelid opening, eyebrow position and shape, etc.

  • Conformation parameters: apply globally to the whole face; aspect ratio of the face, skin color, etc.

Since each parameter affects a disjoint set of vertices, facial expressions cannot easily be blended.


Facial Animation (cont.)

Structure-based [Platt’85]: the face is divided into regions according to its anatomy.

Physically-based [Terzopoulos and Waters’90]: the face is modeled in a layered fashion; an anatomically-based muscle model is incorporated with a physically based layered tissue model.


Speech Animation

Generating speaking face models: model mouth and lip postures and interpolate between them.

  • Parametric approach [Parke’75]

  • Image-based approach [Watson’96]: a morphing algorithm interpolates between phoneme images

  • Coordinated 2D muscle model [Waters and Frisbie’95]: models muscle interactions

  • 3D lip model [Basu’97]: a 3D model of human lips and a framework for training it from real data; intended mainly for reconstructing lip shapes from real data, but also usable for lip shape synthesis


Synchronizing Speech with Animation

  • Non-automated techniques: any change to the audio requires the whole synchronization process to be repeated.

    • [Parke’75]

    • [Pearce et al.’86]

  • Automatic techniques: an audio server is queried for each phoneme so that the mouth shape is computed in sync with the audio.

    • DECface [Waters and Levergood’93]


Generation of Facial Expressions

  • Layered abstractions [Kalra et al.’91]: higher layers allow abstract manipulations; speech is synchronized with eye motion and emotions through a synchronization mechanism provided by a high-level language.

  • Animated conversation [Cassell et al.’94]: animated conversation between multiple human-like agents with synchronized speech, intonation, facial expressions, and hand gestures.


Face Model

  • The face model consists of 888 triangles (1700 including the eyes, teeth, and tongue)

  • The face is divided into three regions

    • Upper (610 triangles), lower (240 triangles), and intermediate

  • Changes to the original model

    • Duplicated the mouth vertices so that the mouth can open and close

    • Added some polygons to close the nose

    • Added eyes and teeth to the model

    • Added a simple tongue


Regions of Face

[Figure: the face model with its upper, intermediate, and lower regions labeled]

Motion (Muscle) and Vertex Relationships


Facial Muscle Vectors


Major Facial Muscles

  • Orbicularis oris: a sphincter muscle; plays the most significant role in shaping the mouth.

  • Mentalis, buccinator, depressor anguli oris major, depressor labii inferioris: control the lower lip and lower face.

  • Zygomatic minor, levator labii superioris alaeque nasi: upper face muscles; rarely used for speech but important for expressions.

  • Zygomatic major, risorius: located around the cheeks; important for expressions.


Modeling of Muscles

Types of muscles

  • Linear: e.g., zygomatic major (smiling)

  • Sphincter: e.g., orbicularis oris (mouth opener)

  • Sheet: e.g., orbicularis oculi major (eyelid opener)

Muscle parameters

  • Influence zone: between 35 and 65 degrees

  • Influence start: the muscle’s influence starts at this tension

  • Influence end: beyond this tension, the skin resists deformation

  • Contraction value: muscle tension


Modeling of Muscles (cont.)

  • P: original position

  • P′: new position

  • Rs and Rf: influence start and finish radii

  • θ: maximum zone of influence

  • D: distance of P from the muscle head

  • α: angular displacement from the muscle vector


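Muscle deformation [Waters’87]

A vertex P that lies in the sector V1 P3 P4 or in the region P1 P2 P3 P4 is displaced to a new position P′ by an amount that depends on:

  • k, a muscle spring constant;

  • a = cos(α), where α is the angular displacement of P from the muscle vector;

  • a radial term that takes one form if P is in (V1 P3 P4) and another if P is in (P1 P2 P3 P4).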
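The slide's displacement equations are not reproduced in this transcript, so the following is only a minimal sketch of a Waters-style linear muscle, assuming the commonly quoted form P′ = P + cos(α)·k·r·(unit vector from P toward the muscle head V1). The cosine falloff used for the radial term r, the treatment of the influence zone as a half-angle, and all function and parameter names are assumptions made for illustration, not the authors' implementation.

```python
import math

def displace(p, v1, v2, r_s, r_f, zone_deg, k):
    """Displace vertex p under one linear muscle running from head v1 to tail v2.

    Assumed form: P' = P + cos(alpha) * k * r * unit(V1 - P).
    r_s / r_f are the influence start / finish radii; zone_deg is treated
    as the half-angle of the influence zone (an assumption).
    """
    v1p = [pc - hc for pc, hc in zip(p, v1)]      # vector V1 -> P
    d = math.hypot(*v1p)                          # D: distance of P from the head
    if d == 0.0 or d > r_f:
        return tuple(p)                           # at the head or out of range
    axis = [tc - hc for tc, hc in zip(v2, v1)]    # muscle vector V1 -> V2
    cos_alpha = sum(a * b for a, b in zip(axis, v1p)) / (math.hypot(*axis) * d)
    cos_alpha = max(-1.0, min(1.0, cos_alpha))
    if math.degrees(math.acos(cos_alpha)) > zone_deg:
        return tuple(p)                           # outside the angular zone
    if d <= r_s:                                  # sector next to the head
        r = math.cos((1.0 - d / r_s) * math.pi / 2.0)
    else:                                         # band between r_s and r_f
        r = math.cos((d - r_s) / (r_f - r_s) * math.pi / 2.0)
    scale = cos_alpha * k * r
    # Move P toward V1: subtract the scaled unit vector V1 -> P.
    return tuple(pc - scale * c / d for pc, c in zip(p, v1p))
```

With this falloff the displacement is zero at the muscle head, largest at the start radius, and fades back to zero at the finish radius, so the deformation stays continuous across the two regions.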


Orbicularis Oris

  • Modeled as 4 linear muscles

  • Horizontal ones have 40 degrees of influence; vertical ones have 140 degrees of influence

  • Very practical to implement

  • A pseudo-muscle is added to simulate protrusion and pursing of the lower lip (f-tuck), which is needed to pronounce the letters “f” and “v”


The Tongue Model

  • The tongue is composed of 4 sections of 20 polygons each, plus 12 polygons to close the tip

  • The tongue is reconstructed whenever a parameter changes

  • Each section has the following parameters

    • height: height of the section floor above the tongue base

    • width: total width of the section

    • length: length of the section

    • thickness: thickness of the tongue

    • midline: height of the middle line of the tongue


The Tongue Model (cont.)

[Figure: top view and frontal view of the tongue model]


The Tongue Model (cont.)

  • For example, to say the letter “l”:

    • Section 1 (the farthest section) does not change

    • In section 2, the width is reduced to 1/2 of the relaxed width

    • In section 3, the width is reduced to 1/3 of the relaxed width and the height is increased accordingly

    • In section 4, the width is reduced to 1/4 of the relaxed width and the tongue is raised to the bottom of the upper teeth; the midline is set equal to the thickness of the tongue in section 4
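The “l” posture above can be sketched as a transformation of four per-section parameter records. This is an illustrative sketch only: the `TongueSection` class, the `posture_for_l` function, and the `upper_teeth_bottom` argument are hypothetical names, section 4 is assumed to be the tip, and the height increase for section 3 is left out because the slide does not say by how much it grows.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TongueSection:
    height: float     # height of the section floor above the tongue base
    width: float      # total width of the section
    length: float     # length of the section
    thickness: float  # thickness of the tongue
    midline: float    # height of the middle line of the tongue

def posture_for_l(relaxed, upper_teeth_bottom):
    """Return the four sections posed for "l", given the relaxed sections 1..4."""
    if len(relaxed) != 4:
        raise ValueError("expected exactly four tongue sections")
    s1, s2, s3, s4 = relaxed
    return [
        s1,                                # section 1: unchanged
        replace(s2, width=s2.width / 2),   # section 2: half the relaxed width
        replace(s3, width=s3.width / 3),   # section 3: a third (height change omitted)
        replace(s4,                        # section 4: a quarter, raised to the teeth
                width=s4.width / 4,
                height=upper_teeth_bottom,
                midline=s4.thickness),
    ]
```

Because the tongue is rebuilt whenever a parameter changes, returning a fresh list of immutable records keeps the relaxed posture available for the next letter.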


Speech Animation

  • Keyframing is based on the muscle parameters around the mouth and on jaw rotation.

  • Each keyframe is a mouth shape determined by the current expression setting and the current letter.

  • Cosine interpolation is used to generate the in-betweens.

  • Each entry in the mouth shape database contains

    • the letter: the key field

    • the muscle contraction values: determine which muscles are active while pronouncing this letter

    • the jaw rotation angle: necessary for some letters
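A minimal sketch of cosine interpolation between two keyframes follows. The keyframes here are plain dictionaries of named values; the entry names and numbers in the usage note are made up for illustration and are not taken from the authors' database.

```python
import math

def cosine_interpolate(a, b, t):
    """Blend from a (t = 0) to b (t = 1) with an ease-in/ease-out cosine weight."""
    w = (1.0 - math.cos(math.pi * t)) / 2.0
    return a + (b - a) * w

def inbetweens(key_a, key_b, steps):
    """Return `steps` frames strictly between two keyframes with the same keys."""
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        frames.append({name: cosine_interpolate(key_a[name], key_b[name], t)
                       for name in key_a})
    return frames
```

For example, `inbetweens({"jaw_angle": 0.0}, {"jaw_angle": 10.0}, 3)` yields three frames whose jaw angle starts slowly, passes through about 5.0 at the midpoint, and slows again near the target, which avoids the abrupt velocity changes of linear interpolation at each keyframe.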


Speech Animation System


Synchronizing Speech with Expressions

Guessing from text

  • Punctuation marks

  • Keywords

  • Meanings of words

  • Ambiguous: the same word, punctuation mark, or keyword can have different meanings

By inserting tags into text

  • Tags are inserted into the text to specify expressions and their degrees explicitly.

  • \b{expression level}: starts an expression of degree level. If the expression is already set, its degree is increased.

  • \e{expression level}: ends the expression or decreases its degree by level. If level is -1, the expression is removed from the face.
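The tag semantics can be sketched as a small state update over a dictionary of expression levels. This is a sketch under assumptions: the exact tag grammar (an expression name, a space, then an integer level), additive increases for a repeated \b tag, and dropping an expression once its level falls to zero or below are all guesses at details the slide leaves open. `apply_tags` is a hypothetical name.

```python
import re

# Matches \b{name level} and \e{name level}; this grammar is an assumption.
TAG = re.compile(r"\\([be])\{\s*([A-Za-z_]+)\s+(-?\d+)\s*\}")

def apply_tags(text):
    """Return (text without tags, expression levels after the last tag)."""
    levels = {}
    for kind, name, level in TAG.findall(text):
        level = int(level)
        if kind == "b":
            # Start the expression, or raise its degree if already set.
            levels[name] = levels.get(name, 0) + level
        elif level == -1:
            levels.pop(name, None)          # remove the expression entirely
        else:
            remaining = levels.get(name, 0) - level
            if remaining > 0:
                levels[name] = remaining    # decrease the degree
            else:
                levels.pop(name, None)      # assumed: nothing left, so drop it
    return TAG.sub("", text), levels
```

Explicit tags sidestep the ambiguity of guessing expressions from punctuation or keywords, at the cost of requiring the text to be annotated by hand.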


Speech Animation Algorithm

While not all of the text is processed
  1. Read a character
  2. If a tag is beginning /* "\" is read */
    2.1 Read tag /* name and degree of expression */
  3. If degree is -1
    3.1 Remove expression from the face
  else
    3.2 Set face according to expression with specified degree
  4. If a valid character /* a letter or a punctuation mark */
    4.1 If this is the first character to say
      4.1.1 Set face using current expression and letter settings
      4.1.2 Display face
    else
    4.2 for each in-between
      4.2.1 Calculate vertex coords using cosine interpolation
      4.2.2 Display face
    4.3 Store vertex coords for future reference
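The control flow of the algorithm can be sketched in Python with the rendering replaced by a list of recorded frames. This is a hedged illustration, not the authors' code: `animate` and `PUNCTUATION` are hypothetical names, a frame is reduced to a `(from_char, to_char, t)` tuple instead of interpolated vertex coordinates, the last in-between is assumed to reach the new letter (t = 1.0), and step 3 is followed literally, so a tag simply sets the expression to its degree or removes it when the degree is -1.

```python
PUNCTUATION = ".,;:!?"

def animate(text, n_inbetweens=2):
    """Walk the tagged text; return (displayed frames, final expressions)."""
    frames = []
    expressions = {}
    previous = None                      # last character that was "said"
    i = 0
    while i < len(text):
        ch = text[i]                     # step 1: read a character
        i += 1
        if ch == "\\":                   # step 2: a tag is beginning
            close = text.index("}", i)
            # Skip the tag letter and "{", then read name and degree.
            name, degree = text[i + 2:close].split()
            if int(degree) == -1:        # step 3: remove or set the expression
                expressions.pop(name, None)
            else:
                expressions[name] = int(degree)
            i = close + 1
            continue
        if ch.isalpha() or ch in PUNCTUATION:     # step 4: a valid character
            if previous is None:                  # 4.1: first character to say
                frames.append((ch, ch, 1.0))
            else:                                 # 4.2: generate the in-betweens
                for step in range(1, n_inbetweens + 1):
                    frames.append((previous, ch, step / n_inbetweens))
            previous = ch                         # 4.3: remember for the next letter
    return frames, expressions
```

In a full system each recorded tuple would instead trigger a cosine-interpolated face update and a redraw; keeping the loop free of rendering makes the sequencing easy to check on its own.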


Future Work

  • Better mouth postures

  • Implementation of coarticulation

  • Synchronization of synthetic speech with facial animation (the Turkish speech synthesizer is syllable-based, so a database of mouth postures for 2000 Turkish syllables should be built and the syllables grouped by similar mouth posture)

