Exemplar-based Voice Quality Analysis and Control

Hideki Kawahara, Wakayama University, ATR-HIS





Presentation Transcript


Exemplar-based Voice Quality Analysis and Control using a High Quality Auditory Morphing Procedure based on STRAIGHT

Hideki Kawahara

Wakayama University

ATR-HIS



Why high quality?

  • Humans are very good at using voice quality in communicating non-linguistic and para-linguistic information.

    • -> We can discriminate voice quality very well.

    • -> But… only around natural speech sounds

    • -> Highly nonlinear systems need to be tested around their normal operating range.

    • -> Voice quality has to be tested using real voice.

    • -> It is crucial to provide means to control physical parameters of “real” voice in a well defined manner.

    • -> We need a very high quality analysis, modification and synthesis system.


Why exemplar-based?

  • Rule-based (rule-first) approach

    • For example: how should formant frequencies be modified when F0 is modified, so that the modified speech sounds natural?

    • Desirable but virtually impossible

    • “Curse of dimensionality”

  • Exemplar-based (example-first) approach

    • Find permissible trajectories in a parametric space that span real voice examples.

    • A rule is represented as an approximating function that can generate permissible trajectories.





Rule-first approach: example

original




How to improve the rule?

  • Need to test perceptual effects for all combinations of ΔF1, ΔF2, ΔF3, ΔF4, …: with N levels for each of P parameters, that is N^P conditions.

  • Need to check spectral tilt, harmonic-to-noise ratio, …

    • -> Combinatorial explosion: the curse of dimensionality
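
The combinatorial growth above can be made concrete with a small sketch. It assumes a full factorial listening test with N levels for each of P parameters; the function name and the example numbers are illustrative and do not come from the slides.

```python
def count_conditions(levels_per_parameter: int, num_parameters: int) -> int:
    """Return the number of stimuli in a full factorial design (N ** P)."""
    if levels_per_parameter < 1 or num_parameters < 0:
        raise ValueError("need at least one level and a non-negative parameter count")
    return levels_per_parameter ** num_parameters


# Illustrative only: 5 levels per parameter, growing parameter count.
for num_parameters in (1, 2, 4, 8):
    print(num_parameters, count_conditions(5, num_parameters))
```

Even a modest number of levels becomes impractical to test perceptually once spectral tilt, harmonic-to-noise ratio, and further parameters are added to the formant shifts.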


Example-first approach

[Figure: examples of /koNnitiwa/ (hello) labeled Surprise, Happiness, Neutral, Fear, Anger, and Sadness.]


How does morphing look and sound?

/hai/ (yes)


Permissible trajectory

[Figure: Neutral-Anger morphing. Perceived naturalness (0 to 5) plotted against morphing rate (-0.25 to 1.25) for morphed speech and real speech.]


Interpolating morphing provides a permissible trajectory under the current implementation.
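
As a rough illustration of what a morphing rate means, the sketch below blends one parameter (F0) between two examples. Blending on a log-frequency scale is an assumption made here for illustration, not something the slides state, and the function names are hypothetical. Rates from 0 to 1 interpolate between the examples; rates such as -0.25 or 1.25 extrapolate beyond them.

```python
import math


def morph_log_f0(f0_a_hz: float, f0_b_hz: float, rate: float) -> float:
    """Blend two F0 values on a log-frequency scale (assumed, not from the slides).

    rate = 0 returns example A, rate = 1 returns example B;
    values outside [0, 1] extrapolate past the examples.
    """
    if f0_a_hz <= 0 or f0_b_hz <= 0:
        raise ValueError("F0 values must be positive")
    log_f0 = (1.0 - rate) * math.log(f0_a_hz) + rate * math.log(f0_b_hz)
    return math.exp(log_f0)


def is_interpolation(rate: float) -> bool:
    """True when the morphing rate stays between the two examples."""
    return 0.0 <= rate <= 1.0


# Hypothetical F0 values for a neutral and an angry example.
for rate in (-0.25, 0.0, 0.5, 1.0, 1.25):
    print(rate, is_interpolation(rate), round(morph_log_f0(100.0, 200.0, rate), 1))
```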


Parameters that were morphed

  • F0

    • Instantaneous-frequency-based method

  • Energy distribution on a time-frequency coordinate

    • Extended pitch-synchronous analysis

  • Periodicity index on a time-frequency coordinate

    • Harmonic-to-noise ratio in each ERB band

  • Time-frequency coordinate

  • (Fine temporal structure)

[Figure: visualization]
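
To give a sense of the per-band periodicity measure listed above, the sketch below computes an ERB bandwidth and a harmonic-to-noise ratio in dB. The ERB formula is the widely used Glasberg and Moore approximation, chosen here as an assumption; the slides do not say which ERB definition or which energy estimates STRAIGHT uses, and the function names are hypothetical.

```python
import math


def erb_bandwidth_hz(center_hz: float) -> float:
    """Equivalent rectangular bandwidth at a center frequency.

    Uses the Glasberg and Moore approximation (an assumption here).
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def hnr_db(harmonic_energy: float, noise_energy: float) -> float:
    """Harmonic-to-noise ratio in dB for one band, given two positive energies."""
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10.0 * math.log10(harmonic_energy / noise_energy)


# Illustrative band centers; the energies passed to hnr_db would come from an analysis stage.
for center_hz in (250.0, 1000.0, 4000.0):
    print(center_hz, round(erb_bandwidth_hz(center_hz), 1))
```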


How does it work for voice quality?

  • Morphing examples including extrapolation

    • Normal speech and shouting speech

    • Falsetto and normal speech

    • Normal speech and singing in forte


Concluding remarks

  • With this exemplar-based approach, we can use a common language if we share a common voice quality corpus such as the VOQUAL database.

  • It is possible to accumulate scientific and practical knowledge as a growing set of approximating functions.

    • STRAIGHT has to be improved to enable precise reproduction of a variety of voice qualities. <-- This is my duty/responsibility.


Naturalness: partial morphing

[Figure: naturalness of partially morphed speech for Happiness, Sadness, and Anger under the conditions All, Co, Int, Co+F0, and Int+F0.]

  • All: all parameters

  • Co: coordinate alignment only

  • Int: intensity only

  • Co+F0: coordinate and F0 were morphed

  • Int+F0: intensity and F0 were morphed
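
The partial-morphing conditions in the legend can be thought of as per-parameter morphing rates. The sketch below is one hypothetical way to express that; the parameter names, the inclusion of a periodicity parameter under "All", and the choice to hold unmorphed parameters at rate 0 are assumptions rather than details from the slides.

```python
# Hypothetical parameter names, loosely following the list of morphed parameters.
PARAMETERS = ("f0", "intensity", "periodicity", "coordinate")

# Which parameters each condition morphs (labels follow the legend).
CONDITIONS = {
    "All": {"f0", "intensity", "periodicity", "coordinate"},
    "Co": {"coordinate"},
    "Int": {"intensity"},
    "Co+F0": {"coordinate", "f0"},
    "Int+F0": {"intensity", "f0"},
}


def rates_for(condition: str, rate: float) -> dict:
    """Return a morphing rate per parameter for one condition.

    Parameters outside the condition stay at 0.0, i.e. at the first example.
    """
    morphed = CONDITIONS[condition]
    return {name: (rate if name in morphed else 0.0) for name in PARAMETERS}


print(rates_for("Co+F0", 1.0))
```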

