Hideki Kawahara Wakayama University ATR-HIS




Exemplar-based Voice Quality Analysis and Control using a High Quality Auditory Morphing Procedure based on STRAIGHT

Hideki Kawahara

Wakayama University

ATR-HIS

Why high quality?
  • Humans are very good at using voice quality in communicating non-linguistic and para-linguistic information.
    • -> We can discriminate voice quality very well.
    • -> But… only around natural speech sounds
    • -> Highly nonlinear systems need to be tested around their normal operating range.
    • -> Voice quality has to be tested using real voice.
    • -> It is crucial to provide a means to control physical parameters of “real” voice in a well-defined manner.
    • -> We need a very high quality analysis, modification and synthesis system.
Why exemplar based?
  • Rule based approach
    • For example….
    • How should formant frequencies be modified when F0 is modified, so that the modified speech sounds natural?
    • Desirable but virtually impossible
    • “Curse of dimensionality”
  • Exemplar based approach
    • Finding permissible trajectories in a parametric space that span real voice examples.
    • A rule is represented as an approximating function that can generate permissible trajectories.
Why exemplar based?
  • Rule first approach
    • For example….
    • How should formant frequencies be modified when F0 is modified, so that the modified speech sounds natural?
    • Desirable but virtually impossible
    • “Curse of dimensionality”
  • Example first approach
    • Finding permissible trajectories in a parametric space that span real voice examples.
    • A rule is represented as an approximating function that can generate permissible trajectories.
How to improve the rule?
  • Need to test perceptual effects for all combinations of ΔF1, ΔF2, ΔF3, ΔF4, … With N levels for each Δ --> ΠN combinations
  • Need to check spectral tilt, harmonic-to-noise ratio, …
    • ----> Combinatorial explosion: curse of dimensionality
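
The growth described on this slide can be made concrete with a small calculation. The sketch below assumes a full factorial listening test, in which the number of stimuli is the product of the level counts of all parameters. The function name and the figure of 5 levels per parameter are illustrative assumptions and do not come from the slides.

```python
from math import prod


def stimulus_count(levels_per_parameter):
    """Stimuli needed for a full factorial test: product of the level counts."""
    return prod(levels_per_parameter)


# Hypothetical design: 5 levels for each of four formant shifts.
formants_only = stimulus_count([5] * 4)

# Same design with spectral tilt and harmonic-to-noise ratio added,
# again with a hypothetical 5 levels each.
with_tilt_and_hnr = stimulus_count([5] * 6)

print(formants_only)      # 625
print(with_tilt_and_hnr)  # 15625
```

Adding two more parameters multiplies the number of stimuli by 25 in this example, which is why an exhaustive rule-first test quickly becomes impractical.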
Example-first approach

[Figure: the utterance /koNnitiwa/ (hello) with the labels Surprise, Happiness, Neutral, Fear, Anger, Sadness]

Permissible trajectory

[Figure: perceived naturalness (axis 0 to 5) versus morphing rate (axis -0.25 to 1.25) for Neutral-Anger. Plot labels: Morphed speech, Real speech.]

Permissible trajectory

[Figure: perceived naturalness (axis 0 to 5) versus morphing rate (axis -0.25 to 1.25) for Neutral-Anger. Plot labels: Morphed speech, Real speech.]

Interpolating morphing provides a permissible trajectory under the current implementation.
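
As a rough illustration of what a morphing rate means, the sketch below blends two time-aligned parameter sequences linearly. A rate of 0 reproduces the first example and a rate of 1 the second; rates between them interpolate, while rates such as -0.25 or 1.25 extrapolate beyond the examples. The function names, the F0 values, and the choice to blend F0 on a logarithmic axis are assumptions made for illustration, not details of the STRAIGHT implementation.

```python
import math


def morph(a, b, rate):
    """Linear blend of two aligned frames: rate 0 gives a, rate 1 gives b.

    Rates outside [0, 1] extrapolate beyond the two examples.
    """
    return [(1.0 - rate) * x + rate * y for x, y in zip(a, b)]


def morph_f0(f0_a, f0_b, rate):
    """Blend F0 on a logarithmic axis (an assumption of this sketch)."""
    log_a = [math.log(f) for f in f0_a]
    log_b = [math.log(f) for f in f0_b]
    return [math.exp(v) for v in morph(log_a, log_b, rate)]


# Hypothetical F0 contours in Hz for time-aligned frames of two utterances.
neutral = [120.0, 125.0, 118.0]
anger = [180.0, 210.0, 170.0]

for rate in (-0.25, 0.0, 0.5, 1.0, 1.25):
    print(rate, [round(f, 1) for f in morph_f0(neutral, anger, rate)])
```

The sketch presumes the two examples are already time-aligned frame by frame; establishing that alignment is a separate step not shown here.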

Parameters that were morphed
  • F0
    • Instantaneous frequency based method
  • Energy distribution on a time-frequency coordinate
    • Extended pitch synchronous analysis
  • Periodicity index on a time-frequency coordinate
    • Harmonic-to-noise ratio in each ERB band
  • Time-frequency coordinate
  • (Fine temporal structure)
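
To make the phrase "harmonic-to-noise ratio in each ERB band" more concrete, the sketch below lays out band edges on an ERB-number scale and expresses a ratio in decibels. The slides do not say which ERB formula is used; the commonly cited Glasberg and Moore approximation is assumed here. All function names and the frequency range are hypothetical, and the separation of a signal into harmonic and noise power is not covered.

```python
import math


def hz_to_erb_number(f_hz):
    """ERB-number scale (Glasberg and Moore approximation, assumed here)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_number_to_hz(erb):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_band_edges(f_low, f_high, step=1.0):
    """Band edges in Hz, spaced `step` ERB apart, starting at f_low."""
    lo = hz_to_erb_number(f_low)
    hi = hz_to_erb_number(f_high)
    count = int(math.floor((hi - lo) / step + 1e-9))
    return [erb_number_to_hz(lo + i * step) for i in range(count + 1)]


def hnr_db(harmonic_power, noise_power):
    """Harmonic-to-noise ratio in dB from two power values for one band."""
    return 10.0 * math.log10(harmonic_power / noise_power)


# Hypothetical analysis range of 100 Hz to 8 kHz, one band per ERB.
edges = erb_band_edges(100.0, 8000.0)
bands = list(zip(edges[:-1], edges[1:]))
print(len(bands), "bands; first band:", [round(f, 1) for f in bands[0]])
```

Bands defined this way are narrow at low frequencies and wide at high frequencies, which roughly follows auditory frequency resolution.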

Visualization

How does it work for voice quality?
  • Morphing examples including extrapolation
    • Normal speech and shouting speech
    • Falsetto and normal speech
    • Normal speech and singing in forte
Concluding remarks
  • With this exemplar-based approach, it is possible to use a common language if we share a common voice quality corpus such as the VOQUAL database.
  • It is possible to accumulate scientific and practical knowledge as a growing set of approximating functions.
    • STRAIGHT has to be improved to enable precise reproduction of a variety of voice qualities. <-- This is my duty/responsibility.
Naturalness: partial morphing

[Figure: naturalness of partially morphed speech for Happiness, Sadness, and Anger under the conditions All, Co, Int, Co+F0, and Int+F0]

All: all parameters

Co: coordinate alignment only

Int: intensity only

Co+F0: coordinate and F0 were morphed

Int+F0: intensity and F0 were morphed
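
The partial morphing conditions can be sketched as morphing only a chosen subset of parameters. In the sketch below, the parameter names, the numeric values, and the decision to keep unmorphed parameters at their source values are all assumptions for illustration; the slides do not specify how unmorphed parameters were handled.

```python
def partial_morph(source, target, rate, morphed):
    """Morph only the parameters named in `morphed`.

    Parameters not listed keep their source values (an assumption).
    """
    result = {}
    for name, src in source.items():
        if name in morphed:
            tgt = target[name]
            result[name] = [(1.0 - rate) * s + rate * t for s, t in zip(src, tgt)]
        else:
            result[name] = list(src)
    return result


# Hypothetical, already aligned parameter tracks for two utterances.
neutral = {"coordinate": [0.0, 1.0], "intensity": [60.0, 62.0], "f0": [120.0, 125.0]}
anger = {"coordinate": [0.0, 0.8], "intensity": [70.0, 75.0], "f0": [180.0, 210.0]}

# Condition labels from the legend, mapped to hypothetical parameter names.
conditions = {
    "All": set(neutral),
    "Co": {"coordinate"},
    "Int": {"intensity"},
    "Co+F0": {"coordinate", "f0"},
    "Int+F0": {"intensity", "f0"},
}

for label, names in conditions.items():
    print(label, partial_morph(neutral, anger, 1.0, names))
```

Running every condition at the same morphing rate isolates how much each parameter group contributes to the result.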
