
Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents

Elisabeth Chorianopoulou, MSc in Speech and Language Processing, University of Edinburgh. Supervisor: Prof. D. R. Ladd. External Advisor: Robert Clark (CSTR).


Presentation Transcript


  1. Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents • Elisabeth Chorianopoulou • MSc in Speech and Language Processing, University of Edinburgh • Supervisor: Prof. D. R. Ladd • External Advisor: Robert Clark (CSTR)

  2. Today’s presentation • Project’s main goal • Theoretical background • Hypothesis • Tools & Methods • Pilot experiment (design, results) • Future work

  3. Prosody prediction in modern TTS systems • Abstract level: Acoustics → Perception (f0 → pitch, duration → rhythm, amplitude → loudness) • Interaction of correlates not always clear… • Not necessarily adequate information from text • Speaker variability (production & perception)

  4. F0 prediction • Global f0 properties: declination, reset. • Local f0 properties: contour shape, tonal targets, alignment. • F0 predictors: syllable properties, word properties, rhythm, syntactic structure, information structure

  5. Project’s Main Goal • Intonational phonetics & phonology → prosody prediction in synthesis • Synthetic speech: insight on role of tonal alignment • Naturalness judgements (effect, distribution) • TTS system design?

  6. Pre-nuclear accents • Prosodic units: IP (intonational phrase), iP (intermediate phrase) • iP contains one or more pitch accents • Final accent in iP is the nuclear accent • All non-final accents are pre-nuclear

  7. The case of Modern Greek (Arvaniti et al., 1998) • Tonal targets: scaling & alignment • Modern Greek pre-nuclear accents: two tonal targets, an L and an H. • Stability of valley (F0 min) vs variability of peak (F0 max) → type of accent? Either a bitonal L*+H, or an L* accent followed by an H phrase tone

  8. The case of Modern Greek (Arvaniti et al., 1998) • [Diagram: L and H targets over the segmental string C0 V0* C1 V1, annotated (-5 ms) and (+15 ms)] • Tonal targets independently aligned with specific points in segmental string. • Duration & slope of f0 movement depend on segmental quality.
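
The idea of anchoring each target to its own point in the segmental string can be sketched in a few lines. This is only an illustration under stated assumptions: the slide gives the two offsets but not which landmark each belongs to, so the sketch assumes that L sits 5 ms before the onset of the accented syllable (C0) and H sits 15 ms after the onset of the first postaccentual vowel (V1). The function name and the example boundary times are hypothetical.

```python
def target_times_ms(c0_onset_ms, v1_onset_ms, l_offset_ms=-5, h_offset_ms=15):
    """Return (L time, H time) in ms for one pre-nuclear accent.

    Assumption (not stated on the slide): L is anchored to the onset of the
    accented syllable (C0) and H to the onset of the postaccentual vowel (V1).
    Each target is computed from its own landmark, independently of the other.
    """
    l_time = c0_onset_ms + l_offset_ms
    h_time = v1_onset_ms + h_offset_ms
    return l_time, h_time


# Hypothetical segment boundaries: C0 starts at 200 ms, V1 starts at 400 ms.
l_time, h_time = target_times_ms(200, 400)
print(l_time, h_time)
```

Because the two targets are tied to different landmarks, the duration of the rise is not fixed: it stretches or shrinks with the segments between C0 and V1, which is the point made in the last bullet of the slide.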

  9. What does the project actually involve? • Presuppose validity of Arvaniti et al.’s findings • Apply them in synthetic speech (DEMOSTHeNES Speech Composer) • Move alignment points of both L and H (Praat) • Perceptual experiments (E-Prime)

  10. Original hypothesis • Movements in alignment are not going to influence perception of naturalness significantly. • In case perception is affected, late alignment of the F0 max is expected to have the greatest influence.

  11. Test Sentences • At least one unaccented syllable preceding the accented one • Accented vowel between nasals or laterals • At least two syllables before the following accent • Example sentence: Το ανώνυμο γράμμα την αναστάτωσε. (To ano*nimo gra*ma tin anasta*tose, "The anonymous letter upset her.")

  12. DEMOSTHeNES • University of Athens, M-PIRO project • a modular system like Edinburgh’s Festival (HRG, VSERVER, VCOM, VMOD) • Prosody in DEMOSTHeNES • duration, pitch, amplitude offered as VCOMs linked to the HRG • Current prosodic model: phrasing & lexical stress

  13. Output (Praat) • f0 declination • reset at phrase breaks • limited pitch range • limited movements

  14. Towards naturalness I • Apply results of Arvaniti et al. to default pitch contour of DEMOSTHeNES. • [Diagram: L and H targets over the segmental string C0 V0* C1 V1, annotated (-5 ms) and (+15 ms)] • Not only first but also second stressed syllable

  15. Output (Praat) • f0 declination • same pitch range • more f0 movements

  16. Towards naturalness II: modifications in alignment • Targets moved independently earlier or later than normal alignment points • Early – Late • Late – Early • Normal – Late etc… • Size of shift (L – H): 40 – 80 ms, 50 – 100 ms or 60 – 120 ms?
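
The manipulation described on this slide amounts to adding a signed shift to each target time. The sketch below is a hedged illustration, not the project's actual Praat procedure: the function name, the starting times (195 ms and 415 ms) and the sanity check are all assumptions made for the example. The default shift sizes follow the 50 – 100 ms condition, with the first figure applied to L and the second to H.

```python
# Direction of movement for each alignment label.
DIRECTION = {"Early": -1, "Normal": 0, "Late": +1}


def shift_targets(l_ms, h_ms, l_label, h_label, l_shift_ms=50, h_shift_ms=100):
    """Move L and H independently; return the new (L, H) times in ms.

    The ValueError is an added sanity check (an assumption, not from the
    slides): a "Late L, Early H" condition on a short rise could otherwise
    place the peak before the valley.
    """
    new_l = l_ms + DIRECTION[l_label] * l_shift_ms
    new_h = h_ms + DIRECTION[h_label] * h_shift_ms
    if new_l >= new_h:
        raise ValueError("shifted L would not precede shifted H")
    return new_l, new_h


# Hypothetical normal alignment: L at 195 ms, H at 415 ms.
print(shift_targets(195, 415, "Early", "Late"))
print(shift_targets(195, 415, "Late", "Early"))
```

The check matters mostly for the larger shift sizes: with 60 – 120 ms, a Late L and an Early H close a 180 ms gap between the two targets.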

  17. Output • Early L (50 ms), Late H (100 ms)

  18. Output • Late L (50 ms), Early H (100 ms)

  19. Design of pilot perceptual experiment • 2 sentences: standard vs modified alignment (N – N vs Early – Late, Late – Early, Normal – Late) • Naturalness judgement of pair-comparisons • 12 native Greek speakers, students in Edinburgh • Aim: choose the shift size: 40 – 80, 50 – 100 or 60 – 120 ms?
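
The slides do not say how the pair-comparison judgements were analysed. One simple option for a forced choice between two versions is an exact two-sided sign test, sketched below. Everything specific here is an assumption for illustration: the function name, the idea of one judgement per listener per pair, and the example count of 10 out of 12, which is invented and is not a result from the experiment.

```python
from math import comb


def sign_test_p(k, n):
    """Exact two-sided binomial p-value for k preferences out of n,
    under the null hypothesis that both versions are equally likely
    to be judged more natural (p = 0.5)."""
    smaller_tail = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller_tail + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented example: 10 of 12 listeners prefer the standard alignment.
print(round(sign_test_p(10, 12), 4))
```

With only 12 listeners the test is coarse: an even 6 – 6 split gives a p-value of 1.0, and only fairly lopsided splits move far from chance.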

  20. Results I

  21. Results II

  22. Future Work • 10 sentences: standard vs modified alignment (N – N vs all possible combinations of Early – Normal – Late) • Modifications by 40 – 80 and 60 – 120 ms • Native Greek speakers, Greece, July :-) • Aim: patterns in perception of naturalness?
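
To get a feel for the size of the planned experiment, the stimulus pairs can be enumerated. This is a rough sketch under assumptions: it takes "all possible combinations" to mean every L/H pairing of Early, Normal and Late except the N – N baseline, and it assumes every sentence is paired with the baseline once per modified condition and per shift size. The function names are hypothetical.

```python
from itertools import product

LEVELS = ("Early", "Normal", "Late")
BASELINE = ("Normal", "Normal")


def alignment_conditions():
    """All (L alignment, H alignment) combinations, baseline included."""
    return list(product(LEVELS, repeat=2))


def comparison_pairs(n_sentences=10, shift_sets=((40, 80), (60, 120))):
    """One (sentence, baseline, modified condition, shifts) tuple per trial."""
    modified = [c for c in alignment_conditions() if c != BASELINE]
    return [
        (sentence, BASELINE, condition, shifts)
        for sentence in range(1, n_sentences + 1)
        for shifts in shift_sets
        for condition in modified
    ]


print(len(alignment_conditions()), len(comparison_pairs()))
```

Under these assumptions there are 9 alignment conditions, 8 of them modified, so 10 sentences and 2 shift sizes give 160 comparisons per listener before any repetition or order counterbalancing.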

  23. The contribution of this project • Insight on role of alignment in perceiving a synthetic utterance as natural • TTS system design • results not restricted to Greek • evidence for segmental anchoring in other languages – studies of Dutch, German, English

  24. Sound files • DEMOSTHeNES • Arvaniti et al. • Early L (50 ms) – Late H (100 ms) • Late L (50 ms) – Early H (100 ms)
