

Agnieszka wagner department of phonetics institute of linguistics

Spoken Language Technologies: A review of application areas and research issues

Analysis and synthesis of F0 contours

Agnieszka Wagner

Department of Phonetics, Institute of Linguistics,

Adam Mickiewicz University in Poznań

Humboldt-Kolleg, Słubice 13.-15. November 2008



Introduction

The need for, and growing interest in, spoken language technology (SLT) systems:

  • spoken information is conveyed more efficiently than a written message

  • speech is the easiest and fastest means of communication (human–human and human–machine)

Progress in the field:

  • technological advances in computer science

    • availability of specialized speech analysis and processing tools

    • collection and management of large speech corpora

    • investigation of the acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics
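Since F0 is the acoustic variable this presentation focuses on, a minimal sketch of how an F0 value can be estimated for a single voiced frame may be useful. It assumes a plain autocorrelation method, which is not necessarily what the tools mentioned above use; the function name `estimate_f0_autocorr` and the 75–500 Hz search range are illustrative assumptions. A practical pitch tracker would add windowing, a voicing decision and octave-error correction.

```python
import math

def estimate_f0_autocorr(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    Hypothetical helper: no windowing, no voicing decision, no octave correction.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    lag_min = int(sample_rate / f0_max)  # shortest period searched (samples)
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period searched
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag; its maximum marks the most likely period
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

# Toy input: a 40 ms, 200 Hz sine wave sampled at 16 kHz
sr = 16000
frame = [math.sin(2 * math.pi * 200.0 * i / sr) for i in range(640)]
print(round(estimate_f0_autocorr(frame, sr), 1))
```

Running the estimator frame by frame over an utterance yields the F0 contour that the analysis and synthesis modules discussed later operate on.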

Spoken Language Technologies: Introduction (1)



The tasks of SLT systems (TTS and ASR)

Speech synthesis (TTS, text-to-speech) systems

  • generate a speech signal from a given input text

  • example: BOSS (its Polish module was developed at the Dept. of Phonetics in cooperation with IKP, University of Bonn)

  • ECESS (European Centre of Excellence in Speech Synthesis): standards for the development of language resources, tools, modules and systems

Automatic speech recognition (ASR) systems

  • produce a text transcription of the input speech signal

  • example: Jurisdic (the first Polish ASR system, designed for the needs of the Police, Public Prosecutors and the Administration of Justice)

Spoken Language Technologies: Introduction (2)



Application areas

Speech synthesis

  • telecommunications (access to textual information over the telephone)

  • information retrieval

  • measurement and control systems

  • fundamental & applied research on speech and language

  • a communication aid, e.g. for the visually impaired

Speech recognition & related technologies

  • text dictation

  • information retrieval & management

  • human–machine communication (together with speech synthesis):

    • dialogue systems

    • speech-to-speech translation

    • Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed within the EURONOUNCE project)

Spoken Language Technologies: Application areas



Performance

Generally, output quality is high as regards the generation and recognition of the linguistic (propositional) content of speech

Speech synthesis

  • high intelligibility and naturalness in limited domains (e.g. broadcast news)

Speech recognition

  • best results for small-vocabulary tasks

  • state-of-the-art speaker-independent LVCSR (large-vocabulary continuous speech recognition) systems achieve a word error rate of 3%
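The word error rate (WER) quoted above is conventionally computed from a word-level edit-distance alignment between the recognizer output and a reference transcript: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N is the number of reference words. A WER of 3% therefore means roughly 3 word errors per 100 reference words. A minimal sketch follows; the function name is ours, and real scoring tools also normalize the text before aligning it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over 6 reference words
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```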

Spoken Language Technologies: Performance of TTS and ASR systems



Limitations

  • insufficient knowledge of methods for processing the non-verbal content of speech, i.e. affective information: the speaker's attitude, emotional state, mood, interpersonal stances and personality traits

Speech synthesis

  • a lack of variability in speaking style, which encodes affective information, can be detrimental to communication (e.g. in speech-to-speech translation)

  • the data-driven approach to conversational, expressive speech synthesis is inflexible and costly

Speech recognition

  • transcription of conversational and expressive speech: substantially higher word error rate

Spoken Language Technologies: Limitations of TTS and ASR systems



Progress

  • the need to model the non-verbal content of speech, i.e. affective information

Applications:

  • high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)

  • commerce: monitoring of agent–customer interactions, information retrieval and management (e.g. QA5)

  • public security, criminology: secured-area access control (speaker verification), truth-detection investigations (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)


Spoken Language Technologies: Progress in the field (1)



  • Emotion: Anger, Fear, Elation

    • higher mean F0

    • higher F0 variability

    • higher intensity

    • increased speaking rate

  • Emotion: Sadness, Boredom

    • lower mean F0

    • lower F0 variability

    • lower intensity

    • decreased speaking rate
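The cue directions listed above can be checked mechanically once an utterance is compared with a neutral baseline from the same speaker. The sketch below is a hypothetical illustration, not a validated emotion classifier. It assumes an F0 contour in Hz with 0 marking unvoiced frames, measures F0 variability as the standard deviation in Hz (semitone-based measures are a common alternative), and treats mean intensity (dB) and speaking rate (syllables per second) as precomputed values. The dictionary keys and all numbers in the example are made up.

```python
import statistics

def f0_stats(f0_hz):
    """Mean and standard deviation (Hz) of F0 over voiced frames; 0 marks unvoiced."""
    voiced = [f for f in f0_hz if f > 0]
    return statistics.mean(voiced), statistics.pstdev(voiced)

def direction(value, baseline):
    return "higher" if value > baseline else "lower" if value < baseline else "same"

def compare_to_baseline(utt, base):
    """Direction of each cue relative to a neutral baseline of the same speaker.

    utt/base are dicts with the hypothetical keys 'f0' (contour in Hz),
    'intensity_db' (mean level) and 'syll_per_sec' (speaking rate).
    """
    u_mean, u_sd = f0_stats(utt["f0"])
    b_mean, b_sd = f0_stats(base["f0"])
    return {
        "mean_f0": direction(u_mean, b_mean),
        "f0_variability": direction(u_sd, b_sd),
        "intensity": direction(utt["intensity_db"], base["intensity_db"]),
        "speaking_rate": direction(utt["syll_per_sec"], base["syll_per_sec"]),
    }

# Cue directions listed on the slide for the two groups of emotions
ANGER_FEAR_ELATION = {"mean_f0": "higher", "f0_variability": "higher",
                      "intensity": "higher", "speaking_rate": "higher"}
SADNESS_BOREDOM = {cue: "lower" for cue in ANGER_FEAR_ELATION}

def matches(cues, profile):
    """True if every cue moves in the direction the profile lists."""
    return all(cues[cue] == d for cue, d in profile.items())

# Made-up measurements, for illustration only
neutral = {"f0": [0, 180, 185, 190, 0, 182, 178], "intensity_db": 65.0, "syll_per_sec": 5.0}
utterance = {"f0": [0, 220, 260, 300, 0, 240, 210], "intensity_db": 71.0, "syll_per_sec": 6.2}
cues = compare_to_baseline(utterance, neutral)
print(cues)
print(matches(cues, ANGER_FEAR_ELATION), matches(cues, SADNESS_BOREDOM))
```

Comparing against a speaker-specific baseline, rather than absolute thresholds, reflects the fact that the listed cues are relative changes ("higher", "lower") and that F0 ranges differ widely between speakers.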

Progress

Prosodic features (fundamental frequency (F0), the central acoustic variable underlying intonation, as well as intensity, duration and voice quality) → encoding and decoding of affective information

  • Intonation models:

    • hierarchical, sequential, acoustic-phonetic, phonological, etc.

    • linguistic variation: well handled

    • affective, emotional variation: unaccounted for


Spoken Language Technologies: Progress in the field (2)



Figure: F0 ↔ intonation description (analysis = encoding; generation = decoding)

The comprehensive intonation model: Components

  • a module of F0 contour analysis

  • a module of F0 contour synthesis

  • description of intonation

    • discrete tonal categories (higher level; they provide access to the meaning of the utterance)

    • acoustic parameters (low-level)
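To make the two levels of the description concrete, the sketch below stores each tonal event with a discrete category label (higher level) and an F0 target given as a time and an F0 value (low level). It then generates a contour by linear interpolation between consecutive targets. The `TonalEvent` structure, the interpolation scheme and the ToBI-style labels are illustrative assumptions, not the model's actual parametrization.

```python
from dataclasses import dataclass

@dataclass
class TonalEvent:
    """One element of a hypothetical two-level intonation description."""
    category: str  # discrete tonal category (higher level); placeholder labels
    time_s: float  # time of the F0 target (low-level acoustic parameter)
    f0_hz: float   # F0 value of the target (low-level acoustic parameter)

def synthesize_contour(events, step_s=0.01):
    """Return (time, F0) frames linearly interpolated between successive targets.

    Targets must have distinct times.
    """
    pts = sorted(events, key=lambda e: e.time_s)
    if len(pts) < 2:
        raise ValueError("need at least two F0 targets")
    start, end = pts[0].time_s, pts[-1].time_s
    n_frames = int(round((end - start) / step_s)) + 1
    contour, seg = [], 0
    for i in range(n_frames):
        t = start + i * step_s
        # Move to the segment [pts[seg], pts[seg + 1]] that contains t
        while seg < len(pts) - 2 and t > pts[seg + 1].time_s:
            seg += 1
        a, b = pts[seg], pts[seg + 1]
        frac = (t - a.time_s) / (b.time_s - a.time_s)
        frac = min(max(frac, 0.0), 1.0)  # guard against rounding at segment ends
        contour.append((t, a.f0_hz + frac * (b.f0_hz - a.f0_hz)))
    return contour

description = [
    TonalEvent("L", 0.0, 120.0),
    TonalEvent("H*", 0.2, 200.0),
    TonalEvent("L%", 0.5, 100.0),
]
for t, f0 in synthesize_contour(description, step_s=0.1):
    print(f"{t:.1f} s  {f0:.1f} Hz")
```

In this framing the analysis module would map a measured contour onto such events, and the synthesis module would map the events back onto a contour.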




  • Automatic analysis of F0 contours: summary

    • results comparable to inter-labeler consistency in the manual annotation of intonation

    • high accuracy achieved using small vectors of acoustic features

    • statistical modeling techniques

    • applications: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
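The slide does not name the statistical technique used. As one possible illustration of classifying tonal categories from a small acoustic feature vector, the sketch below implements a Gaussian naive Bayes classifier from scratch and scores it by percent agreement with manual labels, a figure that can be set against inter-labeler consistency. The two features (F0 rise in semitones, normalized vowel duration), the labels and all training values are invented for the example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature means/variances (Gaussian naive Bayes)."""
    rows_by_class = defaultdict(list)
    for features, label in zip(X, y):
        rows_by_class[label].append(features)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Variance floor avoids division by zero for constant features
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_category(model, features):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        lp = math.log(prior)
        for v, m, var in zip(features, means, variances):
            lp -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
        return lp
    return max(model, key=log_posterior)

def percent_agreement(labels_a, labels_b):
    """Share of items on which two label sequences agree."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

# Invented toy data: [F0 rise (semitones), normalized vowel duration]
X_train = [[4.0, 1.3], [5.5, 1.5], [3.8, 1.2], [0.5, 0.9], [0.2, 0.8], [1.0, 1.0]]
y_train = ["accented"] * 3 + ["unaccented"] * 3
model = fit_gaussian_nb(X_train, y_train)

X_test = [[4.5, 1.4], [0.3, 0.85]]
manual = ["accented", "unaccented"]
automatic = [predict_category(model, x) for x in X_test]
print(automatic, percent_agreement(automatic, manual))
```

Percent agreement is the simplest consistency measure. Chance-corrected measures such as Cohen's kappa are often reported alongside it.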

  • Automatic synthesis of F0 contours: summary

    • estimation of F0 values with a regression model

    • results comparable to those reported in the literature

    • natural F0 contours (similar to the original ones), suitable for synthesizing high-quality, comprehensible speech (confirmed in perception tests)
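The regression model used for F0 estimation is not specified here either. The sketch below assumes ordinary least-squares linear regression, solved through the normal equations, that predicts an F0 target value from two hypothetical features: an accent flag and the relative position in the phrase. It evaluates the fit with RMSE and the Pearson correlation, two measures commonly reported for F0 prediction. The toy data were generated from a known linear rule (200 + 40 * accent - 60 * position Hz), so the fitted weights can be checked.

```python
import math

def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (A^T A) w = A^T y."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # intercept column first
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]  # [intercept, w1, w2, ...]

def predict_f0(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def rmse(pred, ref):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Toy features: [accent flag, relative position in phrase]; F0 targets in Hz
X = [[1, 0.0], [0, 0.2], [1, 0.4], [0, 0.6], [1, 0.8], [0, 1.0]]
y = [240.0, 188.0, 216.0, 164.0, 192.0, 140.0]
w = fit_linear_regression(X, y)
pred = [predict_f0(w, x) for x in X]
print([round(v, 3) for v in w], round(rmse(pred, y), 3), round(pearson_r(pred, y), 3))
```

This example evaluates on the training data only to show the mechanics. Figures comparable across studies would come from a held-out test set.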

The comprehensive intonation model: Analysis and Synthesis



Audio (1): Mean opinion in the perception test: no audible difference

The comprehensive intonation model: Synthesis example (1)



Audio (2): Mean opinion in the perception test: very good quality

The comprehensive intonation model: Synthesis example (2)



Future research

Extensive and systematic investigation of the mechanisms of voice production and perception in affective speech:

  • contribution from other knowledge domains (psychology)

  • affective speech data collection

  • classification of affective states

  • types of acoustic parameters

  • measurement of affective inferences


Spoken Language Technologies: Future research issues



THANK YOU FOR YOUR ATTENTION!

