Agnieszka Wagner, Department of Phonetics, Institute of Linguistics

Spoken Language Technologies: A review of application areas and research issues

Analysis and synthesis of F0 contours

Agnieszka Wagner

Department of Phonetics, Institute of Linguistics,

Adam Mickiewicz University in Poznań

Humboldt-Kolleg, Słubice 13.-15. November 2008



Introduction

The need for, and growing interest in, SLT (spoken language technology) systems:

  • spoken information is conveyed more efficiently than a written message

  • speech is the easiest and fastest means of communication (human-human, human-machine)

Progress in the field:

  • technological advances in computer science

    • availability of specialized speech analysis and processing tools

      • collection and management of large speech corpora

      • investigation of the acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics

Spoken Language Technologies: Introduction (1)



The tasks of SLT systems (TTS and ASR)

Speech synthesis (TTS, text-to-speech) systems

  • generate a speech signal for a given input text

  • example: BOSS (Bonn Open Synthesis System; Polish module developed at the Dept. of Phonetics in cooperation with IKP, University of Bonn)

  • ECESS (European Centre of Excellence in Speech Synthesis): standards for the development of language resources, tools, modules and systems

Automatic speech recognition (ASR) systems

  • produce a text transcription of the input speech signal

  • example: Jurisdic (the first Polish ASR system, developed for the needs of the Police, Public Prosecutors and the Administration of Justice)

Spoken Language Technologies: Introduction (2)



Application areas

Speech synthesis

  • telecommunications (access to textual information over the telephone)

  • information retrieval

  • measurement and control systems

  • fundamental & applied research on speech and language

  • a communication aid, e.g. for the visually impaired

Speech recognition & related technologies

  • text dictation

  • information retrieval & management

  • human-machine communication (together with speech synthesis):

    • dialogue systems

    • speech-to-speech translation

    • Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed within the EURONOUNCE project)

Spoken Language Technologies: Application areas



Performance

Generally, output quality is high with regard to the generation/recognition of the linguistic (propositional) content of speech

Speech synthesis

  • high intelligibility and naturalness in limited domains (e.g. broadcast news)

Speech recognition

  • the best results for small-vocabulary tasks

  • state-of-the-art speaker-independent LVCSR (large-vocabulary continuous speech recognition) systems achieve a word error rate (WER) as low as 3%

Spoken Language Technologies: Performance of TTS and ASR systems
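The 3% figure is a word error rate, conventionally computed as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment of the recognizer output against a reference transcript of N words; 3% therefore means roughly three errors per hundred reference words. The following is a minimal Python sketch of that standard computation, not the scoring setup behind the figure above; real scoring tools also normalize text (case, punctuation) before aligning, which is omitted here, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N for whitespace-tokenized word strings.

    S, D and I are the substitutions, deletions and insertions in a minimum
    edit-distance alignment; N is the number of reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Invented example: one substitution ("received" -> "receive") and one
# deletion ("today") against a six-word reference, so WER = 2/6.
print(word_error_rate("the police received the report today",
                      "the police receive the report"))
```

Because insertions are counted, WER is not bounded by 100%: a recognizer that outputs many spurious words can exceed it.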



Limitations

  • insufficient knowledge of methods for processing the non-verbal content of speech, i.e. affective information: the speaker's attitude, emotional state, mood, interpersonal stances and personality traits

Speech synthesis

  • a lack of variability in speaking style, which encodes affective information, can be detrimental to communication (e.g. in speech-to-speech translation)

  • the data-driven approach to conversational, expressive speech synthesis is inflexible and costly

Speech recognition

  • transcription of conversational and expressive speech: substantially higher word error rate

Spoken Language Technologies: Limitations of TTS and ASR systems



Progress

  • the need to model the non-verbal content of speech, i.e. affective information

Applications:

  • high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)

  • commerce: monitoring of agent-customer interactions, information retrieval and management (e.g. QA5)

  • public security, criminology: secured-area access control (speaker verification), truth detection in investigations (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)


Spoken Language Technologies: Progress in the field (1)



  • Emotion: sadness, boredom (compared with neutral speech)

    • lower mean F0

    • lower F0 variability

    • lower intensity

    • decreased speaking rate
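The sadness/boredom profile above is stated in terms of four measurable cues. A minimal Python sketch of how they might be computed from frame-level tracks follows; it assumes an F0 track and an RMS amplitude track have already been extracted by a pitch tracker, that 0.0 marks unvoiced frames, and that the syllable count is known. Measuring F0 variability in semitones and intensity in dB relative to an amplitude of 1.0 are choices made for this sketch, not details from the presentation, and the toy tracks are invented.

```python
import math
import statistics


def prosodic_summary(f0_hz, rms, frame_step_s, n_syllables):
    """Summarize one utterance by mean F0, F0 variability, intensity and rate.

    Assumed input conventions (not taken from the slides):
      f0_hz        per-frame F0 in Hz, with 0.0 marking unvoiced frames
      rms          per-frame RMS amplitude on a linear scale, all values > 0
      frame_step_s hop between frames, in seconds
      n_syllables  syllable count of the utterance, known in advance
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    mean_f0 = statistics.mean(voiced)
    # Express F0 in semitones relative to the utterance mean so that the
    # variability measure does not depend on the speaker's pitch register.
    semitones = [12 * math.log2(f / mean_f0) for f in voiced]
    # Intensity in dB relative to an amplitude of 1.0 (an assumed reference).
    intensity_db = [20 * math.log10(a) for a in rms]
    duration_s = len(f0_hz) * frame_step_s
    return {
        "mean_f0_hz": mean_f0,
        "f0_sd_st": statistics.stdev(semitones),
        "mean_intensity_db": statistics.mean(intensity_db),
        "speaking_rate_syl_per_s": n_syllables / duration_s,
    }


# Invented toy tracks (6 frames of 100 ms each), not measured data.
neutral = prosodic_summary([200, 240, 0, 180, 220, 0], [0.5] * 6, 0.1, 3)
sad = prosodic_summary([170, 180, 0, 165, 175, 0], [0.2] * 6, 0.1, 2)
for key in neutral:
    print(f"{key}: neutral={neutral[key]:.2f} sad={sad[key]:.2f}")
```

With these toy inputs every cue comes out lower for the "sad" utterance, matching the direction listed on the slide.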

Progress

Prosodic features: fundamental frequency (F0, the central acoustic variable underlying intonation), intensity, duration and voice quality -> encoding and decoding of affective information

  • Intonation models:

    • hierarchical, sequential, acoustic-phonetic, phonological, etc.

    • linguistic variation: well handled

    • affective (emotional) variation: not accounted for


Spoken Language Technologies: Progress in the field (2)



[Diagram: F0 -> analysis (encoding) -> intonation description -> generation (decoding) -> F0]

The comprehensive intonation model: Components

  • a module of F0 contour analysis

  • a module of F0 contour synthesis

  • description of intonation

    • discrete tonal categories (high-level; give access to the meaning of the utterance)

    • acoustic parameters (low-level)
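The two levels of description can be pictured as one record per intonation event that carries both a discrete label and its acoustic target. The Python sketch below is hypothetical: the `ToneTarget` schema, the ToBI-like labels and the straight-line interpolation used to decode targets into an F0 contour are assumptions for illustration and are not the model's actual synthesis module (which, according to the synthesis summary, estimates F0 values with a regression model).

```python
from dataclasses import dataclass


@dataclass
class ToneTarget:
    """One intonation event, described at both levels (hypothetical schema)."""
    label: str      # high-level: discrete tonal category, e.g. "H*"
    time_s: float   # low-level: time of the F0 target, in seconds
    f0_hz: float    # low-level: F0 value at the target, in Hz


def generate_f0(targets, frame_step_s):
    """Decode a target sequence into a frame-level F0 contour.

    Straight-line interpolation between consecutive targets is used as a
    simple stand-in for a real generation module.
    """
    pts = sorted(targets, key=lambda t: t.time_s)
    if len(pts) < 2:
        raise ValueError("need at least two targets")
    if any(b.time_s <= a.time_s for a, b in zip(pts, pts[1:])):
        raise ValueError("target times must be strictly increasing")
    start, end = pts[0].time_s, pts[-1].time_s
    n_frames = int(round((end - start) / frame_step_s)) + 1
    contour = []
    seg = 0  # index of the left target of the current segment
    for k in range(n_frames):
        t = min(start + k * frame_step_s, end)
        # Move right until t lies within [pts[seg], pts[seg + 1]].
        while seg < len(pts) - 2 and pts[seg + 1].time_s < t:
            seg += 1
        a, b = pts[seg], pts[seg + 1]
        w = (t - a.time_s) / (b.time_s - a.time_s)
        contour.append(a.f0_hz + w * (b.f0_hz - a.f0_hz))
    return contour


# Hypothetical rise-fall: low start, accented peak, low boundary tone.
targets = [ToneTarget("L*", 0.0, 180.0), ToneTarget("H*", 0.2, 240.0),
           ToneTarget("L%", 0.4, 160.0)]
print([round(f, 1) for f in generate_f0(targets, 0.1)])
```

Straight-line interpolation is chosen only because it is the simplest decoder that turns sparse targets into a contour; a real generation module would also shape the transitions between targets.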




  • Automatic analysis of F0 contours: summary

    • results comparable to inter-labeler consistency in the manual annotation of intonation

    • high accuracy achieved using small vectors of acoustic features

    • statistical modeling techniques

    • applications: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
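The slides do not name the statistical technique or the acoustic features used for automatic analysis. As a hedged illustration only, the Python sketch below classifies syllables as accented or not with Gaussian naive Bayes, a simple stand-in, over a two-element feature vector (F0 excursion in semitones and normalized syllable duration, both hypothetical), and then compares automatic and manual labels with percent agreement and Cohen's kappa, two common measures of labeling consistency. All data are invented.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian means/variances."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor keeps the likelihood finite for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log posterior (features assumed independent)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances))
    return max(model, key=log_posterior)


def percent_agreement(a, b):
    """Fraction of positions where two label sequences agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(a)
    p_o = percent_agreement(a, b)
    p_e = sum((a.count(k) / n) * (b.count(k) / n) for k in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)


# Invented training data: [F0 excursion in semitones, normalized syllable duration]
X = [[4.0, 1.3], [3.5, 1.2], [5.0, 1.4], [0.5, 0.9], [1.0, 0.8], [0.2, 1.0]]
y = ["accent", "accent", "accent", "none", "none", "none"]
model = fit_gaussian_nb(X, y)
print(predict(model, [4.2, 1.25]), predict(model, [0.4, 0.9]))

# Invented manual vs. automatic labels for six syllables
manual = ["accent", "none", "accent", "none", "accent", "none"]
automatic = ["accent", "none", "none", "none", "accent", "none"]
print(percent_agreement(manual, automatic), cohens_kappa(manual, automatic))
```

Kappa is reported alongside raw agreement because it discounts the agreement two labelers would reach by chance given their label frequencies, which makes automatic-vs-manual and manual-vs-manual figures easier to compare.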

  • Automatic synthesis of F0 contours: summary

    • estimation of F0 values with a regression model

    • results comparable to those reported in the literature

    • natural F0 contours (similar to the original ones), allowing the synthesis of high-quality, comprehensible speech (confirmed in perception tests)
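The regression model used for F0 synthesis is not specified on the slides. A minimal sketch, assuming ordinary least squares and two hypothetical predictors (an accent flag and relative position in the phrase), fits F0 values and evaluates them with root-mean-square error (RMSE) and Pearson correlation, measures often reported when predicted contours are compared with original ones. The data are invented.

```python
import math


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (A^T A) w = A^T y,
    where A is X with a leading column of ones for the intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(w, x):
    """Intercept plus weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def rmse(pred, ref):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, ref)) / len(ref))


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Invented data: [accented (0/1), relative position in the phrase] -> F0 in Hz
X = [[1, 0.0], [0, 0.25], [1, 0.5], [0, 0.75], [1, 1.0], [0, 0.1]]
y = [240.0, 192.5, 225.0, 177.5, 210.0, 197.0]
w = fit_linear_regression(X, y)
pred = [predict(w, x) for x in X]
print("weights:", [round(v, 2) for v in w])
print("RMSE (Hz):", round(rmse(pred, y), 3), "r:", round(pearson_r(pred, y), 3))
```

Because the invented targets follow an exactly linear rule (200 Hz + 40 Hz for an accent - 30 Hz across the phrase), the fit recovers those weights and the error vanishes; real F0 data would leave a residual, which is what RMSE and r quantify.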

The comprehensive intonation model: Analysis and Synthesis



Audio (1): Mean opinion in the perception test: no audible difference

The comprehensive intonation model: Synthesis example (1)



Audio (2): Mean opinion in the perception test: very good quality

The comprehensive intonation model: Synthesis example (2)



Future research issues

Extensive and systematic investigation of the mechanisms of voice production and perception in affective speech:

  • contributions from other knowledge domains (e.g. psychology)

  • affective speech data collection

  • classification of affective states

  • types of acoustic parameters

  • measurement of affective inferences


Spoken Language Technologies: Future research issues