Spoken Language Technologies: A review of application areas and research issues. Analysis and synthesis of F0 contours. Agnieszka Wagner, Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań. Humboldt-Kolleg, Słubice, 13-15 November 2008.

Presentation Transcript
slide1

Spoken Language Technologies: A review of application areas and research issues

Analysis and synthesis of F0 contours

Agnieszka Wagner

Department of Phonetics, Institute of Linguistics,

Adam Mickiewicz University in Poznań

Humboldt-Kolleg, Słubice, 13-15 November 2008

slide2

Introduction

The need for and increasing interest in SLT systems:

  • oral information is more efficient than a written message
  • speech is the easiest and fastest way of communication (man – man, man – machine)

Progress in the field:

  • technological advances in computer science
    • availability of specialized speech analysis and processing tools
      • collection and management of large speech corpora
      • investigation of acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics

Spoken Language Technologies: Introduction (1)

slide3

The tasks of SLT systems (TTS and ASR)

Speech synthesis (TTS, text-to-speech) systems

  • generate a speech signal for a given input text
  • example: BOSS (Polish module developed at Dept. of Phonetics in cooperation with IKP, Uni Bonn)
  • ECESS (European Centre of Excellence in Speech Synthesis): standards of development of language resources, tools, modules and systems

Automatic speech recognition (ASR) systems

  • provide a text transcription of the input speech signal
  • example: Jurisdic (the first Polish ASR system for the needs of the police, public prosecutors and the administration of justice)

Spoken Language Technologies: Introduction (2)

slide4

Application areas

Speech synthesis

  • telecommunications (access to textual information over the telephone)
  • information retrieval
  • measurement and control systems
  • fundamental & applied research on speech and language
  • a communication tool, e.g. for the visually impaired

Speech recognition & related technologies

  • text dictation
  • information retrieval & management
  • man-machine communication (together with speech synthesis):
    • dialogue systems
    • speech-to-speech translation
    • Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed in the scope of the EURONOUNCE project)

Spoken Language Technologies: Application areas

slide5

Performance

Generally, the output quality is high as regards generation/recognition of the linguistic (propositional) content of speech

Speech synthesis

  • high intelligibility and naturalness in limited domains (e.g. broadcast news)

Speech recognition

  • the best results for small vocabulary tasks
  • the state-of-the-art speaker-independent LVCSR systems achieve a word-error rate of 3%
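
The word-error rate quoted above is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the whitespace tokenization are assumptions made for illustration and are not taken from the presentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.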

Spoken Language Technologies: Performance of TTS and ASR systems

slide6

Limitations

  • insufficient knowledge about methods for processing the non-verbal content of speech i.e. affective information – speaker’s attitude, emotional state, mood, interpersonal stances & personality traits

Speech synthesis

  • lack of variability in speaking style which encodes affective information can be detrimental to communication (e.g. in speech-to-speech translation)
  • data-driven approach to conversational, expressive speech synthesis is inflexible and quite costly

Speech recognition

  • transcription of conversational and expressive speech – substantially higher word-error rate

Spoken Language Technologies: Limitations of TTS and ASR systems

slide7

Progress

  • the need to model the non-verbal content of speech, i.e. affective information

Applications:

  • high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)
  • commerce – monitoring of the agent-customer interactions, information retrieval and management (e.g. QA5)
  • public security, criminology – secured area access control (speaker verification), truth-detection investigation (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)

Spoken Language Technologies: Progress in the field (1)

slide8

Emotion: Anger, Fear, Elation

  • higher mean F0
  • higher F0 variability
  • higher intensity
  • increased speaking rate

Emotion: Sadness, Boredom

  • lower mean F0
  • lower F0 variability
  • lower intensity
  • decreased speaking rate
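
The F0 correlates listed above (mean F0 and F0 variability) can be measured from a frame-level F0 track. The following is a minimal sketch. It assumes that unvoiced frames are marked with 0 or None, that variability is expressed as a standard deviation, and that 100 Hz is an arbitrary semitone reference. None of these choices come from the presentation.

```python
import math
import statistics


def f0_summary(f0_hz, ref_hz=100.0):
    """Mean and standard deviation of F0 over voiced frames.

    f0_hz  : sequence of per-frame F0 values in Hz; 0 or None marks an
             unvoiced frame (assumed convention).
    ref_hz : reference frequency for the semitone scale (assumption).
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    # Semitones are often preferred when comparing speakers,
    # because the scale is logarithmic like pitch perception.
    semitones = [12.0 * math.log2(f / ref_hz) for f in voiced]
    return {
        "mean_hz": statistics.fmean(voiced),
        "sd_hz": statistics.stdev(voiced),
        "mean_st": statistics.fmean(semitones),
        "sd_st": statistics.stdev(semitones),
    }


# Made-up contours for illustration only, not measured data.
calm = f0_summary([0, 110, 115, 112, 0, 108])
aroused = f0_summary([0, 180, 240, 150, 0, 260, 200])
print(calm["mean_hz"] < aroused["mean_hz"], calm["sd_st"] < aroused["sd_st"])
```

Intensity and speaking rate would need the waveform and a segmentation, so they are left out of this sketch.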

Progress

Prosodic features: fundamental frequency (F0 – the central acoustic variable that underlies intonation), intensity, duration and voice quality -> encoding and decoding of affective information

Intonation models:

  • hierarchical, sequential, acoustic-phonetic, phonological, etc.
  • linguistic variation – well handled
  • affective, emotional variation – unaccounted for

Spoken Language Technologies: Progress in the field (2)

slide9

[Diagram: F0 is mapped to the intonation description by analysis (encoding), and the intonation description is mapped back to F0 by generation (decoding)]

The comprehensive intonation model: Components

  • a module of F0 contour analysis
  • a module of F0 contour synthesis
  • description of intonation
    • discrete tonal categories (higher-level, access to the meaning of the utterance)
    • acoustic parameters (low-level)

slide10

Automatic analysis of F0 contours

Summary:

  • results comparable to inter-labeler consistency in manual annotation of intonation
  • high accuracy achieved using small vectors of acoustic features
  • statistical modeling techniques
  • application: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
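
Comparing automatic labels with inter-labeler consistency needs an agreement measure. The slides do not say which one was used. One common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes two label sequences aligned item by item; the tonal labels in the usage example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)

    # Observed agreement: share of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: based on each labeler's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both labelers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical accent labels from a human labeler and an automatic one.
human = ["H*", "L*", "H*", "L*"]
auto = ["H*", "L*", "L*", "L*"]
print(cohens_kappa(human, auto))
```

The same function can be applied to two human labelers, which gives the baseline that the automatic analysis is compared against.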

Automatic synthesis of F0 contours

Summary:

  • estimation of F0 values with a regression model
  • results comparable to those reported in the literature
  • natural F0 contours (similar to the original ones) for synthesis of high-quality, comprehensible speech (confirmed in perception tests)
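
The slides state only that F0 values are estimated with a regression model, without naming the model or its predictors. As a purely illustrative sketch, the code below fits an ordinary least-squares line with a single hypothetical predictor (the position of a syllable in the phrase) and uses it to estimate an F0 target. The actual model presumably uses richer features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_f0(model, x):
    """Estimate an F0 target (Hz) for predictor value x."""
    slope, intercept = model
    return intercept + slope * x


# Made-up training data: syllable index vs. F0 target in Hz,
# imitating a gradual F0 decline over the phrase.
positions = [0, 1, 2, 3]
targets_hz = [200.0, 190.0, 180.0, 170.0]
model = fit_line(positions, targets_hz)
print(predict_f0(model, 4))
```

Predicted and original contours can then be compared with an error measure such as root-mean-square error, alongside the perception tests mentioned above.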

The comprehensive intonation model: Analysis and Synthesis

slide11

Audio (1): Mean opinion in the perception test: no audible difference

The comprehensive intonation model: Synthesis example (1)

slide12

Audio (2): Mean opinion in the perception test: very good quality

The comprehensive intonation model: Synthesis example (2)

slide13

Future research

Extensive and systematic investigation of the mechanisms in voice production and perception of affective speech:

  • contribution from other knowledge domains (psychology)
  • affective speech data collection
  • classification of affective states
  • types of acoustic parameters
  • measurement of affective inferences

Spoken Language Technologies: Future research issues
