Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster
Towards Multimodal Evaluation of Speech Pathologies

Friday, 06 February 2009

Outline

  • Peaks – A system for the evaluation of pathologic speech
  • Examples where multimodality is important
    • Emotional disorders → eye tracking & biosignals
    • Facial paralysis → facial expression
  • 3D information using Time-of-Flight camera
  • Real-time transmission of multimodal data
  • Outlook & summary

Cleft Lip and Palate

  • Structural malformations of
    • the nose
    • the throat
    • the mouth
    • the jaw
  • Negative effects on
    • respiration
    • nutrition
    • hearing
    • speech
    • psychosocial competence
  • Prevalence: 1 in 500–700

Laryngectomees

  • Removal of the larynx due to cancer
  • Breathing is rerouted through the tracheostoma
  • Speaking is enabled by a substitute voice

Motivation

  • Problem:
    • There is no objective, validated method for reliably measuring intelligibility
    • In clinical practice: subjective evaluation only
  • Solution:
    • Use an automatic speech recognition (ASR) system to assess intelligibility
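
A minimal sketch of how an ASR result can be turned into an intelligibility score, assuming the score is the word accuracy (WA) of the recognized word chain against the known reference text of a reading test: WA = (N − S − D − I) / N × 100 %, where N is the number of reference words and S, D, I are the substitutions, deletions and insertions of a minimum edit-distance alignment. The choice of WA and the function names are assumptions for illustration, not taken from the slides.

```python
def edit_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)              # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)              # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)              # match
            else:
                diag = (e + 1, s + 1, d, ins)      # substitution
            e, s, d, ins = cost[i - 1][j]
            up = (e + 1, s, d + 1, ins)            # deletion
            e, s, d, ins = cost[i][j - 1]
            left = (e + 1, s, d, ins + 1)          # insertion
            cost[i][j] = min(diag, up, left)       # fewest total edits wins
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def word_accuracy(reference: str, recognized: str) -> float:
    """WA = (N - S - D - I) / N * 100, with N = number of reference words."""
    ref, hyp = reference.split(), recognized.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - subs - dels - ins) / len(ref)


print(word_accuracy("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WA can become negative when the recognizer produces many spurious words.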

Approach

  • Recording of the speech data:
    • Client PC with an unknown operating system
    • Different tests for different patients
  • Automatic analysis of the speech data on a server system
  • A few minutes after the recording, an automatically generated report is available

Architecture

[Architecture diagram: client and server. Components: audio-data recording; feature extraction (MFCC); secure transmission of audio data; speech analysis; speech recognition (speech features → recognized word chain); scoring; report]
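
The diagram names MFCCs as the speech features. A compact NumPy sketch of a standard MFCC front end (pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filterbank, log, DCT-II) follows; the parameter values (16 kHz sampling, 25 ms frames with 10 ms shift, 24 mel filters, 12 coefficients, pre-emphasis 0.97) are common textbook defaults assumed here, not the configuration used in Peaks.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sample_rate=16000, frame_ms=25.0, hop_ms=10.0,
         n_fft=512, n_mels=24, n_ceps=12):
    """Return MFCCs with shape (n_frames, n_ceps)."""
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    # Zero-pad short signals so that at least one full frame exists
    pad = max(0, frame_len + (n_frames - 1) * hop - len(emphasized))
    emphasized = np.pad(emphasized, (0, pad))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for k in range(1, n_mels + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        for b in range(left, center):
            fbank[k - 1, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):
            fbank[k - 1, b] = (right - b) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Orthonormal DCT-II decorrelates the log filterbank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    dct *= np.sqrt(2.0 / n_mels)
    dct[0] *= np.sqrt(0.5)
    return log_mel @ dct.T


t = np.arange(16000) / 16000
print(mfcc(np.sin(2 * np.pi * 440 * t)).shape)
```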

Subjective Evaluation

  • Evaluation of the audio data by speech experts
    • On a scale from 1 to 5
    • For each turn
    • Averaging over all turns of a speaker yields a continuous score between 1 and 5
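
The per-speaker averaging can be sketched as follows, assuming every expert rates every turn on the 1 to 5 scale and all turn ratings of all experts are pooled into one mean; averaging per expert first would be an equally plausible reading. The data layout and names are hypothetical.

```python
from statistics import mean


def speaker_scores(ratings: dict[str, dict[str, list[int]]]) -> dict[str, float]:
    """Return speaker -> mean turn rating on the continuous 1..5 scale.

    ratings[speaker][expert] holds that expert's per-turn ratings (hypothetical layout).
    """
    scores = {}
    for speaker, by_expert in ratings.items():
        # Pool the turn ratings of all experts for this speaker
        turns = [r for per_expert in by_expert.values() for r in per_expert]
        if not turns:
            raise ValueError(f"no ratings for {speaker}")
        if any(not 1 <= r <= 5 for r in turns):
            raise ValueError(f"rating outside 1..5 for {speaker}")
        scores[speaker] = float(mean(turns))
    return scores


example = {"speaker_01": {"expert_A": [2, 3, 3], "expert_B": [3, 4, 4]}}
print(speaker_scores(example))
```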

Need for Multimodality: Emotional Disorders

Time-of-Flight (ToF) 3D Camera
  • Frame rate up to 50 Hz
  • More than 25,000 3D points per frame (176 × 144 = 25,344 pixels)
  • Eye-safe infrared light / no exposure
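
The slides do not describe how the camera measures distance. Assuming a continuous-wave ToF camera, which emits amplitude-modulated infrared light and measures the phase shift Δφ of the reflection, the distance is d = c · Δφ / (4π · f_mod), and distances repeat every c / (2 · f_mod). The 20 MHz modulation frequency below is an illustrative assumption, not a value from the slides.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def phase_to_distance(phase_rad: float, f_mod_hz: float) -> float:
    """Distance (m) from the measured phase shift of a continuous-wave ToF pixel."""
    # The light travels to the object and back, hence 4*pi instead of 2*pi.
    # Phases are wrapped to [0, 2*pi), so larger distances alias.
    return SPEED_OF_LIGHT * (phase_rad % (2 * math.pi)) / (4 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz: float) -> float:
    """Maximum distance (m) before the phase wraps around."""
    return SPEED_OF_LIGHT / (2 * f_mod_hz)


f_mod = 20e6  # illustrative modulation frequency (assumption)
print(f"unambiguous range: {unambiguous_range(f_mod):.2f} m")    # 7.49 m
print(f"phase pi -> {phase_to_distance(math.pi, f_mod):.3f} m")  # 3.747 m
```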

Need for Multimodality: Facial Paresis

Real-Time Transmission for Telemedicine

  • Often complex disease patterns
  • Need for specially trained therapists
  • Reduced patient mobility

→ Telemedical treatment

→ Real-time transmission of multimodal data

Telemedicine

  • Secure transmission
  • Sufficient bandwidth
  • Video streaming with the open-source software FFmpeg (http://ffmpeg.org)

MPEG: YUV Coding

[Figure: Y, U and V channel images]

  • Y (luminance): 8 bit / pixel
  • U (chrominance): 8 bit / 4 pixels
  • V (chrominance): 8 bit / 4 pixels
  • YUV: 12 bit / pixel (8 + 2 + 2 bit)
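
8 bit / 4 pixels for U and V corresponds to 4:2:0 chroma subsampling (one U and one V sample per 2 × 2 pixel block). The sketch below derives the plane sizes and the resulting 12 bit / pixel for the 176 × 144 and 204 × 204 resolutions; it assumes even image dimensions, and the function names are hypothetical.

```python
def yuv420_plane_shapes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """(rows, columns) of the Y, U and V planes for 4:2:0 subsampling."""
    chroma = (height // 2, width // 2)  # one U and one V sample per 2x2 pixel block
    return {"Y": (height, width), "U": chroma, "V": chroma}


def bits_per_pixel(width: int, height: int) -> float:
    """Average bits per pixel when every plane sample is stored with 8 bit."""
    shapes = yuv420_plane_shapes(width, height)
    samples = sum(rows * cols for rows, cols in shapes.values())
    return 8 * samples / (width * height)


print(yuv420_plane_shapes(176, 144))  # {'Y': (144, 176), 'U': (72, 88), 'V': (72, 88)}
print(bits_per_pixel(176, 144))       # 12.0
```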

Video Information to be Transmitted

  • ≈ 15 frames/second
  • currently ≈ 25,000 pixels/frame (176 × 144), next version ≈ 40,000 pixels/frame (204 × 204)
  • Per pixel:
    • amplitude: currently ignored
    • depth: encoded with 8 bit and transmitted in the Y channel of the YUV coding
    • XYZ coordinates: ignored; can be recovered from depth and camera parameters
    • U & V channels (4 bit/pixel): transmitted but currently ignored (can be used to transmit the amplitude or to improve the depth resolution)
  • 15 × 176 × 144 × 12 bit/second + audio ≈ 0.66 MByte/second
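
Two points of the list can be checked in code: the bandwidth estimate and the recovery of XYZ coordinates from depth. The audio term assumes uncompressed 16-bit PCM at 44.1 kHz mono (the sampling rate used in the experiments); under that assumption the estimate reproduces ≈ 0.66 MByte/s. The back-projection assumes a pinhole camera model in which the depth value is the z coordinate along the optical axis; the intrinsics fx, fy, cx, cy are hypothetical parameters, and a camera that reports radial distance would need a different conversion.

```python
import numpy as np


def raw_video_bytes_per_s(fps: float, width: int, height: int,
                          bits_per_pixel: int = 12) -> float:
    """Uncompressed YUV 4:2:0 video rate in bytes per second."""
    return fps * width * height * bits_per_pixel / 8


def pcm_audio_bytes_per_s(sample_rate: int = 44_100, bits: int = 16,
                          channels: int = 1) -> float:
    """Uncompressed PCM audio rate (16-bit mono is an assumption)."""
    return sample_rate * bits * channels / 8


def depth_to_xyz(depth: np.ndarray, fx: float, fy: float,
                 cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (z along the optical axis) to XYZ points.

    fx, fy, cx, cy are pinhole intrinsics in pixels (hypothetical values).
    Returns an array of shape (height, width, 3).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # row (v) and column (u) pixel indices
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


total = raw_video_bytes_per_s(15, 176, 144) + pcm_audio_bytes_per_s()
print(f"{total / 1e6:.2f} MByte/s")  # 0.66 MByte/s
```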

Experimental Results

  • Speed: real-time FFmpeg transmission of 3D video + audio (MP3; 15 frames/s, depth only, 44.1 kHz mono) needs < 50 kByte/second → feasible over standard DSL
  • Accuracy:
    • depends on the range; here the minimum distance is 50 cm → range = maximum distance – 50 cm → the range is quantized into 256 steps (limited by the 8-bit Y channel)
    • MPEG compression adds further error → error measured after MPEG encoding/decoding
    • software-based averaging over 5 frames
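
A sketch of the depth coding described above, assuming a linear mapping of [50 cm, 50 cm + range] onto the 256 codes of the 8-bit Y channel and a plain pixel-wise mean for the 5-frame averaging; the MPEG encode/decode step between the two is not modeled, and the function names are hypothetical. Quantization alone limits the error to half a step, range / 510, i.e. about 0.7 mm for a 35 cm range.

```python
import numpy as np

D_MIN_CM = 50.0  # minimum distance stated on the slide


def quantize_depth(depth_cm, range_cm: float, d_min_cm: float = D_MIN_CM) -> np.ndarray:
    """Linearly map [d_min, d_min + range] onto the codes 0..255 of the Y channel."""
    scaled = (np.asarray(depth_cm, dtype=np.float64) - d_min_cm) / range_cm * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)


def dequantize_depth(codes: np.ndarray, range_cm: float,
                     d_min_cm: float = D_MIN_CM) -> np.ndarray:
    """Invert the mapping; results lie on a grid with step range_cm / 255."""
    return d_min_cm + codes.astype(np.float64) / 255.0 * range_cm


def average_frames(frames: list, n: int = 5) -> np.ndarray:
    """Pixel-wise mean over the last n decoded depth frames."""
    return np.mean(np.stack(frames[-n:]), axis=0)


depths = np.array([50.0, 62.3, 85.0])           # cm
codes = quantize_depth(depths, range_cm=35.0)
print(codes, dequantize_depth(codes, range_cm=35.0))
```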

Experimental Results

[Figure: original depth image and decoded depth images for three ranges]

  • Range 35 cm: error 1.6 mm
  • Range 50 cm: error 2.2 mm
  • Range 90 cm: error 3.7 mm
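
The reported errors grow roughly in proportion to the range. As an illustration (not an analysis from the slides), the sketch compares them with the spacing between adjacent 8-bit depth codes, assuming 256 levels spread over the range; in each case the reported error is slightly larger than one step. The reported figures also reflect MPEG coding loss and 5-frame averaging, which this arithmetic does not model.

```python
# Errors reported on the slide: range (cm) -> error after MPEG coding (mm)
REPORTED_ERROR_MM = {35: 1.6, 50: 2.2, 90: 3.7}


def quantization_step_mm(range_cm: float, levels: int = 256) -> float:
    """Depth spacing between adjacent 8-bit codes, in millimetres."""
    return range_cm * 10.0 / (levels - 1)


for range_cm, error_mm in REPORTED_ERROR_MM.items():
    step = quantization_step_mm(range_cm)
    print(f"range {range_cm} cm: step {step:.2f} mm, "
          f"reported error {error_mm} mm ({error_mm / step:.2f} steps)")
```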

Outlook

  • The 3D image has a low spatial resolution but a high-resolution depth map
  • Registration of low-resolution 3D with high-resolution 2D images → high-quality videos for real-time telemedical therapy
  • Localizing the eyes, mouth, etc. is faster and less error-prone in 3D images than in 2D images → improved symmetry features for diagnosis and therapy
  • Implementation of real-time prototypes for telemedical and biofeedback therapy:
    • Audio + 3D ToF + 2D webcam
    • Audio + eye tracking + biosignals
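
The slides do not define the symmetry features. One plausible feature, sketched here under that caveat, mirrors left-side 3D landmarks across an assumed mid-sagittal plane and measures how far they land from their right-side counterparts; landmark localization and plane estimation are outside the sketch, and all names and values are hypothetical.

```python
import numpy as np


def asymmetry_score(left, right, plane_point, plane_normal) -> float:
    """Mean distance between mirrored left-side landmarks and their right-side partners.

    left, right: (k, 3) arrays of corresponding 3D landmarks (e.g. eye and mouth
    corners); plane_point / plane_normal: a hypothetical mid-sagittal plane.
    Returns 0.0 for a perfectly symmetric configuration.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    p0 = np.asarray(plane_point, dtype=np.float64)
    n = np.asarray(plane_normal, dtype=np.float64)
    n = n / np.linalg.norm(n)
    # Reflect each left landmark across the plane: p' = p - 2 ((p - p0) . n) n
    signed_dist = (left - p0) @ n
    mirrored = left - 2.0 * signed_dist[:, None] * n
    return float(np.linalg.norm(mirrored - right, axis=1).mean())


left = [[-3.0, 1.0, 50.0], [-2.0, -2.0, 52.0]]   # hypothetical eye/mouth corners (cm)
right = [[3.0, 1.0, 50.0], [2.0, -3.0, 52.0]]    # right mouth corner 1 cm lower
print(asymmetry_score(left, right, plane_point=[0, 0, 0], plane_normal=[1, 0, 0]))  # 0.5
```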

Summary

  • Peaks: A system for the evaluation of pathologic speech
    • Offline, audio only, tested on different pathologies
  • Examples where multimodality is important
    • Emotional disorders → eye tracking & biosignals
    • Facial paralysis → facial expression in 3D
  • 3D information using Time-of-Flight camera
  • Real-time transmission of multimodal data
    • Standard video streaming and information reduction → acceptable-quality 3D images over standard DSL
  • Outlook: Registration of 3D with 2D images → high-quality visualization, less error-prone symmetry features

Thank you for your kind attention

Supported by

Deutsche Forschungsgemeinschaft (DFG)

Deutsche Krebshilfe (German Cancer Aid)

noeth@informatik.uni-erlangen.de