
Clinical Applications of Speech Technology

Phil Green

Speech and Hearing Research Group

Dept of Computer Science

University of Sheffield

[email protected]


Talk Overview

  • SPandH - Speech and Hearing @ Sheffield

  • The CAST group

  • Building Automatic Speech Recognisers – conventional methodology

  • ASR for clients with speech disorders

  • Kinematic Maps

  • Voice-driven Environmental Control

  • VIVOCA

  • Customising Voices

  • Future Directions

CAST December 2007


SPandH

[Diagram labels: SPandH; Phonetics & Linguistics; Hearing & Acoustics; Electrical Engineering & Signal Processing; Speech & Language Therapy; Auditory Scene Analysis; Glimpsing; Missing Data Theory; CAST]

  • Prof Pam Enderby, Institute of General Practice and Primary Care, University of Sheffield: Speech Therapy

  • Prof Phil Green and Prof Roger K Moore, Speech and Hearing Research Group, Department of Computer Science, University of Sheffield: Speech Technology

  • Dr Stuart Cunningham, Department of Human Communication Sciences, University of Sheffield: Speech Perception, Speech Technology

  • Prof Mark Hawley, School of Health and Related Research: Assistive Technology

Contact: [email protected]


Conventional Automatic Speech Recogniser Construction

Standard technique uses generative statistical models:

Each speech unit is modelled by an HMM with a number of states.

Each state is characterised by a Gaussian mixture distribution over the components of the acoustic vector x.

Parameters of the distributions are estimated in training (EM – Baum-Welch).

Training is based on a large pre-recorded speaker-independent speech corpus.

All this is the acoustic model. There will also be a language model.

Decoding finds the model and state sequence most likely to generate X.
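
As a rough illustration of the decoding step, the sketch below scores each acoustic vector against a Gaussian mixture per state and runs a Viterbi search for the most likely state sequence. It is a toy in pure Python, assuming diagonal covariances and a single two-state left-to-right model; the function names and the `toy_*` values are made up for this example and are not taken from the talk or from any toolkit. Training (Baum-Welch) and the language model are left out.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_gmm(x, mixture):
    """Log likelihood of x under a mixture given as [(weight, mean, var), ...]."""
    return log_sum_exp(
        [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixture]
    )

def viterbi(frames, log_init, log_trans, state_mixtures):
    """Return (best log score, best state path) for a sequence of acoustic vectors."""
    n = len(state_mixtures)
    score = [log_init[s] + log_gmm(frames[0], state_mixtures[s]) for s in range(n)]
    backptrs = []
    for x in frames[1:]:
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor state for arriving in state s at this frame.
            prev = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(prev)
            new_score.append(
                score[prev] + log_trans[prev][s] + log_gmm(x, state_mixtures[s])
            )
        score = new_score
        backptrs.append(ptr)
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

# Toy model (assumed values): state 0 sits near 0, state 1 near 5.
toy_states = [
    [(0.6, [0.0], [1.0]), (0.4, [1.0], [1.0])],  # two-component mixture
    [(1.0, [5.0], [1.0])],                       # single Gaussian
]
toy_log_init = [0.0, -math.inf]                  # always start in state 0
toy_log_trans = [
    [math.log(0.5), math.log(0.5)],              # stay, or move on
    [-math.inf, 0.0],                            # left-to-right: no way back
]
toy_frames = [[0.1], [0.2], [4.9], [5.1]]

best_score, best_path = viterbi(toy_frames, toy_log_init, toy_log_trans, toy_states)
print(best_path)
```

In a full recogniser the same search runs over a network of unit models combined with the language model, rather than over one small HMM.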

Dysarthria

  • Loss of control of speech articulators

  • Stroke victims, cerebral palsy, MS, ...

  • Affects 170 per 100,000 population

  • Severe cases unintelligible to strangers [example words on slide: channel, lamp, radio]

  • Often accompanied by physical disability

STARDUST: ASR for Dysarthric Speakers

  • NHS NEAT Funding

  • Environmental control

  • Small vocabulary, isolated words

  • Speaker-dependent

  • Sparse training data

  • Variable training data

STARDUST Methodology

[Flow diagram with boxes: Initial recordings; Train Recogniser; Confusability Analysis; Client Practice For Consistency; New Recordings]
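
The slides do not say how the confusability analysis is carried out. One simple possibility, shown here purely as an illustration, is to run the current recogniser over labelled recordings and count which pairs of vocabulary words get mistaken for each other. The function names and the trial data are hypothetical.

```python
from collections import Counter

def confusion_counts(trials):
    """Count (spoken, recognised) pairs for the trials that were misrecognised."""
    return Counter((spoken, heard) for spoken, heard in trials if spoken != heard)

def most_confusable(trials, top=3):
    """Rank unordered word pairs by how often one was recognised as the other."""
    pair_counts = Counter()
    for (spoken, heard), n in confusion_counts(trials).items():
        pair_counts[frozenset((spoken, heard))] += n
    return [(tuple(sorted(pair)), n) for pair, n in pair_counts.most_common(top)]

# Made-up trial log: (word the client said, word the recogniser returned).
toy_trials = [
    ("lamp", "lamp"),
    ("lamp", "radio"),
    ("radio", "lamp"),
    ("channel", "channel"),
    ("channel", "lamp"),
]
print(most_confusable(toy_trials, top=1))
```

A ranking like this could suggest which words to replace in the vocabulary or to target in client practice, but that use is an assumption rather than something stated in the talk.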

STARDUST training results

ECS trial: halved the average time to execute a command

STARDUST Consistency Training

STARDUST Clinical Trial

OPTACIA: Kinematic Maps

  • Pronunciation Training Aid

  • EC Funding

  • Speech acoustics mapped to x,y position in map window in real time

  • Mapping by trained Neural Net

  • Customise for exercises and clients

[Diagram: Speech → Signal Processing → ANN Mapping → map window with regions labelled s, i, sh]
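
To make the mapping idea concrete, here is a minimal sketch of a small feed-forward network that turns one frame of acoustic features into an x,y position and then into pixel coordinates in a map window. The network shape, the weights in `toy_net` and the window size are assumptions chosen for illustration; the actual OPTACIA network and features are not described in the slides.

```python
import math

def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def map_frame(features, net):
    """Map one acoustic feature vector to an (x, y) point in the unit square."""
    hidden = layer(features, net["w_hidden"], net["b_hidden"], math.tanh)
    x, y = layer(hidden, net["w_out"], net["b_out"], sigmoid)
    return x, y

def to_pixels(xy, width, height):
    """Scale a unit-square point to pixel coordinates in the map window."""
    x, y = xy
    return round(x * (width - 1)), round(y * (height - 1))

# Hand-set toy weights (assumed); a real map would use trained weights.
toy_net = {
    "w_hidden": [[1.0, -1.0], [0.5, 0.5]],
    "b_hidden": [0.0, 0.0],
    "w_out": [[2.0, 0.0], [0.0, 2.0]],
    "b_out": [0.0, 0.0],
}

print(to_pixels(map_frame([0.0, 0.0], toy_net), 401, 301))
```

Calling `map_frame` once per incoming frame would move the point around the window as the client speaks.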

Example: Vowel Map

SPECS: Speech-Driven Environmental Control Systems

  • NHS HTD Funding

  • Industrial exploitation

  • STARDUST on ‘balloon board’

VIVOCA: Voice Input Voice Output Communication Aid

[Diagram: Dysarthric speech → ASR → Text Generation → Speech Synthesis → Intelligible speech]

  • NHS NEAT funding

  • Assists communication with strangers:

    Client: ‘buy tea’ [unintelligible]

    VIVOCA: ‘A cup of tea with milk and no sugar please’ [intelligible synthesised speech]

  • Runs on a PDA
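
The 'buy tea' example can be sketched as a lookup from recognised keywords to a full message, which is then passed to a synthesiser. How the real system generates text is not described in the slides, so the phrase table, function names and the synthesiser callback below are all hypothetical.

```python
# Hypothetical phrase table: recognised keyword sequence -> full message.
PHRASES = {
    ("buy", "tea"): "A cup of tea with milk and no sugar please",
}

def generate_text(recognised_words, phrases=PHRASES):
    """Expand a recognised keyword sequence into a message, or None if unknown."""
    return phrases.get(tuple(w.lower() for w in recognised_words))

def vivoca_turn(recognised_words, synthesise):
    """Run text generation, then hand the message to a synthesiser callback."""
    message = generate_text(recognised_words)
    if message is None:
        return None  # nothing is spoken if the input cannot be expanded
    synthesise(message)
    return message

# Stand-in for a speech synthesiser: collect the messages in a list.
spoken = []
vivoca_turn(["buy", "tea"], spoken.append)
print(spoken)
```

Keeping the recogniser vocabulary small and expanding it in a separate step fits the small-vocabulary, isolated-word recognition used for dysarthric speakers.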

Voices for VIVOCA

  • It is possible to build voices from training data

  • A local voice is preferable

  • Yorkshire voices:

    • Ian MacMillan

    • Christa Ackroyd

Concatenative synthesis

Festvox: http://festvox.org/

[Diagram: Input data (speech recordings) → Unit segmentation → Unit database (units such as i, a, sh); Text input → Unit selection → Concatenation + smoothing → Synthesised speech]
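
Unit selection is commonly framed as a search for the sequence of database units with the lowest total of target costs (how well a unit fits what is wanted) and join costs (how smoothly neighbouring units connect). The sketch below shows that search as a small dynamic programme. It is not Festvox code; the unit fields, cost functions and toy database are assumptions made for this example.

```python
def select_units(targets, database, target_cost, join_cost):
    """Pick one database unit per target so that the total cost is minimal."""
    candidates = [[u for u in database if u["phone"] == t["phone"]] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no unit in the database for some target phone")
    # best[i][j] = (cost of the cheapest sequence ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    chosen = []
    for i in range(len(targets) - 1, -1, -1):
        chosen.append(candidates[i][j])
        j = best[i][j][1]
    chosen.reverse()
    return total, chosen

def duration_cost(target, unit):
    """Toy target cost: mismatch in duration (ms)."""
    return abs(target["dur"] - unit["dur"])

def pitch_join_cost(left, right):
    """Toy join cost: pitch jump (Hz) across the boundary."""
    return abs(left["f0_end"] - right["f0_start"])

# Made-up unit database with two candidates per phone.
toy_database = [
    {"id": "a1", "phone": "a", "dur": 100, "f0_start": 118, "f0_end": 120},
    {"id": "a2", "phone": "a", "dur": 100, "f0_start": 150, "f0_end": 180},
    {"id": "i1", "phone": "i", "dur": 80, "f0_start": 125, "f0_end": 130},
    {"id": "i2", "phone": "i", "dur": 80, "f0_start": 200, "f0_end": 190},
]
toy_targets = [{"phone": "a", "dur": 100}, {"phone": "i", "dur": 80}]

total, units = select_units(toy_targets, toy_database, duration_cost, pitch_join_cost)
print(total, [u["id"] for u in units])
```

The selected units would then be concatenated and smoothed at the joins, which this sketch does not attempt.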

Concatenative synthesis

  • High quality

  • Natural sounding

  • Sounds like original speaker

  • Need a lot of data (~600 sentences)

  • Can be inconsistent

  • Difficult to manipulate prosody

HMM synthesis

[Diagram labels: y, e, s; yes]

HMM synthesis: adaptation

HTS http://hts.sp.nitech.ac.jp/

[Diagram: Speech recordings → Training → Average speaker model; Input data (speech recordings) → Adaptation → Adapted speaker model; Text input → Synthesis → Synthesised speech]

HMM synthesis

  • Consistent

  • Intelligible

  • Easier to manipulate prosody

  • Needs relatively little adaptation data (>5 sentences)

  • Less natural than concatenative

Personalisation for individuals with progressive speech disorders

  • Voice banking

    • Before deterioration

  • Capturing the essence of a voice

    • During deterioration

HMM synthesis: adaptation for dysarthric speech

HTS http://hts.sp.nitech.ac.jp/

[Diagram: Speech recordings → Training → Average speaker model; Input data (speech recordings) → Adaptation → Adapted speaker model; Text input → Synthesis → Synthesised speech. Additional label: Duration, phonation and energy information]

Helping people who have lost their voice

  • Operations like laryngectomy remove the vocal cords completely

  • If recordings have been made before the operation a synthetic voice can be reconstructed

  • E.g. HMM synthesis with 7 minutes of poor-quality adaptation data [audio samples: Original, Synthesised]

REDRESS (with University of Hull)

  • Small magnets placed on lips & tongue

  • Changes in magnetic field detected

  • This data can be used as the basis for speech recognition

  • Demonstrated accurate results on a 50-word vocabulary

Future directions

  • Personal Adaptive Listeners (PALS)

  • ‘Home Service’

  • Companions

The PALS Concept

A PAL is a portable (PDA, wearable..) device which you own

Your PAL is like your valet

  • It knows a lot about you..

    • The way you speak, the words you like to use

    • Your interests, contacts, networks

  • You talk with it

    • The knowledge makes conversational dialogues viable

  • It does things for you

    • Bookings, appointments, reminders

    • Communication

    • Access to services..

  • It learns to do a better job

    • By Automatic Adaptation: acoustic models, language models, dialogue models

    • By explicit training (this is how I refer to things, these are the names I use, ...): USER-AS-TEACHER
