The CUED Speech Group

Dr Mark Gales
Machine Intelligence Laboratory
Cambridge University Engineering Department


1. CUED Organisation

CUED: 130 Academic Staff, 1100 Undergrads, 450 Postgrads

CUED: 6 Divisions
A. ThermoFluids
B. Electrical Eng
C. Mechanics
D. Structures
E. Management
F. Information Engineering

The Information Engineering Division contains the Signal Processing Lab, the Computational and Biological Learning Lab, the Machine Intelligence Lab, and the Control Lab. The Machine Intelligence Lab comprises the Speech Group, the Vision Group, and the Medical Imaging Group.

Speech Group: 4 Staff (Bill Byrne, Mark Gales, Phil Woodland, Steve Young), 9 RAs, 12 PhD students


2. Speech Group Overview

  • Primary research interests in speech processing
  • 4 members of Academic Staff
  • 9 Research Assistants/Associates
  • 12 PhD students

  • Funded Projects in Recognition/Translation/Synthesis (5-10 RAs)
  • PhD Projects in Fundamental Speech Technology Development (10-15 students)
  • MPhil in Computer Speech, Text and Internet Technology (with the Computer Laboratory NLIP Group)
  • Computer Speech and Language journal
  • HTK Software Tools Development (international community)


Principal Staff and Research Interests

  • Dr Bill Byrne

    • Statistical machine translation

    • Automatic speech recognition

    • Cross-lingual adaptation and synthesis

  • Dr Mark Gales

    • Large vocabulary speech recognition

    • Speaker and environment adaptation

    • Kernel methods for speech processing

  • Professor Phil Woodland

    • Large vocabulary speech recognition/meta-data extraction

    • Information retrieval from audio

    • ASR and SMT integration

  • Professor Steve Young

    • Statistical dialogue modelling

    • Voice conversion



Research Interests

  • Recognition
    • large vocabulary systems [Eng, Chinese, Arabic]
    • acoustic model training and adaptation
    • language model training and adaptation
    • rich text transcription & spoken document retrieval
  • Translation
    • statistical machine translation
    • finite state transducer framework
  • Synthesis
    • data driven techniques
    • voice transformation
    • HMM-based techniques
  • Dialogue
    • data driven semantic processing
    • statistical modelling
  • Machine Learning
    • fundamental theory of statistical modelling and pattern processing


Example Current and Recent Projects

  • GALE: Global Autonomous Language Exploitation
    • DARPA GALE funded (collab with BBN, LIMSI, ISI …)
  • HTK Rich Audio Transcription Project (finished 2004)
    • DARPA EARS funded
  • CLASSIC: Computational Learning in Adaptive Systems for Spoken Conversation
    • EU funded (collab with Edinburgh, France Telecom, …)
  • EMIME: Effective Multilingual Interaction in Mobile Environments
    • EU funded (collab with Edinburgh, IDIAP, Nagoya Institute of Technology …)
  • R2EAP: Rapid and Reliable Environment Aware Processing
    • TREL funded

Also active collaborations with IBM, Google, Microsoft, …


3. Rich Audio Transcription Project

  • DARPA-funded project
    • Effective Affordable Reusable Speech-to-text (EARS) program
  • Transform natural speech (English/Mandarin) into a rich, human-readable transcript
  • Need to add meta-data to the ASR output
    • For example, mark speaker turns / handle disfluencies
  • New algorithms

See http://mi.eng.cam.ac.uk/research/projects/EARS/index.html


Rich Text Transcription

ASR Output

okay carl uh do you exercise yeah actually um i belong to a gym down here

gold’s gym and uh i try to exercise five days a week um and now and then

i’ll i’ll get it interrupted by work or just full of crazy hours you know

Meta-Data Extraction (MDE) Markup

Speaker1:/ okay carl {F uh} do you exercise /

Speaker2:/ {DM yeah actually} {F um} i belong to a gym down here /

/ gold’s gym / / and {F uh} i try to exercise five days a week {F um} /

/ and now and then [REP i’ll + i’ll] get it interrupted by work or just

full of crazy hours {DM you know } /

Final Text

Speaker1: Okay Carl, do you exercise?

Speaker2: I belong to a gym down here, Gold’s Gym, and I try to

exercise five days a week and now and then I’ll get it

interrupted by work or just full of crazy hours.
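The MDE conventions above ({F …} fillers, {DM …} discourse markers, [REP a + b] repetitions, / … / sentence-unit boundaries) can be stripped mechanically to recover clean text. A minimal sketch in Python — the regexes are illustrative, not part of the EARS tooling:

```python
import re

def mde_to_text(mde: str) -> str:
    """Strip MDE markup from an annotated transcript line."""
    text = re.sub(r"\{F [^}]*\}", "", mde)                   # drop fillers
    text = re.sub(r"\{DM [^}]*\}", "", text)                 # drop discourse markers
    text = re.sub(r"\[REP [^+]*\+([^\]]*)\]", r"\1", text)   # keep only the repair
    text = text.replace("/", " ")                            # drop SU boundaries
    return re.sub(r"\s+", " ", text).strip()                 # normalise whitespace

print(mde_to_text("/ okay carl {F uh} do you exercise /"))
# -> okay carl do you exercise
```

Restoring the capitalisation and punctuation of the final text (as in the example above) would require a further processing step beyond this markup removal.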



4. Statistical Machine Translation

  • Aim is to translate from one language to another

    • For example translate text from Chinese to English

  • Process involves collecting parallel (bitext) corpora

    • Align at document/sentence/word level

  • Use statistical approaches to obtain most probable translation
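The last point is commonly framed as a noisy-channel search: choose the English sentence ê = argmax over e of P(e)·P(f|e), where the language model P(e) and translation model P(f|e) are estimated from the aligned corpora. A toy sketch with hand-set, purely illustrative probabilities (real systems estimate these from bitext):

```python
import math

# Log-probabilities for two candidate translations of a fixed foreign
# sentence f. The numbers are illustrative only.
language_model = {          # log P(e)
    "i like fish": math.log(0.6),
    "me fish like": math.log(0.1),
}
translation_model = {       # log P(f | e)
    "i like fish": math.log(0.3),
    "me fish like": math.log(0.4),
}

def decode() -> str:
    """Return the hypothesis maximising log P(e) + log P(f | e)."""
    return max(language_model,
               key=lambda e: language_model[e] + translation_model[e])

print(decode())
# -> i like fish
```

Note how the fluent hypothesis wins overall despite its lower translation-model score: the language model acts as a fluency prior over the candidates.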



GALE: Integrated ASR and SMT

  • Member of the AGILE team (led by BBN)

    The DARPA Global Autonomous Language Exploitation (GALE) program has the aim of developing speech and language processing technologies to recognise, analyse, and translate speech and text into readable English.

  • Primary languages for STT/SMT: Chinese and Arabic

See http://mi.eng.cam.ac.uk/research/projects/AGILE/index.html


5. Statistical Dialogue Modelling

Use a statistical framework for all stages:

Waveforms → [Speech Understanding System] → Words/Concepts → [Dialogue Manager] → Dialogue Acts → [Speech Generation] → Waveforms


CLASSiC: Project Architecture

Speech input st → ASR → wt → NLU → ht → DM → at → NLG → ut → TTS → rt → 1-best signal selection → speech output. The dialogue context from turn t-1 feeds into the stages, and hypotheses may be eliminated (x) at each step.

Legend:
ASR: Automatic Speech Recognition
NLU: Natural Language Understanding
DM: Dialogue Management
NLG: Natural Language Generation
TTS: Text To Speech
st: input sound signal
wt: word string hypotheses
ht: conceptual interpretation hypotheses
at: action hypotheses
ut: utterance hypotheses
rt: speech synthesis hypotheses
x: possible elimination of hypotheses

See http://classic-project.org
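Using the legend's symbols, the architecture is a chain of hypothesis-transforming stages with pruning (the x points) between them. A schematic sketch — the stage bodies are placeholder stubs, not the CLASSiC implementation:

```python
from typing import Callable, List

# Each stage maps a list of input hypotheses to a (possibly pruned) list
# of output hypotheses, mirroring the ASR -> NLU -> DM -> NLG -> TTS chain.
Stage = Callable[[List[str]], List[str]]

def make_stage(name: str, keep: int = 2) -> Stage:
    """Illustrative stub: tag each hypothesis with the stage name and
    prune to the top `keep` (the 'x' elimination points in the diagram)."""
    def stage(hyps: List[str]) -> List[str]:
        return [f"{name}({h})" for h in hyps][:keep]
    return stage

pipeline: List[Stage] = [make_stage(n) for n in ("ASR", "NLU", "DM", "NLG", "TTS")]

def run(s_t: str) -> List[str]:
    hyps = [s_t]               # s_t: input sound signal
    for stage in pipeline:
        hyps = stage(hyps)     # w_t, h_t, a_t, u_t, r_t in turn
    return hyps                # r_t: speech synthesis hypotheses

print(run("waveform"))
# -> ['TTS(NLG(DM(NLU(ASR(waveform)))))']
```

Keeping multiple hypotheses alive through the chain, rather than committing to the 1-best after every stage, is what lets downstream stages recover from upstream errors; the final 1-best selection happens only at the output.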


6. EMIME: Speech-to-Speech Translation

  • Personalised speech-to-speech translation
    • Learn characteristics of a user's speech
    • Reproduce the user's speech in synthesis
  • Cross-lingual capability
    • Map speaker characteristics across languages
  • Unified approach for recognition and synthesis
    • Common statistical model: hidden Markov models
    • Simplifies adaptation (common to both synthesis and recognition)
  • Improve understanding of recognition/synthesis

See http://emime.org


7. R2EAP: Robust Speech Recognition

  • Current ASR performance degrades with changing noise

    • Major limitation on deploying speech recognition systems



Project Overview

Aims of the project:
  • develop techniques that allow an ASR system to rapidly respond to changing acoustic conditions;
  • maintain high levels of recognition accuracy over a wide range of conditions;
  • be flexible, so the techniques are applicable to a wide range of tasks and computational requirements.

  • Project started in January 2008 – 3 year duration
  • Close collaboration with TREL Cambridge Lab
    • Common development code-base – extended HTK
    • Common evaluation sets
    • Builds on current (and previous) PhD studentships
    • Monthly joint meetings

See http://mi.eng.cam.ac.uk/~mjfg/REAP/index.html


Approach – Model Compensation

  • Model compensation schemes highly effective BUT
    • slow compared to feature compensation schemes
  • Need schemes to improve speed while maintaining performance
    • Also automatically detect/track changing noise conditions


8. Toshiba-CUED PhD Collaborations

  • To date 5 research studentships (partly) funded by Toshiba
  • Shared software – code transfer in both directions
  • Shared data sets – both (emotional) synthesis and ASR
  • 6-monthly reports and review meetings

Students and topics:
  • Hank Liao (2003-2007): Uncertainty Decoding for Noise Robust ASR
  • Catherine Breslin (2004-2008): Complementary System Generation and Combination
  • Zeynep Inanoglu (2004-2008): Recognition and Synthesis of Emotion
  • Rogier van Dalen (2007-2010): Noise Robust ASR
  • Stuart Moore (2007-2010): Number Sense Disambiguation

A very useful and successful collaboration.


9. HTK Version 3.0 Development

  • HTK is a free software toolkit for developing HMM-based systems
    • 1000s of users worldwide
    • widely used for research by universities and industry

  • 1989 – 1992: V1.0 – 1.4, initial development at CUED
  • 1993 – 1999: V1.5 – 2.3, commercial development by Entropic
  • 2000 – date: V3.0 – V3.4, academic development at CUED
  • 2004 – date: ATK, the real-time HTK-based recognition system

  • Development partly funded by Microsoft and DARPA EARS Project
  • Primary dissemination route for CU research output

See http://htk.eng.cam.ac.uk


10. Summary

  • Speech Group works on many aspects of speech processing
    • Large vocabulary speech recognition
    • Statistical machine translation
    • Statistical dialogue systems
    • Speech synthesis and voice conversion
  • Statistical machine learning approach to all applications
  • World-wide reputation for research
    • CUED systems have defined the state of the art for the past decade
    • Developed a number of techniques widely used by industry
  • Hidden Markov Model Toolkit (HTK)
    • Freely-available software, 1000s of users worldwide
    • State-of-the-art features (discriminative training, adaptation …)
    • HMM synthesis extension (HTS) from Nagoya Institute of Technology

See http://mi.eng.cam.ac.uk/research/speech

