SCILL: Spoken Conversational Interaction for Language Learning

Stephanie Seneff and Jim Glass

Spoken Language Systems Group, MIT Computer Science and Artificial Intelligence Lab

Steve Young

Speech Group, CUED Machine Intelligence Lab


Conversational Interfaces

[Diagram: components of a conversational interface: Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, Speech Synthesis, Audio]


Conversational Interfaces

[Diagram: Galaxy Architecture, with a Hub and the components Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, Speech Synthesis, Audio]


Bilingual Weather Domain: Video Clip


Computer Aids through Conversational Interaction

  • Language teachers have limited time to interact with students in dialogue exchanges

  • Computers provide non-threatening environment in which to practice communicating

  • Three-phase interaction framework is envisioned:

    • Preparation: practice phrases, simulated dialogues

    • Conversational Interaction

      • Telephone conversation with graphical support

      • Seamless translation aid

    • Assessment

      • Review dialog interaction

      • Feedback and fluency scores


SCILL: A Spoken Computer Interface for Language Learning

Conversational systems for an interactive language-learning environment

  • MIT SLS: Bilingual Conversational Dialogue Systems

  • CU Speech Group: Speech Recognition and Pronunciation Scoring

System roles: Domain Expert, Tutor

  • Speaks only target language

  • Has access to information sources

  • Can provide translations for both user queries and system responses


Technology Requirements

  • Robust recognition and understanding of foreign-accented speech

    • If recognition is too poor, student may become frustrated

    • Customize vocabulary and linguistic constructs to lesson plans

  • High quality cross-lingual language generation

  • Natural and fluent speech synthesis

  • Ability to automatically generate simulated dialogues

    • System should be able to generate multiple dialogues based on a given lesson topic on the fly

    • Allows the student to see example sentence constructs for a particular lesson

  • Ability to reconfigure quickly and easily to new lessons

  • Automatic scoring for fluency, pronunciation, tone quality, use of vocabulary, etc.


SCILL System Overview

[Diagram labels: Web Server, User Interface]


Bilingual Spoken Dialogue Interaction: Current Status

  • Initial version of end-to-end system is in place for the weather domain

    • Rain, snow, wind, temperature, warnings (e.g., tornado), etc.

  • MIT Recognizer supports both English and Mandarin

    • Seamless language switching

  • English queries are translated into Mandarin

  • Mandarin queries are answered in Mandarin

    • User can ask for a translation into English of the response at any time

  • Currently using off-the-shelf Mandarin synthesizer from ITRI

    • Plan to develop high quality domain-dependent Mandarin synthesis using our Envoice tools

  • System can be configured as telephone-only or as telephone augmented with a Web-based GUI


Bilingual Recognizer Construction

[Diagram labels: English corpus, Parse, Interlingua, Generate, Chinese corpus, English Recognizer Language Model, Chinese Recognizer Language Model, English Network, Chinese Network, Recognizer]

  • Create Mandarin corpus by automatically translating existing English corpus

  • Automatically induce language model for both English and Mandarin recognizers using NL grammar

  • Two recognizers compete in common search space
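As a rough illustration of two recognizers competing in a common search space, the sketch below pools hypotheses from both languages and keeps the single best-scoring one. The `Hypothesis` type, the scores, and the word strings are invented for illustration; a real recognizer would carry out this competition inside one search over both networks rather than over finished hypothesis lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    language: str          # "english" or "mandarin"
    words: tuple
    acoustic_logp: float   # hypothetical acoustic log score
    lm_logp: float         # hypothetical language-model log score

    @property
    def total(self) -> float:
        return self.acoustic_logp + self.lm_logp


def best_hypothesis(*nbest_lists):
    """Pool hypotheses from every language and return the top scorer."""
    pooled = [h for nbest in nbest_lists for h in nbest]
    if not pooled:
        raise ValueError("no hypotheses to compare")
    return max(pooled, key=lambda h: h.total)


# Invented scores: the English reading wins, so the language is chosen implicitly.
english = [Hypothesis("english", ("will", "it", "rain", "tomorrow"), -120.0, -14.0)]
mandarin = [Hypothesis("mandarin", ("ming2_tian1", "hui4", "xia4_yu3", "ma5"), -150.0, -12.5)]
winner = best_hypothesis(english, mandarin)
print(winner.language, winner.total)
```

Because the language label travels with the winning hypothesis, no separate language-identification step is needed, which is one way to read "seamless language switching".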


HTK Mandarin Speech Recognizer

Standard HTK LVCSR Setup:

  • PLP front-end with 1st/2nd/3rd derivatives, transformed using HLDA

  • 3-state cross-word hidden Markov models

  • Decision-tree-clustered context-dependent triphones

  • N-gram language model smoothed with class-based language model

Except:

  • Standard PLP front-end augmented with F0 + derivatives (F0 added after HLDA transformation)

  • 46-phone acoustic model set, with long final phones split, e.g., uang -> ua ng

  • Questions about tone added to decision tree context clustering
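The ordering "project with HLDA first, then append F0" can be sketched as below. This is a toy, pure-Python illustration: the matrix sizes, the central-difference derivative, and the choice of first and second F0 derivatives are all assumptions, not the actual HTK configuration.

```python
def project(frame, matrix):
    """Apply a linear projection (rows of `matrix`) to one feature frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in matrix]


def deltas(track):
    """First-order differences, using a central difference with edge clamping."""
    last = len(track) - 1
    return [
        (track[min(t + 1, last)] - track[max(t - 1, 0)]) / 2.0
        for t in range(len(track))
    ]


def augment_with_f0(plp_frames, hlda, f0_track):
    """Project PLP frames with HLDA, then append F0 and its derivatives."""
    if len(plp_frames) != len(f0_track):
        raise ValueError("PLP frames and F0 track must have equal length")
    d1 = deltas(f0_track)
    d2 = deltas(d1)
    return [
        project(frame, hlda) + [f0, a, b]
        for frame, f0, a, b in zip(plp_frames, f0_track, d1, d2)
    ]


# Toy sizes (assumed): 4-dimensional input frames projected down to 2 dimensions.
hlda = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
plp = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
f0 = [200.0, 210.0, 230.0]
features = augment_with_f0(plp, hlda, f0)
print(len(features[0]))
```

Appending F0 after the projection keeps the pitch dimensions out of the HLDA transform, so they reach the acoustic models unmixed with the spectral features.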


HMM-Based Pronunciation Scoring

Basic approach:

  • estimate posterior probabilities (i.e., confidence scores) of each phone or syllable given acoustics

  • map confidence scores to good/bad decision using data labelled by experts

[Diagram: phones (sh, ih, d, ax, ...) scored by P(p | A), labelled "a simple approximation"; scores are mapped to Good/Bad and set against Expert Rankings (Good/Bad). Relates confidence scores to human perception]
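The two steps of the basic approach can be sketched as follows. This is a minimal illustration, not the CU scoring method: it assumes equal phone priors so that P(p | A) reduces to a softmax over acoustic log-likelihoods, and it fits a single cut-off by maximizing agreement with expert labels. All scores and labels are invented.

```python
import math


def phone_posterior(target, loglik):
    """Approximate P(p | A) from per-phone acoustic log-likelihoods.

    Assumes equal phone priors, so the posterior is a softmax over log p(A | q).
    """
    peak = max(loglik.values())
    norm = sum(math.exp(v - peak) for v in loglik.values())
    return math.exp(loglik[target] - peak) / norm


def fit_threshold(scores, expert_good):
    """Pick the cut-off that agrees most often with the expert good/bad labels."""
    def agreement(cut):
        return sum((s >= cut) == g for s, g in zip(scores, expert_good))
    return max(sorted(set(scores)), key=agreement)


# Invented log-likelihoods for one segment under three competing phones.
posterior = phone_posterior("sh", {"sh": -10.0, "s": -12.0, "zh": -13.0})

# Invented confidence scores for five phone tokens, with expert judgements.
scores = [0.92, 0.85, 0.40, 0.15, 0.70]
expert_good = [True, True, False, False, True]
cut = fit_threshold(scores, expert_good)
decisions = [s >= cut for s in scores]
print(round(posterior, 3), cut, decisions)
```

A single global threshold is the simplest possible mapping; per-phone thresholds or a small classifier trained on the expert labels would be natural refinements.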


Multilingual Translation Framework

Common meaning representation: semantic frame

[Diagram: English, Chinese, Spanish, Japanese -> Recognition (Models) -> NLU (Parsing Rules) -> Semantic Frame -> NLG (Generation Rules) -> Synthesis (Speech Corpora) -> English, Chinese, Spanish, Japanese]


Content Understanding and Translation

English: Some thunderstorms may be accompanied by gusty winds and hail

clause: weather_event
topic: precip_act, name: thunderstorm, num: pl
quantifier: some
pred: accompanied_by
adverb: possibly
topic: wind, num: pl, pred: gusty
and: precip_act, name: hail

Japanese:

Spanish: Algunas tormentas posiblemente acompañadas por vientos racheados y granizo

Chinese: 一些雷雨可能會伴有陣風和冰雹

Frame indexed under weather, wind, rain, storm, and hail
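One way to picture a frame that is "indexed under" several keys is a nested dictionary filed in an inverted index. The sketch below is illustrative only: the nesting is one plausible reading of the flattened slide text, and `add_frame` and `lookup` are hypothetical helper names, not part of the MIT system.

```python
from collections import defaultdict

# The frame from the slide, written as nested dicts (nesting is assumed).
frame = {
    "clause": "weather_event",
    "topic": {"type": "precip_act", "name": "thunderstorm", "num": "pl",
              "quantifier": "some"},
    "pred": {
        "name": "accompanied_by",
        "adverb": "possibly",
        "topic": {"type": "wind", "num": "pl", "pred": "gusty"},
        "and": {"type": "precip_act", "name": "hail"},
    },
}

index = defaultdict(list)


def add_frame(keys, frame):
    """File one frame under every content key it should be retrievable by."""
    for key in keys:
        index[key].append(frame)


def lookup(key):
    """Return all frames filed under `key` (empty list if none)."""
    return index.get(key, [])


add_frame(["weather", "wind", "rain", "storm", "hail"], frame)
print(len(lookup("hail")), len(lookup("snow")))
```

With this layout a question about wind and a question about hail retrieve the same frame, and each target language can then generate its own sentence from it.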


Audio Demonstration

  • User asks: “Will it rain tomorrow in Boston?”

  • System paraphrases query, then responds in Chinese

  • “Please repeat that” in English or Chinese interpreted identically

  • System repeats response in Chinese

  • User speaks query in English: seamless language switching

  • System paraphrases, then translates query into Chinese

  • User attempts to repeat translation

    • Recognition error: hallucinates an erroneous date (February 30) which will be remembered

  • System supplies known cities in England

  • User chooses London

  • System has no weather for London on February 30

  • User asks “how about today?”

  • System provides London’s weather today

  • User asks for a translation into English, which is provided


Proposed Translation Procedure

[Diagram: the English query and the Chinese query are each parsed into a Linguistic Frame; a transfer step links the two frames; generate steps produce queries from the frames and from the Key-value Representation]

English query: “what is your name”

{c wh_question
   :topic {q name
      :poss “you” }
   :auxil “link”
   :complement {q object :trace “what” } }

Chinese query: “ni3 jiao4 shen2_me5 ming2_zi4”

{c wh_question
   :topic {q name }
   :pro “you”
   :verb “call”
   :complement {q object :trace “what” } }

Key-value Representation:

{c eform
   :attribute “name”
   :person “you” }

If generated query fails to parse, simplify interlingua and generation
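The control flow of the procedure, including the fallback when the generated query fails to parse, might look like the sketch below. Every function here is a hypothetical stand-in passed in as a parameter; the toy implementations only echo the slide's "what is your name" example, and deriving the key-value form from the transferred frame is an assumption.

```python
def translate(query, parse_src, transfer, generate_tgt, parses_in_tgt,
              to_key_value, generate_from_kv):
    """Translate via linguistic frames; fall back to key-value generation."""
    frame = transfer(parse_src(query))
    candidate = generate_tgt(frame)
    if parses_in_tgt(candidate):
        return candidate, "linguistic-frame"
    # Generated query failed to parse: simplify the interlingua and retry.
    return generate_from_kv(to_key_value(frame)), "key-value"


# Toy stand-ins (all hypothetical) for the parser, transfer rules and generators.
def parse_src(query):
    return {"c": "wh_question", "topic": {"q": "name", "poss": "you"},
            "auxil": "link", "complement": {"q": "object", "trace": "what"}}


def transfer(f):
    return {"c": "wh_question", "topic": {"q": "name"}, "pro": f["topic"]["poss"],
            "verb": "call", "complement": f["complement"]}


def generate_tgt(f):
    return "ni3 jiao4 shen2_me5 ming2_zi4"


def to_key_value(f):
    return {"c": "eform", "attribute": f["topic"]["q"], "person": f["pro"]}


def generate_from_kv(kv):
    # Placeholder output; a real generator would emit a simpler target sentence.
    return f"<generated from {kv['attribute']}/{kv['person']}>"


args = (parse_src, transfer, generate_tgt)
ok = translate("what is your name", *args, lambda s: True,
               to_key_value, generate_from_kv)
fallback = translate("what is your name", *args, lambda s: False,
                     to_key_value, generate_from_kv)
print(ok, fallback)
```

Checking the generated query with the target-language parser acts as a self-test: a translation is only offered to the student if the system itself could understand it.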


Proposed Exercise using Typed Inputs

Type-in Window

Next: Dallas rain tomorrow

Input: Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?

Reply Window

Query: Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?

Response: Da2 la1 si1 ming2 tian1 xia4 wu3 xia4 te4 da4 yu3

Next: Los Angeles wind Saturday

System is able to parse query in spite of tone errors and (limited) syntax errors

System color codes errors in tone and in syntactic constructs
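The kind of feedback described here can be approximated by comparing the student's typed pinyin with the system's corrected query. The sketch below is a plain string comparison standing in for the real parser-based analysis; the function names are hypothetical, and the "syntax" check is only a crude test of syllable order.

```python
import re

SYLLABLE = re.compile(r"([a-z]+)([1-5])", re.IGNORECASE)


def syllables(text):
    """Split tone-numbered pinyin into (base, tone) pairs, lower-cased."""
    return [(b.lower(), int(t)) for b, t in SYLLABLE.findall(text)]


def tone_errors(typed, reference):
    """Typed syllables whose base occurs in the reference with a different tone."""
    ref_tones = {}
    for base, tone in syllables(reference):
        ref_tones.setdefault(base, set()).add(tone)
    return [
        f"{base}{tone}"
        for base, tone in syllables(typed)
        if base in ref_tones and tone not in ref_tones[base]
    ]


def order_differs(typed, reference):
    """True if the toneless syllable sequences differ (a crude syntax check)."""
    return [b for b, _ in syllables(typed)] != [b for b, _ in syllables(reference)]


typed = "Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?"
reference = "Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?"
print(tone_errors(typed, reference))    # ['la2', 'si4']
print(order_differs(typed, reference))  # True
```

On the slide's example this flags the two mistyped tones in "Dallas" and notices that "ming2 tian1" is in the wrong position, which are the two error types the system is said to color code.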


Testing the Effectiveness of Training on Typed Input: Proposed Measures

  • Compare the quality of spoken dialogue recorded before and after a Web-based training session

  • Measures of fluency:

    • Syntactic well-formedness

    • Tone production accuracy

    • Frequency of pauses, edits, and filler words

    • Phonetic quality, etc.

  • Measures of communication success:

    • Frequency of usage of translation assistance

    • Understanding error rate

    • Task completion

    • Time to completion, etc.


Technology Goal: Automated Language Understanding

Once translation ability exists from English to target language, can create reverse system almost effortlessly

[Diagram: English Sentence -> parse -> Interlingual Representation -> generate -> Mandarin Sentence; Corpus Pairs -> Grammar Induction -> Mandarin Parsing Grammar]

Utilizes English parse tree and Mandarin generation lexicon to induce Mandarin parse tree


Building NxN Translation Efficiently

[Diagram: English, Japanese, Mandarin, Arabic, French, Spanish, Urdu, and Korean connected through an Interlingua, with Automatic Grammar Induction]
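The efficiency argument behind an interlingua can be made concrete with a little arithmetic. The sketch below assumes one parser into and one generator out of the interlingua per language, compared with a separate translator for every ordered language pair; that accounting is the usual textbook argument rather than something stated on the slide.

```python
def direct_systems(n):
    """Directed language pairs when every pair gets its own translator."""
    return n * (n - 1)


def interlingua_components(n):
    """One parser into and one generator out of the interlingua per language."""
    return 2 * n


languages = ["English", "Japanese", "Mandarin", "Arabic",
             "French", "Spanish", "Urdu", "Korean"]
n = len(languages)
print(direct_systems(n), interlingua_components(n))  # 56 16
```

For the eight languages named on the slide this is 56 pairwise translators against 16 components, and adding a ninth language costs two more components instead of sixteen more translators.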


Future Plans (Near Term and Long Term)

  • Install current version of system at Cambridge University

  • Incorporate CU Mandarin recognizer

  • Add support for audio input at the computer

  • Build high quality synthesis capability

  • Improve understanding, dialogue, and translation performance

  • Collect and transcribe data from language learners and assess both system and students

  • Develop various scoring algorithms for student fluency

  • Refine all aspects of system based on collected data

