
Presentation Transcript



Compensating for Hyperarticulation by Modeling Articulatory Properties
Hagen Soltau, Florian Metze, Alex Waibel

Interactions between Speech Recognition Problems and User Emotions
Mihai Rotaru, Diane J. Litman, Kate Forbes-Riley

Feb.21.2006

Rohit Kumar

Affective Dialog Systems



(Affective Computing)

User Centered Computing

Audience Centered Presentation




Queries & Concerns

  •  What are Articulatory Features ?

    • Large conflicts in enumeration of these features 

  •  Use of Articulatory Features to detect Emotions

  • Training data for Hyperarticulation models

    • Use of Isolated words 

    • No Annotation of Hyperarticulation 

    • Methodology of data collection 

    • Task Specific, …   



Queries & Concerns

  •  Humans use Hyperarticulation to recover from error in HH interaction while Hyperarticulation is a source of error in HC interaction. Why ???

    • Lots of big Questions

      • Should we make Human-like ASRs?

      • Can we ?

      • What is different ?

  •  Gaussian Mixture Models (GMM)

  • No Significance Numbers for WERs



Queries & Concerns

  • Applicability test of Chi-Square

  • Hypotheses to explain lack of dependencies where they are expected

    • Users more forgiving in Tutorial Dialog (higher tolerance to error)

    • May be due to Conflation of Emotions

      • Separate out +ves and -ves

    • Due to YES/NO turns after semantic misrecognition

      • Difficult to capture emotion in Yes/No

      • Better recognition to not reject



But before we turn into “Self”-Centered Maniacs

Let’s look at what

Soltau and Rotaru have to say



What are these papers about

Both these papers are about

  • Automatic (& Human) Speech Recognition

  • Error Handling Strategies in Spoken Dialog

  • Interaction between Affect and Misrecognitions by ASR



Soltau et. al.

  • Suggest that Articulatory Features be used to improve the performance of ASR on Hyperarticulated speech

    • Assumption: People don’t substitute whole phones to contrast with a previous recognition error

    • Basically, more precise modeling of what’s being hyperarticulated

  • How did they do it ?

    • Besides what HMM based ASRs usually do

    • Trained additional GMMs for Articulatory features

      (and also anti-models)

    • Get probability scores (from the GMMs) for the Articulatory Features

    • Linearly combine (with different weights) the scores from all the models

    • Get better hypothesis (just like “get more minutes”)
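The multi-stream combination described above can be sketched in a few lines. This is a minimal, hypothetical rendering of the idea (not the paper's actual decoder): per-hypothesis log-scores from the acoustic model and the per-feature GMM detectors are combined linearly. The weight values and feature names are illustrative assumptions.

```python
# Sketch of linear log-score combination across model streams: the
# acoustic model's score is mixed with scores from articulatory-feature
# (AF) detectors. Weights and feature names here are made up.

def combined_log_score(acoustic_logp, af_logps, af_weights, acoustic_weight=1.0):
    """Linearly combine acoustic and articulatory-feature log-scores."""
    score = acoustic_weight * acoustic_logp
    for feature, logp in af_logps.items():
        score += af_weights.get(feature, 0.0) * logp
    return score

# toy scores for one hypothesis
acoustic = -42.7
af_scores = {"VOICED": -1.2, "ROUND": -3.4}
weights = {"VOICED": 0.3, "ROUND": 0.1}
print(combined_log_score(acoustic, af_scores, weights))
```

With zero weights the combination falls back to the plain acoustic score, so the AF streams can only re-rank hypotheses to the extent their weights allow.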



Soltau et. al. (continued)

(Add in if I am missing something)

  • Methodology

    • Acoustic Models

      • Feature Extraction (MFCC + Context reduced to 40 features by LDA transform)

      • Other front end processing

    • AF Models

      • Same front end

      • GMMs (48 per feature) trained on middle state time alignments

    • Data collection for Hyperarticulated speech

      • 2 Sessions: Normal / Induced Hyperarticulated

      • Simulated Recognition Errors

      • 45 Subjects
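The model / anti-model idea in the methodology above can be illustrated with a toy detector. As an assumption for illustration, single diagonal Gaussians stand in for the 48-component GMMs, and the "frames" below are synthetic 2-D points rather than real acoustic features.

```python
import numpy as np

# Toy model / anti-model detector for one articulatory feature (say,
# VOICED): fit a Gaussian on frames where the feature is present and an
# anti-model on frames where it is absent; the log-likelihood ratio is
# the per-frame detector score. All data is synthetic.

rng = np.random.default_rng(0)
voiced = rng.normal(loc=1.0, scale=0.5, size=(500, 2))     # feature present
unvoiced = rng.normal(loc=-1.0, scale=0.5, size=(500, 2))  # anti-model frames

def fit_gaussian(x):
    return x.mean(axis=0), x.var(axis=0)

def log_likelihood(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

m1, v1 = fit_gaussian(voiced)
m0, v0 = fit_gaussian(unvoiced)

frame = np.array([0.9, 1.1])  # a test frame near the "voiced" cluster
llr = log_likelihood(frame, m1, v1) - log_likelihood(frame, m0, v0)
print(llr > 0)  # True: positive ratio means the feature is detected
```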



Soltau et. al. (continued)

  • Various Experiments

    • Classification of Articulatory Features

    • Decoding with Adapted Acoustic Models + AF

    • Decoding with Specialized models + AF



Rotaru et. al.

  • Domain: Spoken Tutorial Dialog

  • Chaining Effect of misrecognition across turns

  • Recognition Problems & Emotions in student turns



Rotaru et. al. (continued)

  • Methodology

    • ITSPOKE Corpus + Emotion Annotation

    • Student Utterances annotated by

      • ASR Misrecognitions

      • Rejections

      • Semantic Misrecognition

      • Student Emotion

      • Emotion Source



Rotaru et. al. (continued)

  • Chi-Square Analysis

    • Rejection in previous turn vs. Rejection in current turn

    • ASR Mis. in previous turn vs. ASR Mis. in current turn

    • ASR Mis. in previous turn vs. Rejection in current turn

    • Rejection in previous turn vs. Emotion in current turn

    • Rejection in previous turn vs. Emotion Src. in current turn

    • Sem. Mis. in previous turn vs. Emotion in current turn

    • Emotion in previous turn vs. (ASR) Mis. in current turn

    • Emotion in current turn vs. (ASR) Mis. in current turn
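Each test in the list above is a Pearson chi-square on a contingency table. As a worked sketch with made-up counts (not the ITSPOKE data), here is the statistic computed by hand for a 2×2 table of "rejection in previous turn" vs. "rejection in current turn":

```python
# Pearson chi-square for a 2x2 contingency table [[a, b], [c, d]]:
# sum over cells of (observed - expected)^2 / expected, where
# expected = row_total * column_total / grand_total.

def chi_square_2x2(table):
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# hypothetical counts: previously-rejected turns are followed by
# more rejections than previously-accepted turns
table = [[30, 70],   # prev rejected: curr rejected / curr ok
         [20, 180]]  # prev ok:       curr rejected / curr ok
print(round(chi_square_2x2(table), 2))  # 19.2
```

19.2 is well above the df=1, p=0.05 critical value of 3.84, so this (made-up) dependency would count as significant.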



Articulatory Features

  • Speech Production Mechanism



Articulatory Features

  • Vowels

    • Vowel Height

      • High, Mid, Low

    • Vowel Backwardness

      • Front, Mid, Back

    • Long / Short Vowel

    • Diphthong

    • Schwa

    • Lip Rounding (+/-)

    • Voicing !

    • Oral / Nasal



Articulatory Features

  • Consonant

    • Place of Articulation

      • Labial, Alveolar, Palatal, Labio-Dental, Dental, Velar, Glottal, {Retroflex}

    • Manner of Articulation

      • Stop, Fricative, Affricate, Nasal, Lateral, Approximant, {Liquids, Semivowels}

    • Voicing (+/-)

Rohit Kumar, Amit Kataria, Sanjeev Sofat, "Building Non-Native Pronunciation Lexicon for English using a Rule-based Approach," International Conference on Natural Language Processing (ICON) 2003, Mysore, India

http://en.wikipedia.org/wiki/Articulatory_phonetics
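One way to picture the attribute inventory on the slides above is as a phone-to-feature lexicon. This is a hypothetical sketch: the phone set, feature names, and values below are illustrative, following the vowel and consonant attributes listed, not any actual system's lexicon.

```python
# Hypothetical phone-to-articulatory-feature table, using the attribute
# names from the slides (place/manner/voicing for consonants,
# height/backness/rounding for vowels). Entries are illustrative.

PHONE_FEATURES = {
    "b": {"type": "consonant", "place": "labial", "manner": "stop", "voiced": True},
    "s": {"type": "consonant", "place": "alveolar", "manner": "fricative", "voiced": False},
    "i": {"type": "vowel", "height": "high", "backness": "front", "rounded": False},
}

def has_feature(phone, feature, value):
    """True if the phone is listed with the given feature value."""
    return PHONE_FEATURES.get(phone, {}).get(feature) == value

print(has_feature("b", "voiced", True))   # True
print(has_feature("s", "voiced", True))   # False
```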



  • Use of Articulatory Features to detect Emotions



  • Training data for Hyperarticulation models

    • Use of Isolated words

    • No Annotation of Hyperarticulation

    • Methodology of data collection

    • Task Specific, …



  • Humans use Hyperarticulation to recover from error in HH interaction while Hyperarticulation is a source of error in HC interaction. Why ???

    • Lots of big Questions

      • Should we make Human-like ASRs?

      • Could we ? Would we ?

      • What is different ?



Gaussian Mixture Models

Andrew Moore’s Lecture Slides

Pages 7-10, 20-24

http://www.autonlab.org/tutorials/gmm.html
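As a minimal numerical companion to the slides linked above: a GMM density is just a weighted sum of Gaussian components. The parameters below are illustrative.

```python
import math

# Mixture density p(x) = sum_k w_k * N(x; mu_k, var_k) for a 1-D GMM.
# Weights, means, and variances here are toy values.

def gmm_log_density(x, weights, means, variances):
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        total += w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return math.log(total)

weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
# this mixture is symmetric about 0, so the densities at +2 and -2 match
print(math.isclose(gmm_log_density(2.0, weights, means, variances),
                   gmm_log_density(-2.0, weights, means, variances)))  # True
```

The density peaks near the component means and dips between them, which is what lets a GMM model multimodal feature distributions that a single Gaussian cannot.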



  • No Significance Numbers for WERs



Applicability Test of Chi-Square (χ²)

The following minimum frequency thresholds should be obeyed:

  • for a 1 × 2 or 2 × 2 table, expected frequencies in each cell should be at least 5

  • for a 2 × 3 table, expected frequencies should be at least 2

  • for a 2 × 4 or 3 × 3 or larger table, if all expected frequencies but one are at least 5 and if the one small cell is at least 1, chi-square is still a good approximation

    In general, the greater the degrees of freedom (i.e., the more values/categories on the independent and dependent variables), the more lenient the minimum expected frequencies threshold.

    http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
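The thresholds above can be checked mechanically before running the test. This sketch computes a table's expected frequencies and applies the 2×2 rule (every expected cell at least 5); the example tables are made up.

```python
# Expected frequency of cell (i, j) = row_total_i * col_total_j / n.
# For a 2x2 table, chi-square is applicable only if every expected
# frequency is at least 5.

def expected_frequencies(table):
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    return [[row[i] * col[j] / n for j in range(len(col))]
            for i in range(len(row))]

def chi_square_applicable_2x2(table):
    return all(e >= 5 for r in expected_frequencies(table) for e in r)

print(chi_square_applicable_2x2([[30, 70], [20, 180]]))  # True
print(chi_square_applicable_2x2([[1, 9], [2, 88]]))      # False: one expected cell is 0.3
```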



  • Hypotheses to explain lack of dependencies where they are expected

    • Users more forgiving in Tutorial Dialog (higher tolerance to error)

    • May be due to Conflation of Emotions

      • Separate out +ves and -ves

    • Due to YES/NO turns after semantic misrecognition

      • Difficult to capture emotion in Yes/No

      • Better recognition to not reject



That’s all Folks

Unless you have something to say ?!

