Presentation Transcript

Compensating for Hyperarticulation by Modeling Articulatory Properties
Hagen Soltau, Florian Metze, Alex Waibel

Interactions between Speech Recognition Problems and User Emotions
Mihai Rotaru, Diane J. Litman, Kate Forbes-Riley

Feb.21.2006

Rohit Kumar

Affective Dialog Systems

(Affective Computing)

User Centered Computing

Audience Centered Presentation

Queries & Concerns
  •  What are Articulatory Features?
    • Large conflicts in the enumeration of these features
  •  Use of Articulatory Features to detect Emotions
  • Training data for Hyperarticulation models
    • Use of Isolated Words
    • No Annotation of Hyperarticulation
    • Methodology of data collection
    • Task Specific, …
Queries & Concerns
  •  Humans use Hyperarticulation to recover from errors in HH interaction, while Hyperarticulation is a source of errors in HC interaction. Why???
    • Lots of big Questions
      • Should we make human-like ASRs?
      • Can we?
      • What is different?
  •  Gaussian Mixture Models (GMM)
  • No significance numbers for the WERs
Queries & Concerns
  • Applicability test of Chi-Square
  • Hypotheses to explain the lack of dependencies where they are expected
    • Users more forgiving in Tutorial Dialog (higher tolerance to error)
    • May be due to Conflation of Emotions
      • Separate out positives and negatives
    • Due to YES/NO turns after semantic misrecognition
      • Difficult to capture emotion in a Yes/No turn
      • Recognized well enough not to be rejected
But before we turn into “Self” Centered Maniacs

Let's look at what

Soltau and Rotaru have to say

What are these papers about?

Both these papers are about

  • Automatic (& Human) Speech Recognition
  • Error Handling Strategies in Spoken Dialog
  • Interaction between Affect and Misrecognitions by ASR
Soltau et al.
  • Suggest that Articulatory Features be used to improve the performance of ASR on Hyperarticulated speech
    • Assumption: people don’t substitute whole phones to contrast a previous recognition error
    • Basically, more precise modeling of what’s being hyperarticulated
  • How did they do it? (a score-combination sketch follows this list)
    • Besides what HMM-based ASRs usually do
    • Trained additional GMMs for Articulatory Features

(and also anti-models)

    • Get probability scores (from the GMMs) for the Articulatory Features
    • Linearly combine (with different weights) the scores from all the models
    • Get a better hypothesis (just like “get more minutes”)
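A minimal sketch of that score-combination step, assuming per-hypothesis log-likelihood scores are already available from the HMM acoustic model and from each articulatory-feature GMM (scored against its anti-model). The function, weights, and feature names are hypothetical illustrations, not the paper's actual implementation:

```python
# Hedged sketch: linear combination of acoustic and articulatory-feature
# (AF) scores. Weights and feature names are made up for illustration.
def combined_score(hmm_loglik, af_logliks, af_weights, hmm_weight=1.0):
    """Linearly combine the HMM acoustic score with AF stream scores.

    hmm_loglik : total log-likelihood from the HMM acoustic model
    af_logliks : dict of feature name -> log-odds score
                 (feature GMM vs. its anti-model)
    af_weights : dict of feature name -> stream weight
    """
    score = hmm_weight * hmm_loglik
    for feat, loglik in af_logliks.items():
        score += af_weights.get(feat, 0.0) * loglik
    return score

# Toy usage: rescore two competing hypotheses and keep the better one.
weights = {"VOICED": 0.4, "ROUND": 0.2}
hyp_a = combined_score(-1200.0, {"VOICED": -3.1, "ROUND": -0.7}, weights)
hyp_b = combined_score(-1195.0, {"VOICED": -9.4, "ROUND": -2.2}, weights)
print("A" if hyp_a > hyp_b else "B")
```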
Soltau et al. (continued)

(Add in if I am missing something)

  • Methodology (a front-end sketch follows this list)
    • Acoustic Models
      • Feature Extraction (MFCC + context, reduced to 40 features by an LDA transform)
      • Other front-end processing
    • AF Models
      • Same front end
      • GMMs (48 per feature) trained on middle-state time alignments
    • Data collection for Hyperarticulated speech
      • 2 sessions: Normal / Induced Hyperarticulated
      • Simulated Recognition Errors
      • 45 subjects
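A rough sketch of that kind of front end: MFCC frames with stacked context, reduced to 40 dimensions by an LDA transform. The libraries (librosa, scikit-learn) and all concrete settings are assumptions for illustration, not the toolkit the authors used:

```python
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def front_end(wav, sr, context=3):
    # 13 MFCCs per frame (typical settings; the paper's exact
    # configuration is not reproduced here).
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13).T      # (T, 13)
    # Stack +/- `context` neighboring frames onto each frame.
    padded = np.pad(mfcc, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(mfcc)]
                      for i in range(2 * context + 1)])          # (T, 91)

# The LDA reduction to 40 features needs frame-level class labels
# (e.g., HMM states from a forced alignment); `labels` is a stand-in.
# lda = LinearDiscriminantAnalysis(n_components=40)
# reduced = lda.fit_transform(front_end(wav, sr), labels)        # (T, 40)
```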
Soltau et al. (continued)
  • Various Experiments
    • Classification of Articulatory Features
    • Decoding with Adapted Acoustic Models + AF
    • Decoding with Specialized models + AF
Rotaru et al.
  • Domain: Spoken Tutorial Dialog
  • Chaining Effect of misrecognition across turns
  • Recognition Problems & Emotions in student turns
Rotaru et al. (continued)
  • Methodology
    • ITSPOKE Corpus + Emotion Annotation
    • Student Utterances annotated with (a possible turn record is sketched after this list):
      • ASR Misrecognitions
      • Rejections
      • Semantic Misrecognitions
      • Student Emotion
      • Emotion Source
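One convenient way to hold these annotations is a small record per student turn. The field names below are illustrative stand-ins, not the actual ITSPOKE annotation schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudentTurn:
    """One student turn with the annotation dimensions listed above.

    Field names are illustrative, not the corpus format.
    """
    text: str                          # ASR transcript of the turn
    asr_misrecognized: bool            # word-level ASR misrecognition
    rejected: bool                     # turn rejected by the system
    semantically_misrecognized: bool   # semantic misrecognition
    emotion: str                       # e.g. "negative" / "neutral" / "positive"
    emotion_source: Optional[str] = None  # annotated source of the emotion
```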
Rotaru et al. (continued)
  • Chi-Square Analysis (each pairing below is a test on a contingency table; a minimal SciPy example follows this list)
    • Rejection in previous turn vs. Rejection in current turn
    • ASR Mis. in previous turn vs. ASR Mis. in current turn
    • ASR Mis. in previous turn vs. Rejection in current turn
    • Rejection in previous turn vs. Emotion in current turn
    • Rejection in previous turn vs. Emotion Src. in current turn
    • Sem. Mis. in previous turn vs. Emotion in current turn
    • Emotion in previous turn vs. (ASR) Mis. in current turn
    • Emotion in current turn vs. (ASR) Mis. in current turn
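A minimal SciPy example of one such test, using made-up counts for "Rejection in previous turn vs. Rejection in current turn"; the numbers are invented for illustration and are not from the paper:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up 2x2 table: rows = previous turn rejected (no/yes),
# columns = current turn rejected (no/yes).
table = np.array([[850, 60],
                  [ 70, 20]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
print("expected counts:\n", expected)  # used by the applicability test later
```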
Articulatory Features
  • Speech Production Mechanism
Articulatory Features
  • Vowels
    • Vowel Height
      • High, Mid, Low
    • Vowel Backness
      • Front, Mid, Back
    • Long / Short Vowel
    • Diphthong
    • Schwa
    • Lip Rounding (+/-)
    • Voicing!
    • Oral / Nasal
Articulatory Features
  • Consonants
    • Place of Articulation
      • Labial, Alveolar, Palatal, Labio-Dental, Dental, Velar, Glottal, {Retroflex}
    • Manner of Articulation
      • Stop, Fricative, Affricate, Nasal, Lateral, Approximant, {Liquids, Semivowels}
    • Voicing (+/-)

(a phone-to-feature mapping sketch follows the references below)

Rohit Kumar, Amit Kataria, Sanjeev Sofat, "Building Non-Native Pronunciation Lexicon for English using a Rule-based Approach," International Conference on Natural Language Processing (ICON) 2003, Mysore, India

http://en.wikipedia.org/wiki/Articulatory_phonetics
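For concreteness, the inventory above can be written down as a simple phone-to-attributes mapping. The handful of ARPAbet phones and attribute names below are an illustrative fragment, not the feature set actually used by Soltau et al.:

```python
# Illustrative fragment of a phone -> articulatory-attribute map
# (ARPAbet symbols; attribute names follow the lists above).
ARTICULATORY_FEATURES = {
    "IY": {"type": "vowel", "height": "high", "backness": "front",
           "rounded": False, "voiced": True},
    "AA": {"type": "vowel", "height": "low", "backness": "back",
           "rounded": False, "voiced": True},
    "P":  {"type": "consonant", "place": "labial", "manner": "stop",
           "voiced": False},
    "Z":  {"type": "consonant", "place": "alveolar", "manner": "fricative",
           "voiced": True},
    "M":  {"type": "consonant", "place": "labial", "manner": "nasal",
           "voiced": True},
}

def has_feature(phone, feature, value):
    """Check whether a phone carries a given attribute value."""
    return ARTICULATORY_FEATURES.get(phone, {}).get(feature) == value

print(has_feature("Z", "voiced", True))    # True
print(has_feature("P", "manner", "stop"))  # True
```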

Training data for Hyperarticulation models
    • Use of Isolated words
    • No Annotation of Hyperarticulation
    • Methodology of data collection
    • Task Specific, …
Humans use Hyperarticulation to recover from errors in HH interaction, while Hyperarticulation is a source of errors in HC interaction. Why???
    • Lots of big Questions
      • Should we make human-like ASRs?
      • Could we? Would we?
      • What is different?
Gaussian Mixture Models

Andrew Moore’s Lecture Slides

Pages 7-10, 20-24

http://www.autonlab.org/tutorials/gmm.html
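A quick sketch of how a GMM detector of the kind used for the articulatory-feature streams could be trained and scored with scikit-learn. The 48-component count comes from the slide above; the data, shapes, and the anti-model pairing are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in 40-dim frames for where a feature (say, VOICED) is
# present vs. absent according to a forced alignment.
pos_frames = rng.normal(0.5, 1.0, size=(2000, 40))
neg_frames = rng.normal(-0.5, 1.0, size=(2000, 40))

# One GMM per feature (48 components, as on the slide) plus an
# anti-model trained on the complementary frames.
model = GaussianMixture(n_components=48, covariance_type="diag").fit(pos_frames)
anti = GaussianMixture(n_components=48, covariance_type="diag").fit(neg_frames)

# Per-frame log-odds: log p(x | model) - log p(x | anti-model).
test = rng.normal(0.5, 1.0, size=(10, 40))
log_odds = model.score_samples(test) - anti.score_samples(test)
print(log_odds.mean())  # positive when the feature looks present
```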

Applicability Test of χ²

The following minimum frequency thresholds should be obeyed:

  • for a 1×2 or 2×2 table, expected frequencies in each cell should be at least 5
  • for a 2×3 table, expected frequencies should be at least 2
  • for a 2×4 or 3×3 or larger table, if all expected frequencies but one are at least 5 and if the one small cell is at least 1, chi-square is still a good approximation

In general, the greater the degrees of freedom (i.e., the more values/categories on the independent and dependent variables), the more lenient the minimum expected frequencies threshold.

http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
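These thresholds can be checked directly against the expected-frequency table that chi2_contingency returns. A small helper hard-coding the rules above (for 2×2 and larger tables; a 1×2 table would be checked against its expected counts by hand):

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_applicable(table):
    """Rough applicability check following the thresholds above."""
    table = np.asarray(table)
    _, _, _, expected = chi2_contingency(table)
    if table.size == 4:        # 2x2
        return bool((expected >= 5).all())
    if table.size == 6:        # 2x3
        return bool((expected >= 2).all())
    # 2x4, 3x3, or larger: at most one cell below 5, and that cell >= 1
    return bool((expected < 5).sum() <= 1 and (expected >= 1).all())

print(chi2_applicable([[850, 60], [70, 20]]))  # True for the earlier counts
```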

Hypotheses to explain the lack of dependencies where they are expected
    • Users more forgiving in Tutorial Dialog (higher tolerance to error)
    • May be due to Conflation of Emotions
      • Separate out positives and negatives
    • Due to YES/NO turns after semantic misrecognition
      • Difficult to capture emotion in a Yes/No turn
      • Recognized well enough not to be rejected
That’s all Folks

Unless you have something to say?!
