
Predicting Student Emotions in Computer-Human Tutoring Dialogues


Presentation Transcript


1. Predicting Student Emotions in Computer-Human Tutoring Dialogues
Diane J. Litman and Kate Forbes-Riley
University of Pittsburgh, Pittsburgh, PA 15260 USA

2. Motivation
• Bridge the learning gap between human tutors and computer tutors
• Our approach: add emotion prediction and adaptation to ITSPOKE, our Intelligent Tutoring SPOKEn dialogue system

3. Back-end is the Why2-Atlas system (VanLehn et al., 2002)
• Sphinx2 speech recognition and Cepstral text-to-speech

6. Human-Computer Excerpt
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)

7. Outline
• Data and Emotion Annotation
• Machine Learning Experiments
  - extract linguistic features from student turns
  - use different feature sets to predict emotions
  - 19-36% relative reduction of baseline error
  - comparison with human tutoring

8. ITSPOKE Dialogue Corpus
• 100 spoken tutoring dialogues (physics problems) with ITSPOKE
• on average, 19.4 minutes and 25 student turns per dialogue
• 20 subjects: university students who have never taken college physics and who are native speakers

9. Emotion Annotation Scheme (Sigdial '04)
• 'Emotion': emotions/attitudes that may impact learning
• Annotation of student turns
• Emotion classes:
  - negative, e.g. uncertain, bored, irritated, confused, sad
  - positive, e.g. confident, enthusiastic
  - neutral: no weak or strong expression of negative or positive emotion

10. Example Annotated Excerpt
ITSPOKE: What happens to the velocity of a body when there is no force acting on it?
Student: dammit (NEGATIVE) ASR: it is
ITSPOKE: Could you please repeat that?
Student: same (NEUTRAL) ASR: i same

11. Agreement Study
• 333 student turns, 15 dialogues
• 2 annotators (the authors)

12. Emotion Classification Tasks
• Negative, Neutral, Positive
  - Kappa = .4, Weighted Kappa = .5
  - focus of this talk
• Negative, Non-Negative
  - Kappa = .5
• Emotional, Non-Emotional
  - Kappa = .3
• Results on par with prior research
  - Kappas of .32-.48 in (Ang et al. 2002; Narayanan 2002; Shafran et al. 2003)
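
To make the agreement numbers concrete, here is a minimal sketch of computing kappa and weighted kappa with scikit-learn. This is an assumption on my part: the slides do not say which tool or weighting scheme the authors used, and the label lists below are illustrative stand-ins, not the actual annotations.

```python
# Minimal sketch: inter-annotator agreement on emotion classes.
# The two label lists are illustrative stand-ins for the 333 turns.
from sklearn.metrics import cohen_kappa_score

annotator1 = ["negative", "neutral", "positive", "neutral", "negative"]
annotator2 = ["negative", "neutral", "neutral", "neutral", "positive"]

# Unweighted kappa: every disagreement costs the same.
kappa = cohen_kappa_score(annotator1, annotator2)

# Weighted kappa: map classes to an ordinal scale so a
# negative/positive confusion is penalized more heavily than
# negative/neutral (linear weighting is one common choice).
scale = {"negative": 0, "neutral": 1, "positive": 2}
a1 = [scale[x] for x in annotator1]
a2 = [scale[x] for x in annotator2]
weighted = cohen_kappa_score(a1, a2, weights="linear")

print(f"kappa={kappa:.2f}, weighted kappa={weighted:.2f}")
```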

13. Feature Extraction per Student Turn
• Three feature types:
  - acoustic-prosodic
  - lexical
  - identifiers
• Research questions:
  - relative utility of acoustic-prosodic, lexical, and identifier features
  - impact of speech recognition
  - comparison with human tutoring (HLT/NAACL, 2004)

14. Feature Types (1): Acoustic-Prosodic Features
• 4 pitch (f0): max, min, mean, standard dev.
• 4 energy (RMS): max, min, mean, standard dev.
• 4 temporal:
  - turn duration (seconds)
  - pause length preceding turn (seconds)
  - tempo (syllables/second)
  - internal silence in turn (zero f0 frames)
→ available to ITSPOKE in real time
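
A minimal sketch of the 12 per-turn statistics listed above, assuming the pitch (f0) and energy (RMS) contours have already been produced by a pitch tracker that marks unvoiced frames with f0 = 0; all function and variable names are illustrative, not ITSPOKE's:

```python
# Minimal sketch: 12 acoustic-prosodic features for one student turn.
# f0 and rms are per-frame numpy arrays from a pitch tracker that
# outputs f0 = 0 for unvoiced frames; frame_sec is the frame step in
# seconds. Names are illustrative, not taken from ITSPOKE.
import numpy as np

def turn_features(f0, rms, n_syllables, pause_before_sec, frame_sec=0.01):
    voiced = f0[f0 > 0]                 # frames where pitch was tracked
    duration = len(f0) * frame_sec      # turn duration in seconds
    return {
        # 4 pitch statistics, computed over voiced frames only
        "f0_max": voiced.max(), "f0_min": voiced.min(),
        "f0_mean": voiced.mean(), "f0_std": voiced.std(),
        # 4 energy statistics
        "rms_max": rms.max(), "rms_min": rms.min(),
        "rms_mean": rms.mean(), "rms_std": rms.std(),
        # 4 temporal features
        "duration_sec": duration,
        "pause_before_sec": pause_before_sec,
        "tempo_syl_per_sec": n_syllables / duration,
        "internal_silence": float(np.mean(f0 == 0)),  # zero-f0 fraction
    }
```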

15. Feature Types (2): Word Occurrence Vectors
• human-transcribed lexical items in the turn
• ITSPOKE-recognized lexical items
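
A minimal sketch of building such binary word-occurrence vectors, using scikit-learn's CountVectorizer as a stand-in for whatever bag-of-words representation the Weka experiments used; the example turns are taken from the excerpt on slide 6:

```python
# Minimal sketch: binary word-occurrence vectors over student turns.
# Either the human transcriptions or the ASR hypotheses can be used.
from sklearn.feature_extraction.text import CountVectorizer

turns_human = ["dammit", "same", "zero", "downward you computer"]
turns_asr = ["it is", "i same", "the zero", "downward you computer"]

vectorizer = CountVectorizer(binary=True)   # occurrence, not counts
X_human = vectorizer.fit_transform(turns_human)
X_asr = CountVectorizer(binary=True).fit_transform(turns_asr)

print(X_human.shape, X_asr.shape)   # (n_turns, vocabulary_size)
```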

16. Feature Types (3): Identifier Features
• student id
• student gender
• problem id

17. Machine Learning Experiments
• Weka software: boosted decision trees
  - gave best results in pilot studies (ASRU 2003)
• Baseline: majority class (neutral)
• Methodology: 10 runs of 10-fold cross validation
• Evaluation metric: accuracy
• Datasets:
  - Agreed (202/333 turns where annotators agreed)
  - Consensus (all 333 turns, after annotators resolved disagreements)
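
A rough modern analogue of this protocol in scikit-learn, purely as a sketch: the original experiments used Weka's boosted decision trees, so AdaBoost over shallow trees is a substitution, and X, y stand for the per-turn feature matrix and emotion labels:

```python
# Sketch of the evaluation protocol: boosted decision trees,
# 10 runs of 10-fold cross validation, majority-class baseline.
# AdaBoost over shallow trees substitutes for Weka's booster.
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate(X, y):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                               n_estimators=100, random_state=0)
    majority = DummyClassifier(strategy="most_frequent")  # neutral class
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    base = cross_val_score(majority, X, y, cv=cv, scoring="accuracy").mean()
    return acc, base
```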

18. Acoustic-Prosodic vs. Lexical Features (Agreed Turns)
• Both acoustic-prosodic ("speech") and lexical features significantly outperform the majority baseline
• Combining feature types yields an even higher accuracy
• Baseline = 46.52%
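
For reference, the "19-36% relative reduction of baseline error" quoted on the outline slide follows the standard formula

relative error reduction = (baseline error - model error) / baseline error

The per-feature-set accuracies were shown in a results table not preserved in this transcript, so as a purely hypothetical illustration: against the 46.52% majority baseline (53.48% error), a classifier reaching 65% accuracy (35% error) would give (53.48 - 35) / 53.48 ≈ 35% relative error reduction.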

19. Adding Identifier Features (Agreed Turns)
• Adding identifier features improves all results
• With identifier features, lexical information now yields the highest accuracy
• Baseline = 46.52%

20. Using Automatic Speech Recognition (Agreed Turns)
• Surprisingly, using ASR output rather than human transcriptions does not particularly degrade accuracy
• Baseline = 46.52%

21. Summary of Results (Agreed Turns)

22. Summary of Results (Consensus Turns)
• Using consensus rather than agreed data decreases predictive accuracy for all feature sets, but other observations generally hold

23. Comparison with Human Tutoring (Agreed Turns)
• In human tutoring dialogues, emotion prediction (and annotation) is more accurate and based on somewhat different features

24. Related Research in Emotional Speech
• Elicited speech (Polzin & Waibel 1998; Oudeyer 2002; Liscombe et al. 2003)
• Naturally-occurring speech (Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003)
• Our work:
  - naturally-occurring tutoring data
  - analysis of comparable human and computer corpora

25. Current Directions
• Develop adaptive strategies for ITSPOKE
  - annotate human tutor turns
  - evaluate ITSPOKE with emotion adaptation
• Co-training to address the annotation bottleneck
  - Maeireizo, Litman, and Hwa: Saturday poster

26. Summary
• Recognition of annotated student emotions in spoken computer tutoring dialogues
• Feature sets containing acoustic-prosodic, lexical, and/or identifier features yield significant improvements in predictive accuracy compared to majority-class baselines
  - role of differing feature types and speech recognition errors
  - comparable analysis of human tutoring dialogues
  - the paper contains details regarding two other emotion prediction tasks
• This research is a first step towards implementing emotion prediction and adaptation in ITSPOKE

  27. Thank You! Questions?

28. Example Annotated Excerpt
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction (NEGATIVE, UNCERTAIN) ASR: add directions
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force (POSITIVE, CONFIDENT) ASR: force
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity (NEGATIVE, UNCERTAIN) ASR: velocity
ITSPOKE: Could you please repeat that?
Student: velocity (NEGATIVE, IRRITATED) ASR: velocity

29. Prior Research: Affective Computer Tutoring
• (Kort, Reilly, and Picard, 2001): propose a cyclical model of emotion change during learning; developing a non-dialogue computer tutor that will use eye-tracking/facial features to predict emotion and support movement into positive emotions
• (Aist, Kort, Reilly, Mostow, and Picard, 2002): adding human-provided emotional scaffolding to an automated reading tutor increases student persistence
• (Evens et al., 2002): for CIRCSIM, a computer dialogue tutor for physiology problems, hypothesize adaptive strategies for recognized student emotional states; e.g., if frustration is detected, the system should respond to hedges and self-deprecation by supplying praise and restructuring the problem
• (de Vicente and Pain, 2002): use human observations of student motivational states in videoed interactions with a non-dialogue computer tutor to develop rules for detection
• (Ward and Tsukahara, 2003): a spoken dialogue computer "tutor-support" uses prosodic and contextual features of the user turn (e.g. "on a roll", "lively", "in trouble") to infer an appropriate response as users remember train stations; preferred over randomly chosen acknowledgments (e.g. "yes", "right", "that's it", "that's it <echo>", …)
• (Conati and Zhou, 2004): use Dynamic Bayesian Networks to reason under uncertainty about abstracted student knowledge and emotional states through time, based on student moves in a non-dialogue computer game, and to guide selection of "tutor" responses
→ Most will be relevant to developing ITSPOKE adaptation techniques

30. Experimental Procedure
• Students take a physics pretest
• Students read background material
• Students use the web and voice interface to work through up to 10 problems with either ITSPOKE or a human tutor
• Students take a post-test
