Student simulation and evaluation


DOD meeting

Hua Ai (hua@cs.pitt.edu)

03/03/2006


Outline

  • Motivations

  • Background

  • Corpus

  • Student Simulation Model

  • Comparisons

  • Conclusions & Future Work


Motivations

  • For a larger corpus

    • Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically

    • The best strategy is often not even present in a small dataset

  • For a cheaper corpus

    • Human subjects are expensive


[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Labeled components: Corpus, Simulation models, Simulated User, Dialog Manager, Dialog, Reinforcement Learning, Strategy.]


Background (1)

  • Education community

    • Focuses on changes in the student’s internal knowledge representation

    • Usually not dialogue based

    • Simulated students (VanLehn et al., 1994) are used for

      • Tutor training

      • Collaborative learning


Background (2)

  • Dialogue community

    • Focuses on interactions and dialogue behaviors

    • Simulated users have a limited set of actions to take

    • (Schatzmann et al., 2005)

      • Simulates at the dialogue act (DA) level


Corpus (1)

  • Spoken dialogue physics tutor (ITSPOKE)


Corpus (2)

  • Tutoring procedure

    • 5 problems

[Figure: tutoring procedure. A dialogue of tutor questions (T) and student answers (S) alternates with essay revision, and the cycle repeats (… …).]


Corpus (3)

  • Tutor’s behaviors

    • Defined in KCD (Knowledge Construction Dialogues)

[Figure: tutor behavior branches on the student answer: Correct vs. Incorrect/Partially Correct.]


Corpus (4)

  • f03 vs. s05: different groups of subjects


Simulation Models (1)

  • Simulates at the word level

    • Students have more complex behaviors

    • DA info alone isn’t enough for the system

  • Two models (ProbCorrect, Random), each trained on two corpora (f03, s05)

    • f03: 03ProbCorrect, 03Random

    • s05: 05ProbCorrect, 05Random


Simulation Models (2)

  • ProbCorrect Model

    • Simulates average knowledge level of real students

    • Simulates meaningful dialogue behaviors

  • Random Model

    • Produces nonsensical answers

    • Serves as a contrast


ProbCorrect Model

  • Real corpus

    • question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)

    • question2: Answer2_1 (c), Answer2_2 (ic)

  • Candidate answers

    • For question1, c:ic = 1:2

      • c: Answer1_1

      • ic: Answer1_2, Answer1_3

    • For question2, c:ic = 1:1

      • c: Answer2_1

      • ic: Answer2_2

  • ProbCorrect model, answering question 1

    • Choose to give a c/ic answer with the same average probability as the real students

    • Randomly choose one answer from the corresponding answer set
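
The two ProbCorrect steps can be illustrated with a minimal Python sketch. It is not the original implementation: the toy corpus only mirrors the slide's example, and the names `REAL_CORPUS`, `train_prob_correct`, and `simulate_answer` are hypothetical. It also assumes that the c:ic ratio is estimated by counting labeled answers per question.

```python
import random
from collections import defaultdict

# Toy corpus mirroring the slide's example (illustrative data, not the real corpus).
# Each entry is (question, student answer, label), where the label is "c" or "ic".
REAL_CORPUS = [
    ("question1", "Answer1_1", "c"),
    ("question1", "Answer1_2", "ic"),
    ("question1", "Answer1_3", "ic"),
    ("question2", "Answer2_1", "c"),
    ("question2", "Answer2_2", "ic"),
]

def train_prob_correct(corpus):
    """Collect the c/ic answer sets per question and the observed P(correct)."""
    answers = defaultdict(lambda: {"c": [], "ic": []})
    for question, answer, label in corpus:
        answers[question][label].append(answer)
    model = {}
    for question, sets in answers.items():
        total = len(sets["c"]) + len(sets["ic"])
        model[question] = {"p_correct": len(sets["c"]) / total, "answers": sets}
    return model

def simulate_answer(model, question, rng=random):
    """Step 1: pick c or ic with the observed probability.
    Step 2: pick one answer uniformly from the matching answer set."""
    entry = model[question]
    label = "c" if rng.random() < entry["p_correct"] else "ic"
    return label, rng.choice(entry["answers"][label])

if __name__ == "__main__":
    model = train_prob_correct(REAL_CORPUS)
    print(simulate_answer(model, "question1"))
```

With this toy data, question1 is answered correctly about one time in three and question2 about half the time, matching the 1:2 and 1:1 ratios on the slide.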


Random Model

  • Corpus: HC03&05

    • Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4

    • Question2: Answer2_1, Answer2_2

  • Candidate answers

    • 1) Answer1_1

    • 2) Answer1_2

    • 3) Answer1_3

    • 4) Answer1_4

    • 5) Answer2_1

    • 6) Answer2_2

  • Big random model, answering question i

    • Answer: any of the 6 answers, each with the same probability (regardless of the question!)
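
For contrast, the Random model's behavior can be sketched in a few lines of Python. This is an illustration under assumptions rather than the original code: the data copies the slide's toy example, and `CORPUS`, `build_candidate_pool`, and `random_answer` are hypothetical names.

```python
import random

# Toy data mirroring the slide (illustrative only).
CORPUS = {
    "Question1": ["Answer1_1", "Answer1_2", "Answer1_3", "Answer1_4"],
    "Question2": ["Answer2_1", "Answer2_2"],
}

def build_candidate_pool(corpus):
    """Merge the answers to all questions into one flat candidate list."""
    return [answer for answers in corpus.values() for answer in answers]

def random_answer(pool, question, rng=random):
    """Return any candidate with equal probability.
    The question argument is deliberately ignored."""
    return rng.choice(pool)

if __name__ == "__main__":
    pool = build_candidate_pool(CORPUS)
    print(random_answer(pool, "Question2"))
```

Because the pool is shared across questions, a simulated student may reply to Question2 with an answer that was originally given to Question1, which is what makes this model a nonsensical baseline.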


Experiments

  • Comparisons between real corpora

  • Comparisons between real & simulated corpora

  • Comparisons between simulated corpora


Real Corpora Comparisons (1)

  • Evaluation metrics

    • High-level dialog features

    • Dialog style and cooperativeness

    • Dialog Success Rate and Efficiency

    • Learning Gains


Real corpora comparisons (2)

  • High-level dialog features


Real corpora comparisons (3)

  • Dialogue style features


Real corpora comparisons (3)

  • Dialogue success rate


Real corpora comparisons (4)

  • Learning gains features


Results

  • Differences captured by these simple metrics are not enough to conclude whether a corpus is real or simulated (Schatzmann et al., 2005)

  • Differences could be due to different user populations



Results (1)

  • Most of the measurements are able to distinguish between the Random and ProbCorrect models

  • The ProbCorrect model generates more realistic behaviors

  • We cannot draw conclusions about the power of these metrics, since the two simulated corpora are very different


Results (2)

  • Differences between the real corpora and the Random model are captured clearly, but differences between the real corpora and the ProbCorrect model are not

  • We did not expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small


Results (3)

  • s05 variety > f03 variety ⇒ 05ProbCorrect variety > 03ProbCorrect variety

  • However, we do not get significantly more variety in the simulated corpora than in the real ones

    • Possibly because the computer tutor is simple (c/ic)

    • We’re using the same candidate answer set


Results (4)

  • ProbCorrect models trained on different real corpora are quite different

  • Each ProbCorrect model is more similar to the real corpus it was trained on than to the other real corpus



Results: dialogue structure

  • Larger differences between the two simulated corpora in prob7 than in prob34

  • Dialogue structure of prob34 is more restricted

  • The power of these simple metrics is restricted by the dialogue structure


Conclusions

  • The simple measurements can distinguish between

    • real corpora

      • Different populations

    • simulated and real corpora

      • To different extent

    • simulated corpora

      • Different models

      • Trained on different corpora

      • Limited by the dialogue structure


Future work

  • Explore “deep” evaluation metrics

  • Test the simulated corpora on policy learning

  • More simulation models

    • More human features

      • Emotion, learning

    • Special cases

      • Quick learners, slow learners