Student simulation and evaluation

Hua Ai, DOD meeting, 03/03/2006

Outline
  • Motivations
  • Background
  • Corpus
  • Student Simulation Models
  • Comparisons
  • Conclusions & Future Work
Motivations
  • To obtain a larger corpus
    • Reinforcement learning (RL) is used to learn the best policy for spoken dialogue systems automatically
    • The best strategy may not even be present in a small dataset
  • To obtain a cheaper corpus
    • Human subjects are expensive
Strategy learning using a simulated user (Schatzmann et al., 2005)
[Figure components: Corpus, Simulation models, Simulated User, Dialog Manager, Dialog, Reinforcement Learning, Strategy]
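The strategy-learning loop in the figure can be sketched roughly as below. This is a minimal, hypothetical illustration of the data-generation half only: a dialog manager strategy (here a function called `policy`) exchanges turns with a simulated user to produce a simulated corpus, which an RL learner would then consume. The learner is omitted, and every name (`run_dialogue`, `simulate_corpus`, `fixed_policy`, `coin_flip_user`) is an assumption, not something taken from the talk.

```python
import random
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]                          # (speaker, text): "T" = tutor/system, "S" = student/user
Dialogue = List[Turn]
Policy = Callable[[Dialogue], Optional[str]]    # hypothetical: history -> next system move, or None to stop
SimUser = Callable[[str, random.Random], str]   # hypothetical: system move -> simulated reply

def run_dialogue(policy: Policy, user: SimUser, rng: random.Random,
                 max_exchanges: int = 10) -> Dialogue:
    """Let the dialog manager's policy and the simulated user take turns."""
    history: Dialogue = []
    for _ in range(max_exchanges):
        move = policy(history)
        if move is None:                        # the strategy ends the dialogue
            break
        history.append(("T", move))
        history.append(("S", user(move, rng)))
    return history

def simulate_corpus(policy: Policy, user: SimUser, n_dialogues: int,
                    seed: int = 0) -> List[Dialogue]:
    """Generate a simulated corpus instead of recruiting human subjects."""
    rng = random.Random(seed)                   # seeded, so the corpus is reproducible
    return [run_dialogue(policy, user, rng) for _ in range(n_dialogues)]

# Toy stand-ins, purely for illustration (hypothetical).
QUESTIONS = ["question1", "question2"]

def fixed_policy(history: Dialogue) -> Optional[str]:
    """Ask each question once, in order, then stop."""
    asked = len(history) // 2
    return QUESTIONS[asked] if asked < len(QUESTIONS) else None

def coin_flip_user(move: str, rng: random.Random) -> str:
    """A trivial simulated user that ignores the question."""
    return rng.choice(["correct answer", "incorrect answer"])
```

For example, `simulate_corpus(fixed_policy, coin_flip_user, 100)` yields 100 four-turn dialogues. Keeping the policy and the simulated user as plain functions makes it easy to swap in a different simulation model without touching the loop.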
Background (1)
  • Education community
    • Focuses on changes in the student’s internal knowledge representation
    • Usually not dialogue-based
    • Simulated students (VanLehn et al., 1994) for
      • Tutor training
      • Collaborative learning
Background (2)
  • Dialogue community
    • Focuses on interactions and dialogue behaviors
    • Simulated users have only a limited set of actions
    • (Schatzmann et al., 2005)
      • Simulation at the dialogue act (DA) level
Corpus (1)
  • Spoken dialogue physics tutor (ITSPOKE)
Corpus (2)
  • Tutoring procedure: 5 problems
[Figure: dialogues made up of tutor (T) questions and student (S) answers alternate with essay revisions, … …]

Corpus (3)
  • Tutor behaviors
    • Defined in KCDs (Knowledge Construction Dialogues)
[Figure: the tutor's behavior branches on the student's answer: Correct vs. Incorrect/Partially Correct]

Corpus (4)
  • f03 vs. s05: different groups of subjects

Simulation Models (1)
  • Simulating at the word level
    • Students have more complex behaviors
    • Dialogue act (DA) information alone is not enough for the system
  • Two models, each trained on two corpora
    • f03: 03ProbCorrect, 03Random
    • s05: 05ProbCorrect, 05Random

Simulation Models (2)
  • ProbCorrect Model
    • Simulates the average knowledge level of real students
    • Simulates meaningful dialogue behaviors
  • Random Model
    • Produces nonsensical behavior
    • Serves as a contrast
ProbCorrect Model
  • Real corpus
    • question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
    • question2: Answer2_1 (c), Answer2_2 (ic)
  • Candidate answers
    • For question1 (c:ic = 1:2): c: Answer1_1; ic: Answer1_2, Answer1_3
    • For question2 (c:ic = 1:1): c: Answer2_1; ic: Answer2_2
  • For each question (e.g., question1), the model
    • Chooses to give a correct (c) or incorrect (ic) answer with the same average probability as real students
    • Randomly chooses one answer from the corresponding answer set
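The ProbCorrect procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the talk's implementation: the corpus is assumed to be a list of `(question, answer, label)` triples with labels `"c"`/`"ic"`, the probability of a correct answer is estimated per question from the real students' answers (matching the slide's c:ic ratios), and the function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical corpus format: (question_id, answer_text, label), label "c" or "ic".
REAL_CORPUS = [
    ("question1", "Answer1_1", "c"),
    ("question1", "Answer1_2", "ic"),
    ("question1", "Answer1_3", "ic"),
    ("question2", "Answer2_1", "c"),
    ("question2", "Answer2_2", "ic"),
]

def train_prob_correct(corpus):
    """Per question: P(correct) from real c/ic counts, plus the distinct c and ic answers."""
    counts = defaultdict(lambda: {"c": 0, "ic": 0})
    answer_sets = defaultdict(lambda: {"c": [], "ic": []})
    for question, answer, label in corpus:
        counts[question][label] += 1
        if answer not in answer_sets[question][label]:
            answer_sets[question][label].append(answer)
    model = {}
    for question, n in counts.items():
        p_correct = n["c"] / (n["c"] + n["ic"])
        model[question] = (p_correct, answer_sets[question])
    return model

def prob_correct_answer(model, question, rng):
    """Pick c/ic with the real students' probability, then an answer from that set."""
    p_correct, sets = model[question]
    # rng.random() is in [0, 1), so p_correct == 0 always gives "ic" and 1 always gives "c".
    label = "c" if rng.random() < p_correct else "ic"
    return label, rng.choice(sets[label])
```

With this toy corpus, question1 gets P(correct) = 1/3 and question2 gets 1/2. Running `train_prob_correct` separately on f03 and on s05 data would give the two variants, 03ProbCorrect and 05ProbCorrect.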

Random Model
  • Real corpora (HC03&05)
    • Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
    • Question2: Answer2_1, Answer2_2
  • Candidate answers: Answer1_1, Answer1_2, Answer1_3, Answer1_4, Answer2_1, Answer2_2
  • Big random model
    • For any question i, the answer is any of the 6 candidates with the same probability, regardless of the question
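A minimal sketch of the big random model, assuming each real corpus is a list of `(question, answer)` pairs. The two corpora below are hypothetical stand-ins that reproduce the slide's six candidate answers, and the function names are illustrative only. The key point is that the question is ignored when sampling.

```python
import random
from typing import Iterable, List, Tuple

def train_random(corpora: Iterable[List[Tuple[str, str]]]) -> List[str]:
    """Pool every distinct answer from all real corpora, ignoring which question it answered."""
    pool: List[str] = []
    for corpus in corpora:
        for _question, answer in corpus:
            if answer not in pool:
                pool.append(answer)
    return pool

def random_answer(pool: List[str], question: str, rng: random.Random) -> str:
    """Return any pooled answer with equal probability; `question` is deliberately ignored."""
    return rng.choice(pool)

# Hypothetical stand-ins for the two real corpora (HC03&05), as (question, answer) pairs.
HC03 = [("Question1", "Answer1_1"), ("Question1", "Answer1_2"), ("Question2", "Answer2_1")]
HC05 = [("Question1", "Answer1_3"), ("Question1", "Answer1_4"), ("Question2", "Answer2_2")]
```

Here `train_random([HC03, HC05])` yields the six candidates, and asking "Question1" can return an answer that real students gave to "Question2", which is what makes this model a nonsensical contrast to ProbCorrect.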

Experiments
  • Comparisons between real corpora
  • Comparisons between real & simulated corpora
  • Comparisons between simulated corpora
Real corpora comparisons (1)
  • Evaluation metrics
    • High-level dialog features
    • Dialog style and cooperativeness
    • Dialog success rate and efficiency
    • Learning gains
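The transcript does not give the exact definitions of these metrics. As a hedged sketch, the high-level features below follow the spirit of Schatzmann et al. (2005) (dialogue length, turn length, participant activity), counted in words because the simulation works at the word level. The function names, feature names, and `(speaker, text)` turn format are assumptions. The learning-gain helper shows the raw gain and the commonly used normalized gain, assuming test scores lie in [0, 1].

```python
from statistics import mean
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # hypothetical format: (speaker, text), speaker "T" (tutor) or "S" (student)

def high_level_features(corpus: List[List[Turn]]) -> Dict[str, float]:
    """Average dialogue length, average turn length per speaker, and a student/tutor activity ratio."""
    student_words = [len(text.split()) for d in corpus for s, text in d if s == "S"]
    tutor_words = [len(text.split()) for d in corpus for s, text in d if s == "T"]
    return {
        "turns_per_dialogue": mean(len(d) for d in corpus),
        "words_per_student_turn": mean(student_words),
        "words_per_tutor_turn": mean(tutor_words),
        # participant activity: student words relative to tutor words
        "student_tutor_word_ratio": sum(student_words) / sum(tutor_words),
    }

def learning_gain(pre: float, post: float) -> Tuple[float, float]:
    """Raw gain and normalized gain (post - pre) / (1 - pre); scores assumed in [0, 1]."""
    raw = post - pre
    # normalized gain is undefined for a perfect pretest score
    normalized = raw / (1.0 - pre) if pre < 1.0 else float("nan")
    return raw, normalized
```

The same functions can be applied unchanged to a real corpus and to a simulated one, which is what makes such simple metrics convenient for the comparisons that follow.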
Real corpora comparisons (2)
  • High-level dialog features
Real corpora comparisons (3)
  • Dialogue style features
Real corpora comparisons (4)
  • Dialogue success rate
Real corpora comparisons (5)
  • Learning gain features
Results
  • The differences captured by these simple metrics are not enough to conclude whether a corpus is real (Schatzmann et al., 2005)
  • The differences may be due to different user populations
Results (1)
  • Most of the measurements can distinguish between the Random and ProbCorrect models
  • The ProbCorrect model generates more realistic behaviors
  • We cannot draw conclusions about the power of these metrics, since the two simulated corpora are very different
Results (2)
  • Differences between the real and Random corpora are captured clearly, but differences between the real and ProbCorrect corpora are not
  • We did not expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small
Results (3)
  • s05 variety > f03 variety, and 05ProbCorrect variety > 03ProbCorrect variety
  • However, the simulated corpora do not show significantly more variety than the real ones
    • Possibly because the computer tutor is simple (c/ic)
    • We are using the same candidate answer set
Results (4)
  • ProbCorrect models trained on different real corpora are quite different
  • Each ProbCorrect model is more similar to the real corpus it was trained on than to the other real corpus
Results
  • The two simulated corpora differ more on prob7 than on prob34
  • The dialogue structure of prob34 is more restricted
  • The discriminative power of these simple metrics is limited by the dialogue structure
Conclusions
  • The simple measurements can distinguish between
    • real corpora
      • Different populations
    • simulated and real corpora
      • To different extents
    • simulated corpora
      • Different models
      • Trained on different corpora
      • Limited by the dialogue structure
Future work
  • Explore “deep” evaluation metrics
  • Test the simulated corpora on policy learning
  • More simulation models
    • More human features
      • Emotion, learning
    • Special cases
      • Quick learners, slow learners