
A Machine Learning Approach for Automatic Student Model Discovery


Presentation Transcript


  1. A Machine Learning Approach for Automatic Student Model Discovery Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger Computer Science Department Carnegie Mellon University

  2. Student Model • A set of knowledge components (KCs) • Encoded in intelligent tutors to model how students solve problems • Example: What to do next on problems like 3x=12 • A key factor behind instructional decisions in automated tutoring systems

  3. Student Model Construction • Traditional Methods • Structured interviews • Think-aloud protocols • Rational analysis  Require expert input; highly subjective • Previous Automated Methods • Learning factor analysis (LFA)  Within the search space of human-provided factors • Proposed Approach • Use a machine-learning agent, SimStudent, to acquire knowledge  Independent of human-provided factors • 1 production rule acquired => 1 KC in student model (Q matrix)

  4. A Brief Review of SimStudent • A machine-learning agent that • acquires production rules from examples & problem-solving experience • given a set of feature predicates & functions

  5. Production Rules • Skill divide (e.g. -3x = 6) • What: • Left side (-3x) • Right side (6) • When: • Left side (-3x) does not have constant term • => • How: • Get-coefficient (-3) of left side (-3x) • Divide both sides by the coefficient • Each production rule is associated with one KC • Each step (-3x = 6) is labeled with one KC, determined by the production applied to that step • Original model required strong domain-specific operators, like Get-coefficient • Does not differentiate important distinctions in learning (e.g., -x = 3 vs. -3x = 6)
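
The What/When/How structure above can be pictured as a simple rule object. The sketch below is only an illustration of that structure (the names get_coefficient, what, when, and how are invented for this example and are not SimStudent's actual representation); note how the How part leans on the strong Get-coefficient operator.

```python
# Illustrative stand-in for the "divide" production rule described above;
# not SimStudent's actual rule representation or API.

def get_coefficient(term):
    """Strong domain-specific operator: coefficient of a term like '-3x'."""
    head = term.rstrip('x')
    if head in ('', '+'):
        return 1
    if head == '-':
        return -1
    return int(head)

divide_rule = {
    "skill": "divide",
    # What: the parts of the problem the rule refers to (left side, right side)
    "what": lambda left, right: (left, right),
    # When: the left side has no constant term (e.g. '-3x', not '-3x + 5')
    "when": lambda left, right: '+' not in left,
    # How: divide both sides by the coefficient of the left side
    "how":  lambda left, right: f"divide both sides by {get_coefficient(left)}",
}

if divide_rule["when"]("-3x", "6"):
    print(divide_rule["how"]("-3x", "6"))   # divide both sides by -3
```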

  6. Deep Feature Learning • Expert vs Novice (Chi et al., 1981) • Example: What’s the coefficient of -3x? • Expert uses deep functional features to reply -3 • Novice may use shallow perceptual features to reply 3 • Model deep feature learning using machine learning techniques • Integrate acquired knowledge into SimStudent learning • Remove dependence on strong operators & split KCs into finer grain sizes

  7. Feature Recognition as PCFG Induction • Underlying structure in the problem  Grammar • Feature  Non-terminal symbol in a grammar rule • Feature learning task  Grammar induction • Student errors  Incorrect parsing

  8. Learning Problem • Input is a set of feature recognition records, each consisting of • An original problem (e.g. -3x) • The feature to be recognized (e.g. -3 in -3x) • Output • A probabilistic context-free grammar (PCFG) • A non-terminal symbol in a grammar rule that represents the target feature
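
For concreteness, a toy version of the expected output for the record (-3x, -3) might look like the grammar below; the symbol names and probabilities are invented for illustration, not taken from the learned grammars in the paper.

```python
# Toy PCFG for parsing '-3x'; symbols and probabilities are illustrative only.
# Each key is (left-hand side, right-hand-side symbols); each value is a probability.
pcfg = {
    ("Expression",   ("SignedNumber", "Variable")): 1.0,
    ("SignedNumber", ("MinusSign", "Number")):      0.5,
    ("SignedNumber", ("Number",)):                  0.5,
    ("MinusSign",    ("-",)):                       1.0,
    ("Number",       ("3",)):                       1.0,
    ("Variable",     ("x",)):                       1.0,
}

# The non-terminal that spans '-3' in the parse of '-3x' is reported as the
# symbol representing the target feature (the coefficient).
feature_symbol = "SignedNumber"
```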

  9. A Two-Step PCFG Learning Algorithm • Greedy Structure Hypothesizer: • Hypothesizes grammar rules in a bottom-up fashion • Creates non-terminal symbols for frequently occurring sequences • E.g. – and 3, SignedNumber and Variable • Viterbi Training Phase: • Refines rule probabilities • Occur more frequently  Higher probabilities • Generalizes the Inside-Outside Algorithm (Lari & Young, 1990)
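
The sketch below captures the two-step idea in a few lines of Python under strong simplifying assumptions: the structure hypothesizer just keeps introducing a non-terminal for the most frequent adjacent pair of symbols, and the training step re-estimates rule probabilities from rule counts in the current best (Viterbi) parses. It is a conceptual illustration, not the published algorithm.

```python
from collections import Counter

def greedy_structure_hypothesizer(sequences, min_count=2):
    """Bottom-up: repeatedly create a non-terminal for the most frequent
    adjacent pair of symbols (e.g. '-' and '3' -> a SignedNumber-like symbol)."""
    rules, seqs = [], [list(s) for s in sequences]
    while True:
        pairs = Counter(tuple(s[i:i + 2]) for s in seqs for i in range(len(s) - 1))
        if not pairs or pairs.most_common(1)[0][1] < min_count:
            return rules
        pair, _ = pairs.most_common(1)[0]
        symbol = f"NT{len(rules)}"
        rules.append((symbol, pair))
        seqs = [merge(s, pair, symbol) for s in seqs]

def merge(seq, pair, symbol):
    """Replace every occurrence of `pair` in `seq` by the new non-terminal."""
    out, i = [], 0
    while i < len(seq):
        if tuple(seq[i:i + 2]) == pair:
            out.append(symbol); i += 2
        else:
            out.append(seq[i]); i += 1
    return out

def viterbi_training_step(viterbi_parses):
    """Re-estimate rule probabilities from how often each rule is used in the
    current best parses: rules that occur more often get higher probability."""
    counts, totals = Counter(), Counter()
    for parse in viterbi_parses:              # each parse is a list of (lhs, rhs) rules
        for lhs, rhs in parse:
            counts[(lhs, rhs)] += 1
            totals[lhs] += 1
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}
```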

  10. Example of Production Rules Before and After Integration • Extend the “What” part in the production rule • Original: • Skill divide (e.g. -3x = 6) • What: • Left side (-3x) • Right side (6) • When: • Left side (-3x) does not have constant term • => • How: • Get coefficient (-3) of left side (-3x) • Divide both sides by the coefficient (-3) • Extended: • Skill divide (e.g. -3x = 6) • What: • Left side (-3, -3x) • Right side (6) • When: • Left side (-3x) does not have constant term • => • How: • Get coefficient (-3) of left side (-3x) • Divide both sides by the coefficient (-3) • Fewer operators • Eliminates the need for domain-specific operators

  11. Original: Skill divide (e.g. -3x = 6) What: Left side (-3x) Right side (6) When: Left side (-3x) does not have constant term => How: Get coefficient (-3) of left side (-3x) Divide both sides by the coefficient (-3)

  12. Experiment Method • SimStudent vs. Human-generated model • Code real student data • 71 students used a Carnegie Learning Algebra I Tutor on equation solving • SimStudent: • Tutored by a Carnegie Learning Algebra I Tutor • Coded each step by the applicable production rule • Used the human-generated coding when no production rule applied • Human-generated model: • Coded manually based on expertise

  13. Human-generated vs. SimStudent KCs

  14. How well the two models fit real student data • Used the Additive Factor Model (AFM) • An instance of logistic regression that • Uses each student, each KC, and each KC-by-opportunity interaction as independent variables • To predict the probability of a student making an error on a specific step
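
For reference, the AFM prediction equation in its standard form is given below; the notation is the conventional one rather than anything taken from the slide, and the slide's "probability of error" is simply 1 - p_ij.

```latex
% Additive Factor Model: p_{ij} = probability that student i gets step j correct
%   \theta_i : proficiency of student i
%   q_{jk}   : Q-matrix entry, 1 if step j uses KC k
%   \beta_k  : easiness of KC k
%   \gamma_k : learning rate of KC k
%   T_{ik}   : prior practice opportunities of student i on KC k
\log \frac{p_{ij}}{1 - p_{ij}}
  \;=\; \theta_i \;+\; \sum_{k} q_{jk}\left(\beta_k + \gamma_k\, T_{ik}\right)
```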

  15. An Example of a Split in Division • Human-generated Model • divide: Ax=B & -x=A • SimStudent • simSt-divide: Ax=B • simSt-divide-1: -x=A
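
A minimal illustration of what this split means for the step-to-KC coding (Q matrix); the example steps are made up:

```python
# Hypothetical step-to-KC coding illustrating the split described above.
human_model = {
    "3x = 6": "divide",          # one KC covers both Ax = B and -x = A
    "-x = 4": "divide",
}
simstudent_model = {
    "3x = 6": "simSt-divide",
    "-x = 4": "simSt-divide-1",  # -x = A becomes its own, finer-grained KC
}
```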

  16. Production Rules for Division • Skill simSt-divide (e.g. -3x = 6) • What: • Left side (-3, -3x) • Right side (6) • When: • Left side (-3x) does not have constant term • How: • Divide both sides by the coefficient (-3) • Skill simSt-divide-1 (e.g. -x = 3) • What: • Left side (-x) • Right side (3) • When: • Left side (-x) is of the form -v • How: • Generate one (1) • Divide both sides by -1

  17. An Example without a Split in Divide-Typein • Human-generated Model • divide-typein • SimStudent • simSt-divide-typein

  18. SimStudent vs. SimStudent + Feature Learning • SimStudent • Needs strong operators • Constructs student models similar to the human-generated model • Extended SimStudent • Only requires weak operators • Splits KCs into finer grain sizes based on different parse trees • Does Extended SimStudent produce a KC model that better fits student learning data?

  19. Results • Significance Test • SimStudent outperforms the human-generated model in 4260 out of 6494 steps • p < 0.001 • SimStudent outperforms the human-generated model across 20 runs of cross validation • p < 0.001

  20. Summary • Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models. • Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

  21. Future Studies • Test generality on other datasets in DataShop • Apply the proposed approach in other domains • Stoichiometry • Fraction addition

  22. Thank you!

  23. An Example in Algebra

  24. Feature Recognition as PCFG Induction • Underlying structure in the problem  Grammar • Feature  Non-terminal symbol in a grammar rule • Feature learning task  Grammar induction • Student errors  Incorrect parsing

  25. Learning Problem • Input is a set of feature recognition records, each consisting of • An original problem (e.g. -3x) • The feature to be recognized (e.g. -3 in -3x) • Output • A probabilistic context-free grammar (PCFG) • A non-terminal symbol in a grammar rule that represents the target feature

  26. A Computational Model of Deep Feature Learning • Extended a PCFG Learning Algorithm (Li et al., 2009) • Feature Learning • Stronger Prior Knowledge: • Transfer Learning Using Prior Knowledge

  27. A Two-Step PCFG Learning Algorithm • Greedy Structure Hypothesizer: • Hypothesizes grammar rules in a bottom-up fashion • Creates non-terminal symbols for frequently occurring sequences • E.g. – and 3, SignedNumber and Variable • Viterbi Training Phase: • Refines rule probabilities • Occur more frequently  Higher probabilities • Generalizes the Inside-Outside Algorithm (Lari & Young, 1990)

  28. Feature Learning • Build the most probable parse trees for all observation sequences • Select the non-terminal symbol that matches the most training records as the target feature
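
A sketch of this selection step, assuming a hypothetical helper parse_spans(problem) that returns, for the most probable parse of a problem, the substring covered by each non-terminal; the symbol that coincides with the annotated feature in the most records wins.

```python
from collections import Counter

def select_feature_symbol(records, parse_spans):
    """records: (problem, feature) pairs, e.g. ("-3x", "-3").
    parse_spans: hypothetical helper mapping a problem to
    {non_terminal: substring covered} in its most probable parse."""
    votes = Counter()
    for problem, feature in records:
        for symbol, span in parse_spans(problem).items():
            if span == feature:
                votes[symbol] += 1
    return votes.most_common(1)[0][0]   # e.g. "SignedNumber"
```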

  29. Transfer Learning Using Prior Knowledge • GSH Phase: • Build parse trees based on previously acquired grammar rules • Then call the original GSH • Viterbi Training: • Add rule frequencies from the previous task to the current task
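
One simple reading of the "add rule frequency" step is sketched below: rule usage counts carried over from the previous task are added to counts from the current task before probabilities are re-normalized, so rules that were useful before retain higher probabilities. This is an illustration of the idea, not the exact procedure.

```python
from collections import Counter

def transfer_rule_probabilities(prev_counts, curr_counts):
    """prev_counts / curr_counts: Counters of (lhs, rhs) rule usage frequencies.
    Returns re-normalized rule probabilities after adding the previous task's
    counts to the current task's counts."""
    merged = Counter(prev_counts) + Counter(curr_counts)
    totals = Counter()
    for (lhs, _rhs), n in merged.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in merged.items()}
```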
