## A Machine Learning Approach for Automatic Student Model Discovery

Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger
Computer Science Department, Carnegie Mellon University

**Student Model**

- A set of knowledge components (KCs)
- Encoded in intelligent tutors to model how students solve problems
  - Example: What to do next on problems like 3x=12
- A key factor behind instructional decisions in automated tutoring systems

**Student Model Construction**

- Traditional Methods (require expert input; highly subjective)
  - Structured interviews
  - Think-aloud protocols
  - Rational analysis
- Previous Automated Methods (within the search space of human-provided factors)
  - Learning factor analysis (LFA)
- Proposed Approach (independent of human-provided factors)
  - Use a machine-learning agent, SimStudent, to acquire knowledge
  - 1 production rule acquired => 1 KC in student model (Q matrix)

**A Brief Review of SimStudent**

- A machine-learning agent that acquires production rules from examples and problem-solving experience, given a set of feature predicates and functions

**Production Rules**

- Skill divide (e.g. -3x = 6)
  - What:
    - Left side (-3x)
    - Right side (6)
  - When:
    - Left side (-3x) does not have constant term
  - => How:
    - Get-coefficient (-3) of left side (-3x)
    - Divide both sides by the coefficient
- Each production rule is associated with one KC
- Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step
- Original model required strong domain-specific operators, like Get-coefficient
  - Does not differentiate important distinctions in learning (e.g., -x=3 vs -3x = 6)

**Deep Feature Learning**

- Expert vs Novice (Chi et al., 1981)
- Example: What’s the coefficient of -3x?
- Expert uses deep functional features to reply -3
- Novice may use shallow perceptual features to reply 3
- Model deep feature learning using machine learning techniques
- Integrate acquired knowledge into SimStudent learning
- Remove dependence on strong operators & split KCs into finer grain sizes

**Feature Recognition as PCFG Induction**

- Underlying structure in the problem → Grammar
- Feature → Non-terminal symbol in a grammar rule
- Feature learning task → Grammar induction
- Student errors → Incorrect parsing

**Learning Problem**

- Input is a set of feature recognition records, each consisting of
  - An original problem (e.g. -3x)
  - The feature to be recognized (e.g. -3 in -3x)
- Output
  - A probabilistic context-free grammar (PCFG)
  - A non-terminal symbol in a grammar rule that represents the target feature

**A Two-Step PCFG Learning Algorithm**

- Greedy Structure Hypothesizer:
  - Hypothesizes grammar rules in a bottom-up fashion
  - Creates non-terminal symbols for frequently occurring sequences
  - E.g. – and 3, SignedNumber and Variable
- Viterbi Training Phase:
  - Refines rule probabilities
  - Rules that occur more frequently → higher probabilities
- Generalizes the Inside-Outside Algorithm (Lari & Young, 1990)

**Example of Production Rules Before and After Integration**

- Extend the “What” part in the production rule
- Original:
  - Skill divide (e.g. -3x = 6)
  - What:
    - Left side (-3x)
    - Right side (6)
  - When:
    - Left side (-3x) does not have constant term
  - => How:
    - Get coefficient (-3) of left side (-3x)
    - Divide both sides by the coefficient (-3)
- Extended:
  - Skill divide (e.g. -3x = 6)
  - What:
    - Left side (-3, -3x)
    - Right side (6)
  - When:
    - Left side (-3x) does not have constant term
  - => How:
    - Get coefficient (-3) of left side (-3x)
    - Divide both sides by the coefficient (-3)
- Fewer operators
- Eliminate need for domain-specific operators

**Experiment Method**

- SimStudent vs. human-generated model
- Code real student data
  - 71 students used a Carnegie Learning Algebra I Tutor on equation solving
- SimStudent:
  - Tutored by a Carnegie Learning Algebra I Tutor
  - Coded each step by the applicable production rule
  - Used human-generated coding in case of no applicable production
- Human-generated model:
  - Coded manually based on expertise

**How well two models fit with real student data**

- Used Additive Factor Model (AFM)
  - An instance of logistic regression that
  - Uses each student, each KC, and KC-by-opportunity interaction as independent variables
  - To predict the probability of a student making an error on a specific step

**An Example of Split in Division**

- Human-generated Model
  - divide: Ax=B & -x=A
- SimStudent
  - simSt-divide: Ax=B
  - simSt-divide-1: -x=A

**Production Rules for Division**

- Skill simSt-divide (e.g. -3x = 6)
  - What:
    - Left side (-3, -3x)
    - Right side (6)
  - When:
    - Left side (-3x) does not have constant term
  - How:
    - Divide both sides by the coefficient (-3)
- Skill simSt-divide-1 (e.g. -x = 3)
  - What:
    - Left side (-x)
    - Right side (3)
  - When:
    - Left side (-x) is of the form -v
  - How:
    - Generate one (1)
    - Divide both sides by -1

**An Example without Split in Divide Typein**

- Human-generated Model
  - divide-typein
- SimStudent
  - simSt-divide-typein

**SimStudent vs SimStudent + Feature Learning**

- SimStudent
  - Needs strong operators
  - Constructs student models similar to the human-generated model
- Extended SimStudent
  - Only requires weak operators
  - Splits KCs into finer grain sizes based on different parse trees
- Does Extended SimStudent produce a KC model that better fits student learning data?

**Results**

- Significance Test
  - SimStudent outperforms the human-generated model in 4260 out of 6494 steps (p < 0.001)
  - SimStudent outperforms the human-generated model across 20 runs of cross validation (p < 0.001)

**Summary**

- Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models.
- Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

**Future Studies**

- Test generality in other datasets in DataShop
- Apply the proposed approach in other domains
  - Stoichiometry
  - Fraction addition

**A Computational Model of Deep Feature Learning**

- Extended a PCFG Learning Algorithm (Li et al., 2009)
  - Feature Learning
  - Stronger Prior Knowledge:
    - Transfer Learning Using Prior Knowledge

**Feature Learning**

- Build the most probable parse trees for all observation sequences
- Select the non-terminal symbol that matches the most training records as the target feature

**Transfer Learning Using Prior Knowledge**

- GSH (Greedy Structure Hypothesizer) Phase:
  - Build parse trees based on some previously acquired grammar rules
  - Then call the original GSH
- Viterbi Training:
  - Add rule frequency in previous task to the current task
- Example rule probabilities shown on the slide: 0.5, 0.33, 0.5, 0.66
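
The Viterbi-training step of transfer learning (adding rule frequencies from a previous task to the current task) can be illustrated with a minimal sketch. The grammar symbols, counts, and function names below are hypothetical and are not taken from the slides. The counts are chosen only so that the probabilities move from an even 0.5/0.5 split toward roughly 0.33/0.67 once prior frequencies are added, and the sketch covers only this count-merging step, not the full learning algorithm.

```python
from collections import Counter

def rule_probabilities(counts):
    """Normalize rule counts into probabilities per left-hand-side symbol."""
    totals = Counter()
    for (lhs, _rhs), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

def transfer_counts(prior_counts, current_counts):
    """Add rule frequencies from a previous task to those of the current task."""
    combined = Counter(prior_counts)
    combined.update(current_counts)  # Counter.update adds counts together
    return combined

# Hypothetical rules, written as (left-hand side, right-hand side).
signed = ("Expression", ("SignedNumber", "Variable"))
unsigned = ("Expression", ("Number", "Variable"))

# Hypothetical frequencies: the current task saw each rule once,
# while the previous task saw only the SignedNumber rule.
current = Counter({signed: 1, unsigned: 1})
prior = Counter({signed: 1})

before = rule_probabilities(current)
after = rule_probabilities(transfer_counts(prior, current))

print(before[signed], before[unsigned])
print(round(after[signed], 2), round(after[unsigned], 2))
```

Under these assumed counts, the rule seen in the previous task ends up with the higher probability, which is the intended effect of carrying frequencies across tasks.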