Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining

Presentation Transcript


  1. Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining
  Zachary A. Pardos, PSLC Summer School 2011

  2. Outline of Talk
  • Introduction to Knowledge Tracing
    • History
    • Intuition
    • Model
    • Demo
  • Variations (and other models)
    • Evaluations (Baker work / KDD)
  • Random Forests
    • Description
    • Evaluations (KDD)
  • Time left? Vote on next topic

  3. Intro to Knowledge Tracing: History
  • Introduced in 1995 (Corbett & Anderson, UMUAI)
  • Based on the ACT-R theory of skill knowledge (Anderson, 1993)
  • Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)

  4. Intro to Knowledge Tracing: Intuition
  • Based on the idea that practice on a skill leads to mastery of that skill
  • Has four parameters used to describe student performance
  • Relies on a KC (knowledge component) model
  • Tracks student knowledge over time

  5. Intro to Knowledge Tracing
  For some skill K: given a student's response sequence 1 to n, predict response n+1.
  Example chronological response sequence for student Y: 0 1 0 1 1 0 → ?  (0 = incorrect response, 1 = correct response)

  6. Intro to Knowledge Tracing
  Track knowledge over time (a model of learning) across the observed response sequence (e.g., 0 1 0 1 1 0).

  7. Intro to Knowledge Tracing
  Knowledge Tracing (KT) can be represented as a simple HMM with a latent node and an observed node.
  • Node representations: K = knowledge node (latent), Q = question node (observed)
  • Node states: K = two state (0 or 1), Q = two state (0 or 1)

  8. Intro to Knowledge Tracing
  Four parameters of the KT model:
  • P(L0) = probability of initial knowledge
  • P(T) = probability of learning
  • P(G) = probability of guess
  • P(S) = probability of slip
  The probability of forgetting is assumed to be zero (fixed).

  9. Intro to Knowledge Tracing
  Formulas for inference and prediction (derivation in Reye, JAIED 2004). The formulas use Bayes' Theorem to make inferences about the latent variable:
  (1) P(L_n-1 | correct_n) = P(L_n-1)(1 - P(S)) / [ P(L_n-1)(1 - P(S)) + (1 - P(L_n-1)) P(G) ]
  (2) P(L_n-1 | incorrect_n) = P(L_n-1) P(S) / [ P(L_n-1) P(S) + (1 - P(L_n-1))(1 - P(G)) ]
  (3) P(L_n) = P(L_n-1 | evidence_n) + (1 - P(L_n-1 | evidence_n)) P(T)
  Prediction of the next response: P(correct_n+1) = P(L_n)(1 - P(S)) + (1 - P(L_n)) P(G)
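The update and prediction steps above translate directly into code. Below is a minimal Python sketch (the function and variable names are mine, not from the slides) implementing equations (1)-(3) plus the prediction of the next response.

```python
def kt_posterior(p_l, correct, p_g, p_s):
    """Equations (1)/(2): posterior P(L_{n-1} | response_n) via Bayes' Theorem."""
    if correct:
        num = p_l * (1 - p_s)
        den = p_l * (1 - p_s) + (1 - p_l) * p_g
    else:
        num = p_l * p_s
        den = p_l * p_s + (1 - p_l) * (1 - p_g)
    return num / den

def kt_learn(p_l_posterior, p_t):
    """Equation (3): account for the chance of learning at this opportunity."""
    return p_l_posterior + (1 - p_l_posterior) * p_t

def kt_predict(p_l, p_g, p_s):
    """P(correct_{n+1}) = P(L_n)(1 - P(S)) + (1 - P(L_n)) P(G)."""
    return p_l * (1 - p_s) + (1 - p_l) * p_g

def trace(responses, p_l0, p_t, p_g, p_s):
    """One-step-ahead predictions and knowledge estimates over a response sequence."""
    p_l, knowledge, predictions = p_l0, [], []
    for r in responses:
        predictions.append(kt_predict(p_l, p_g, p_s))   # predict before observing
        p_l = kt_learn(kt_posterior(p_l, r, p_g, p_s), p_t)
        knowledge.append(p_l)
    return knowledge, predictions

knowledge, preds = trace([0, 1, 0, 1, 1, 0], 0.5, 0.2, 0.14, 0.09)
```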

  10. Intro to Knowledge Tracing
  Model Training Step: values of the parameters P(L0), P(T), P(G) & P(S) are used to predict student responses.
  • Ad-hoc values could be used but will likely not be the best fitting
  • Goal: find a set of values for the parameters that minimizes prediction error
  Example training sequences – Student A: 0 1 0 1 1 0; Student B: 1 0 1 1 1 1; Student C: 0 1 0 0 0 0
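One way to make "minimizes prediction error" concrete is to score a candidate parameter set by the squared error between its one-step-ahead predictions and the observed responses over all students. A self-contained sketch under that assumption, using the training sequences listed above (the error function and data layout are illustrative):

```python
def sequence_sse(responses, p_l0, p_t, p_g, p_s):
    """Sum of squared error of one-step-ahead predictions for one student's sequence."""
    p_l, sse = p_l0, 0.0
    for r in responses:
        pred = p_l * (1 - p_s) + (1 - p_l) * p_g            # P(correct) before observing
        sse += (pred - r) ** 2
        if r:                                               # Bayes update on the observation
            p_l = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            p_l = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = p_l + (1 - p_l) * p_t                         # chance of learning
    return sse

students = {"A": [0, 1, 0, 1, 1, 0], "B": [1, 0, 1, 1, 1, 1], "C": [0, 1, 0, 0, 0, 0]}
total_sse = sum(sequence_sse(seq, 0.5, 0.2, 0.14, 0.09) for seq in students.values())
print(total_sse)
```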

  11. Intro to Knowledge Tracing
  Model Prediction (Model Tracing Step) – Skill: Subtraction
  • P(K), the latent knowledge estimate: 10%, 45%, 75%, 79%, 83%
  • Observed: the student's last three responses to Subtraction questions in the unit: 0 1 1
  • P(Q), the predicted probability of a correct response on the test set questions: 71%, 74%

  12. Intro to Knowledge Tracing
  Influence of parameter values: estimate of knowledge for a student with response sequence 0 1 1 1 1 1 1 1 1 1.
  • With P(L0) = 0.50, P(T) = 0.20, P(G) = 0.14, P(S) = 0.09, the student reached 95% probability of knowledge after the 4th opportunity

  13. Intro to Knowledge Tracing
  Influence of parameter values: same response sequence, 0 1 1 1 1 1 1 1 1 1, but with a much higher guess parameter.
  • With P(L0) = 0.50, P(T) = 0.20, P(G) = 0.64, P(S) = 0.03, the student reached 95% probability of knowledge only after the 8th opportunity (compare with the previous slide's parameters)
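The comparison on slides 12-13 can be reproduced by tracking P(L) over the sequence and reporting the first opportunity at which it reaches 0.95. A self-contained sketch; one reading that matches the counts on these slides is to check the estimate available going into each opportunity, though the exact convention used in the original plots may differ:

```python
def first_mastery(responses, p_l0, p_t, p_g, p_s, threshold=0.95):
    """1-based opportunity at which the knowledge estimate (entering that opportunity)
    first reaches the threshold, or None if it never does."""
    p_l = p_l0
    for n, r in enumerate(responses, start=1):
        if p_l >= threshold:
            return n
        # Bayes update on the observed response, then the chance of learning.
        if r:
            p_l = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            p_l = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = p_l + (1 - p_l) * p_t
    return None

seq = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(first_mastery(seq, 0.5, 0.2, 0.14, 0.09))  # 4 under this reading (slide 12 parameters)
print(first_mastery(seq, 0.5, 0.2, 0.64, 0.03))  # 8 under this reading (slide 13 parameters)
```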

  14. Intro to Knowledge Tracing (Demo)

  15. Intro to Knowledge Tracing: Variations on Knowledge Tracing (and other models)

  16. Intro to Knowledge Tracing: Prior Individualization Approach
  Do all students enter a lesson with the same background knowledge? Individualize the prior as P(L0|S).
  • Node representations: K = knowledge node, Q = question node, S = student node (observed)
  • Node states: K = two state (0 or 1), Q = two state (0 or 1), S = multi state (1 to N)

  17. Intro to Knowledge Tracing: Prior Individualization Approach
  Conditional Probability Table (CPT) of the Student node and the Individualized Prior node P(L0|S) – CPT of the Student node:
  • The CPT of the observed student node is fixed
  • It is possible to have an S value for every student ID
  • This raises an initialization issue (where do these prior values come from?)
  • The S value can represent a cluster or type of student instead of an ID

  18. Intro to Knowledge Tracing: Prior Individualization Approach
  Conditional Probability Table of the Student node and the Individualized Prior node P(L0|S) – CPT of the Individualized Prior node:
  • Individualized L0 values need to be seeded
  • This CPT can be fixed, or the values can be learned
  • Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy
  This model, which individualizes only L0, is called the Prior Per Student (PPS) model.

  19. Intro to Knowledge Tracing: Prior Individualization Approach
  CPT of the Individualized Prior node P(L0|S) – bootstrapping the prior:
  • If a student answers incorrectly on the first question, she gets a low prior
  • If a student answers correctly on the first question, she gets a higher prior
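A minimal sketch of the bootstrapped Prior Per Student idea: assign each student one of two priors based on their first observed response, then run ordinary knowledge tracing from that prior. The two prior values here (0.1 and 0.7) are placeholders of my own, not the values used in the cited work.

```python
def pps_prior(first_response_correct, low_prior=0.1, high_prior=0.7):
    """Bootstrapped individualized prior: seed P(L0|S) from the first response.
    The 0.1 / 0.7 values are illustrative placeholders; in practice they are
    ad-hoc, learned, or linked to the guess/slip CPT (see the following slides)."""
    return high_prior if first_response_correct else low_prior

# Example: each student's remaining sequence is then traced with their own prior.
students = {"X": [1, 1, 0, 1], "Y": [0, 0, 1, 1]}
priors = {sid: pps_prior(seq[0] == 1) for sid, seq in students.items()}
print(priors)  # {'X': 0.7, 'Y': 0.1}
```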

  20. Intro to Knowledge Tracing: Prior Individualization Approach
  What values to use for the two priors? (CPT of the Individualized Prior node P(L0|S))

  21. Intro to Knowledge Tracing: Prior Individualization Approach
  What values to use for the two priors?
  • Use ad-hoc values

  22. Intro to Knowledge Tracing: Prior Individualization Approach
  What values to use for the two priors?
  • Use ad-hoc values
  • Learn the values

  23. Intro to Knowledge Tracing: Prior Individualization Approach
  What values to use for the two priors?
  • Use ad-hoc values
  • Learn the values
  • Link with the guess/slip CPT

  24. Intro to Knowledge Tracing: Prior Individualization Approach
  What values to use for the two priors?
  • Use ad-hoc values
  • Learn the values
  • Link with the guess/slip CPT
  With ASSISTments data, PPS with ad-hoc priors achieved an R² of 0.301, versus 0.176 with standard KT (Pardos & Heffernan, UMAP 2010).

  25. Intro to Knowledge Tracing: Variations on Knowledge Tracing (and other models)

  26. Intro to Knowledge Tracing
  1. BKT-BF: learns values for the parameters P(L0), P(T), P(G), P(S) by performing a grid search (0.01 granularity) and chooses the set of parameters with the lowest squared error (Baker et al., 2010).
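A brute-force fit along the lines of BKT-BF can be sketched as a grid search over the four parameters, scoring each candidate by its one-step-ahead squared prediction error. The grid step below is 0.05 rather than the 0.01 used in the cited work, purely to keep the example fast; everything else (names, toy data) is illustrative.

```python
import itertools

def sse(sequences, p_l0, p_t, p_g, p_s):
    """Squared error of one-step-ahead predictions over all training sequences."""
    total = 0.0
    for responses in sequences:
        p_l = p_l0
        for r in responses:
            pred = p_l * (1 - p_s) + (1 - p_l) * p_g
            total += (pred - r) ** 2
            if r:
                p_l = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
            else:
                p_l = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
            p_l = p_l + (1 - p_l) * p_t
    return total

def fit_bkt_bf(sequences, step=5):
    """Grid search over P(L0), P(T), P(G), P(S) in percent steps; BKT-BF uses 1% (0.01)."""
    grid = [i / 100 for i in range(step, 100, step)]   # 0.05, 0.10, ..., 0.95 for step=5
    best = None
    for p_l0, p_t, p_g, p_s in itertools.product(grid, repeat=4):
        err = sse(sequences, p_l0, p_t, p_g, p_s)
        if best is None or err < best[0]:
            best = (err, (p_l0, p_t, p_g, p_s))
    return best

train = [[0, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 1], [0, 1, 0, 0, 0, 0]]
print(fit_bkt_bf(train))
```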

  27. Intro to Knowledge Tracing
  2. BKT-EM: learns values for the parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data (Chang et al., 2006).
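EM fitting is Baum-Welch on the two-state HMM from slide 7, so an off-the-shelf HMM library can be used. A sketch with hmmlearn, assuming a recent version where `CategoricalHMM` handles discrete emissions (state 0 = unknown, state 1 = known; observation 0 = incorrect, 1 = correct). Initializing the known→unknown transition to zero keeps forgetting at zero during EM, since Baum-Welch preserves zero-probability transitions; the toy sequences and starting values are mine.

```python
import numpy as np
from hmmlearn import hmm  # assumption: hmmlearn version providing CategoricalHMM

sequences = [[0, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 1], [0, 1, 0, 0, 0, 0]]

model = hmm.CategoricalHMM(n_components=2, n_iter=200, init_params="")
model.startprob_ = np.array([0.5, 0.5])          # [1 - P(L0), P(L0)]
model.transmat_ = np.array([[0.9, 0.1],          # unknown -> known is P(T)
                            [0.0, 1.0]])         # no forgetting (stays zero under EM)
model.emissionprob_ = np.array([[0.8, 0.2],      # unknown: [1 - P(G), P(G)]
                                [0.1, 0.9]])     # known:   [P(S), 1 - P(S)]

X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(s) for s in sequences]
model.fit(X, lengths)

p_l0, p_t = model.startprob_[1], model.transmat_[0, 1]
p_g, p_s = model.emissionprob_[0, 1], model.emissionprob_[1, 0]
print(p_l0, p_t, p_g, p_s)
```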

  28. Intro to Knowledge Tracing
  3. BKT-CGS: guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor (Baker, Corbett, & Aleven, 2008).

  29. Intro to Knowledge Tracing
  4. BKT-CSlip: uses the student's averaged contextual slip parameter, learned across all incorrect actions (Baker, Corbett, & Aleven, 2008).

  30. Intro to Knowledge Tracing
  5. BKT-LessData: limits each student's response sequence to the most recent 15 responses (max) during EM training (Nooraiei et al., 2011).

  31. Intro to Knowledge Tracing
  6. BKT-PPS: Prior Per Student (PPS) model which individualizes the prior parameter P(L0|S). Students are assigned a prior based on their response to the first question (Pardos & Heffernan, 2010).

  32. Intro to Knowledge Tracing
  7. CFAR: Correct on First Attempt Rate calculates the student's percent correct on the current skill up until the question being predicted (Yu et al., 2010).
  Example – student responses for skill X: 0 1 0 1 0 1 _ → the predicted next response is 0.50.
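CFAR is just a running percent-correct, so a couple of lines suffice. A sketch (the names and the 0.5 fallback for students with no history are my own choices):

```python
def cfar_predict(prior_responses):
    """Percent correct on this skill so far; 0.5 is an arbitrary fallback with no history."""
    if not prior_responses:
        return 0.5  # assumption: some default is needed before any data is seen
    return sum(prior_responses) / len(prior_responses)

print(cfar_predict([0, 1, 0, 1, 0, 1]))  # 0.50, matching the example on the slide
```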

  33. Intro to Knowledge Tracing
  8. Tabling: uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among students with the same sequence in the training set (Wang et al., 2011).
  Example – training set: Student A: 0 1 1 0, Student B: 0 1 1 1, Student C: 0 1 1 1. With the max table length set to 3, the table size is 2^0 + 2^1 + 2^2 + 2^3 = 15. For a test set student with responses 0 1 1 _, the predicted next response is 0.66.
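The tabling predictor can be built as a dictionary keyed by the (up to length-3) response prefix, mapping to the next responses observed after that prefix among training students. A sketch; the handling of unseen prefixes is my own choice.

```python
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    """Map each response context (most recent <= max_len responses) to the list of
    next responses observed after it in the training set."""
    table = defaultdict(list)
    for seq in training_sequences:
        for i, nxt in enumerate(seq):
            key = tuple(seq[max(0, i - max_len):i])
            table[key].append(nxt)
    return table

def table_predict(table, history, max_len=3, default=0.5):
    """Average next response among training students with the same recent history."""
    outcomes = table.get(tuple(history[-max_len:]))
    return sum(outcomes) / len(outcomes) if outcomes else default  # default is my choice

train = [[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1]]
table = build_table(train)
print(table_predict(table, [0, 1, 1]))  # 2/3 ≈ 0.66, as in the slide's example
```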

  34. Intro to Knowledge Tracing
  9. PFA: Performance Factors Analysis (Pavlik et al., 2009). A logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill. An overall difficulty parameter β is also fit for each skill or each item; in this study we use the variant of PFA that fits β for each skill. The PFA equation is:
  m(i, j ∈ KCs, s, f) = Σ_{j ∈ KCs} (β_j + γ_j·s_{i,j} + ρ_j·f_{i,j}),   P(correct) = 1 / (1 + e^{-m})
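The PFA prediction itself is a one-liner once the per-skill parameters (β, γ, ρ) and the student's prior success/failure counts on that skill are known. A sketch with illustrative (not fitted) parameter values; in practice the parameters come from a logistic regression over the training data.

```python
import math

def pfa_predict(skill_params, successes, failures):
    """P(correct) = 1 / (1 + exp(-m)), with m = beta + gamma * successes + rho * failures.
    For multi-skill items, the m terms would be summed over the item's skills."""
    beta, gamma, rho = skill_params
    m = beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))

# Illustrative parameters for one skill: difficulty, success weight, failure weight.
params = (-0.5, 0.3, 0.1)
print(pfa_predict(params, successes=4, failures=2))
```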

  35. Intro to Knowledge Tracing
  Dataset and Methodology: Evaluation Study
  • Cognitive Tutor for Genetics
  • 76 CMU undergraduate students
  • 9 skills (no multi-skill steps)
  • 23,706 problem solving attempts
  • 11,582 problem steps in the tutor
  • 152 average problem steps completed per student (SD = 50)
  • Pre- and post-tests were administered with this assignment

  36. Intro to Knowledge Tracing
  Methodology: in-tutor model prediction study
  • Predictions were made by the 9 models using 5-fold cross-validation by student
  • Accuracy was calculated with A' for each student; those values were then averaged across students to report the model's A' (higher is better)
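A' for a single student is equivalent to the area under the ROC curve of the model's predicted probabilities against that student's actual responses, so it can be computed with scikit-learn and then averaged over students. A sketch with toy data; skipping students whose responses are all correct or all incorrect (where A' is undefined) is my own handling.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_a_prime(y_true_by_student, y_pred_by_student):
    """Average A' (ROC AUC) across students, skipping single-class students."""
    scores = []
    for sid, y_true in y_true_by_student.items():
        y_true = np.asarray(y_true)
        if len(np.unique(y_true)) < 2:
            continue  # A' undefined when a student has only one response class
        scores.append(roc_auc_score(y_true, y_pred_by_student[sid]))
    return float(np.mean(scores))

actual = {"s1": [0, 1, 1, 0, 1], "s2": [1, 1, 0, 1, 1]}
predicted = {"s1": [0.3, 0.7, 0.8, 0.4, 0.6], "s2": [0.9, 0.8, 0.2, 0.7, 0.6]}
print(mean_a_prime(actual, predicted))
```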

  37. Intro to Knowledge Tracing
  Results: in-tutor model prediction study. [Figure: A' results averaged across students]

  38. Intro to Knowledge Tracing
  Results: in-tutor model prediction study. [Figure: A' results averaged across students]
  • No significant differences among the indicated BKT variants
  • Significant differences between those BKT variants and PFA

  39. Intro to Knowledge Tracing
  Methodology: ensemble in-tutor prediction study
  • 5 ensemble methods were used, trained with the same 5-fold cross-validation folds
  • Ensemble methods were trained using the 9 model predictions as the features (BKT-BF, BKT-EM, …) and the actual response as the label

  40. Intro to Knowledge Tracing
  Methodology: ensemble in-tutor prediction study. Ensemble methods used:
  • Linear regression with no feature selection (predictions bounded between {0,1})
  • Linear regression with feature selection (stepwise regression)
  • Linear regression with only BKT-PPS & BKT-EM
  • Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
  • Logistic regression
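The ensembling step amounts to fitting a regression in which each row's features are the 9 model predictions for that action and the label is whether the student actually answered correctly. A scikit-learn sketch of two of the variants above (plain linear regression with its predictions clipped to [0, 1], and logistic regression); the feature matrices here are random stand-ins, not the study's data, and the stepwise-selection variant is omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((200, 9))                   # stand-in for the 9 models' predictions
y_train = (rng.random(200) < 0.6).astype(int)    # stand-in for the actual responses
X_test = rng.random((50, 9))

# Linear regression ensemble, with predictions bounded between 0 and 1 as on the slide.
linreg = LinearRegression().fit(X_train, y_train)
lin_pred = np.clip(linreg.predict(X_test), 0.0, 1.0)

# Logistic regression ensemble (predicted probability of a correct response).
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
log_pred = logreg.predict_proba(X_test)[:, 1]
```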

  41. Intro to Knowledge Tracing
  Results: in-tutor ensemble prediction study. [Figure: A' results averaged across students]
  • No significant difference between the ensembles

  42. Intro to Knowledge Tracing
  Results: in-tutor ensemble & model prediction study. [Figure: A' results averaged across students]

  43. Intro to Knowledge Tracing
  Results: in-tutor ensemble & model prediction study. [Figure: A' results calculated across all actions]

  44. Random Forests in the KDD Cup
  • Motivation for trying a non-KT approach: the Bayesian method only uses the KC, opportunity count, and student as features; much information is left unutilized, so another machine learning method is required
  • Strategy: engineer additional features from the dataset and use Random Forests to train a model

  45. Random Forests
  • Strategy: create rich feature datasets that include features created from columns not present in the test set

  46. Random Forests
  • Created by Leo Breiman
  • The method trains T separate decision tree classifiers (50-800 in this work)
  • Each decision tree selects a random 1/P portion of the available features (here 1/3)
  • Each tree is grown until there are at least M observations in a leaf (1-100)
  • When classifying unseen data, each tree votes on the class; the majority vote wins (classification) or the votes are averaged (regression)
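The quantities named on this slide map directly onto scikit-learn's Random Forest implementation: T is `n_estimators`, the 1/P feature fraction is `max_features`, and M is `min_samples_leaf`. A sketch for the regression setting used for RMSE; the feature matrix here is random placeholder data, not the KDD Cup features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 20)), (rng.random(500) < 0.7).astype(float)
X_test, y_test = rng.random((100, 20)), (rng.random(100) < 0.7).astype(float)

# T trees, each split considering ~1/3 of the features, leaves of at least M observations.
forest = RandomForestRegressor(n_estimators=200,     # T, within the 50-800 range on the slide
                               max_features=1 / 3,   # random 1/P portion of the features
                               min_samples_leaf=15,  # M, within the 1-100 range on the slide
                               random_state=0)
forest.fit(X_train, y_train)
rmse = mean_squared_error(y_test, forest.predict(X_test)) ** 0.5
print(rmse)
```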

  47. Random Forests: Feature Importance
  Features extracted from the training set:
  • Student progress features (avg. importance: 1.67)
    • Number of data points [today, since the start of the unit]
    • Number of correct responses out of the last [3, 5, 10]
    • Z-score sums for step duration, hint requests, and incorrects
    • Skill-specific versions of all these features
  • Percent correct features (avg. importance: 1.60)
    • % correct of unit, section, problem, and step, and total for each skill and also for each student (10 features)
  • Student modeling approach features (avg. importance: 1.32)
    • The predicted probability of correct for the test row
    • The number of data points used in training the parameters
    • The final EM log likelihood fit of the parameters / data points
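Features of this kind are straightforward to derive from an interaction log with pandas. A simplified sketch with a toy log (column names are mine, and these aggregates are computed over the whole frame for brevity; in practice the percent-correct style features would be computed on training data only):

```python
import pandas as pd

log = pd.DataFrame({
    "student": ["s1", "s1", "s1", "s2", "s2"],
    "skill":   ["sub", "sub", "add", "sub", "sub"],
    "correct": [0, 1, 1, 1, 0],
    "step_duration": [12.0, 7.5, 20.0, 5.0, 30.0],
})

# Percent-correct features: overall per student and per student-skill pair.
log["pct_correct_student"] = log.groupby("student")["correct"].transform("mean")
log["pct_correct_student_skill"] = log.groupby(["student", "skill"])["correct"].transform("mean")

# Student-progress features: count of prior data points and correct-out-of-last-3.
log["n_prior_attempts"] = log.groupby("student").cumcount()
log["correct_last3"] = (log.groupby("student")["correct"]
                           .transform(lambda s: s.shift().rolling(3, min_periods=1).sum()))

# Z-scored step duration (a rough stand-in for the z-score-sum features above).
log["z_duration"] = (log["step_duration"] - log["step_duration"].mean()) / log["step_duration"].std()
```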

  48. Random Forests
  • Features of the user were more important in Bridge to Algebra than in Algebra
  • Student progress features / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets

  49. Random Forests
  [Figure: results for the Algebra and Bridge to Algebra datasets]

  50. Random Forests
  [Figure: results for the Algebra and Bridge to Algebra datasets]
  • The best Bridge to Algebra RMSE on the Leaderboard was 0.2777
  • The Random Forest RMSE of 0.2712 here is exceptional
