Advanced Methods and Analysis for the Learning and Social Sciences

PSY505, Spring term, 2012

February 13, 2012

Today’s Class
  • Classification and Behavior Detection
Prediction
  • Pretty much what it says
  • A student is using a tutor right now. Is he gaming the system or not?
  • A student has used the tutor for the last half hour.

How likely is it that she knows the skill in the next step?

  • A student has completed three years of high school.

What will be her score on the college entrance exam?

Two Key Types of Prediction

  • Classification (the label is categorical) and regression (the label is numeric)

(This slide adapted from a slide by Andrew W. Moore, Google: http://www.cs.cmu.edu/~awm/tutorials)

Classification
  • There is something you want to predict (“the label”)
  • The thing you want to predict is categorical
    • The answer is one of a set of categories, not a number
    • CORRECT/WRONG (sometimes expressed as 0,1)
    • HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE
    • WILL DROP OUT/WON’T DROP OUT
    • WILL SELECT PROBLEM A, B, C, D, E, F, or G
Where do those labels come from?
  • Field observations (take PSY503)
  • Text replays (take PSY503)
  • Post-test data (take PSY503)
  • Tutor performance
  • Survey data
  • School records
  • Where else?
Classification
  • Associated with each label is a set of “features”, which you may be able to use to predict the label

Skill          pknow  time  totalactions  right
ENTERINGGIVEN  0.704     9       1        WRONG
ENTERINGGIVEN  0.502    10       2        RIGHT
USEDIFFNUM     0.049     6       1        WRONG
ENTERINGGIVEN  0.967     7       3        RIGHT
REMOVECOEFF    0.792    16       1        WRONG
REMOVECOEFF    0.792    13       2        RIGHT
USEDIFFNUM     0.073     5       2        RIGHT
…
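To make the data structure concrete, here is a minimal Python sketch of this table as a feature/label data set (pandas is just for illustration; the tools the slides mention later are Weka and RapidMiner):

```python
import pandas as pd

# The feature/label table above as a data frame: each row is one
# student action, and "right" is the label we want to predict.
data = pd.DataFrame({
    "skill": ["ENTERINGGIVEN", "ENTERINGGIVEN", "USEDIFFNUM",
              "ENTERINGGIVEN", "REMOVECOEFF", "REMOVECOEFF", "USEDIFFNUM"],
    "pknow": [0.704, 0.502, 0.049, 0.967, 0.792, 0.792, 0.073],
    "time": [9, 10, 6, 7, 16, 13, 5],
    "totalactions": [1, 2, 1, 3, 1, 2, 2],
    "right": ["WRONG", "RIGHT", "WRONG", "RIGHT", "WRONG", "RIGHT", "RIGHT"],
})

X = data[["skill", "pknow", "time", "totalactions"]]  # the features
y = data["right"]                                      # the label
```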

Classification
  • The basic idea of a classifier is to determine which features, in which combination, can predict the label

(same feature table as above)

Classification
  • Of course, usually there are more than 4 features
  • And more than 7 actions/data points
  • These days, 800,000 student actions and 26 features would be a medium-sized data set
Classification
  • One way to classify is with a Decision Tree (like J48)

PKNOW
├─ <0.5  → TIME
│          ├─ <6 s.  → RIGHT
│          └─ >=6 s. → WRONG
└─ >=0.5 → TOTALACTIONS
           ├─ <4  → RIGHT
           └─ >=4 → WRONG

Classification
  • One way to classify is with a Decision Tree (like J48)

(same decision tree as above)

Skill         pknow  time  totalactions  right
COMPUTESLOPE  0.544     9       1          ?

J48/C4.5
  • Can handle both numerical and categorical predictor variables
    • Tries to find optimal split in numerical variable
  • Repeatedly looks for the variable that best splits the data in terms of predictive power
  • Later prunes out branches that turned out to have low predictive power
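C4.5 itself ships in Weka as J48; as a rough stand-in, the sketch below uses scikit-learn's DecisionTreeClassifier (CART, a closely related tree learner) on the toy table from the earlier sketch (X and y are defined there):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# scikit-learn's trees need numeric inputs, so one-hot encode "skill".
X_num = pd.get_dummies(X, columns=["skill"])

# CART rather than C4.5, but the same basic idea: repeatedly pick
# the variable and split point with the most predictive power.
tree = DecisionTreeClassifier(max_depth=2).fit(X_num, y)
print(export_text(tree, feature_names=list(X_num.columns)))
```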
Step Regression

  • Linear regression (discussed in detail in a later class), with a cut-off
  • Essentially assigns a weight to each parameter, then computes a numerical value
  • All values below 0.5 are then treated as 0, and all values >= 0.5 are treated as 1
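A minimal sketch of the cut-off idea (setting aside the stepwise feature-selection part of step regression), using the numeric columns from the earlier table:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Step regression as described above: fit a linear regression,
# then cut the numeric prediction off at 0.5 to get a 0/1 label.
X_train = np.array([[0.704, 9, 1], [0.502, 10, 2], [0.049, 6, 1],
                    [0.967, 7, 3], [0.792, 16, 1], [0.792, 13, 2]])
y_train = np.array([0, 1, 0, 1, 0, 1])   # WRONG = 0, RIGHT = 1

model = LinearRegression().fit(X_train, y_train)
numeric = model.predict(X_train)           # weighted sum per row
labels = (numeric >= 0.5).astype(int)      # values >= 0.5 become 1
```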

And of course…
  • There are lots of other classification algorithms you can use...
  • K* (instance-based classification)
  • JRip (rule-based classification)
  • PART (rule-based classification using partial decision trees)
  • Neural Network
  • Logistic Regression
  • SMO (support vector machine)
  • All available in your favorite machine learning package
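For illustration, here are rough scikit-learn stand-ins for a few of these; the mapping to the Weka names above is approximate, and K*, JRip, and PART have no direct scikit-learn equivalents:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Approximate stand-ins for some of the algorithms listed above.
classifiers = {
    "instance-based (~K*)": KNeighborsClassifier(n_neighbors=3),
    "neural network": MLPClassifier(max_iter=2000),
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine (~SMO)": SVC(),
}
for name, clf in classifiers.items():
    clf.fit(X_num, y)   # X_num, y from the earlier sketches
    print(name, clf.score(X_num, y))
```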
If there’s time at the end of class…
  • We could go through some of these algorithms
What data set should you generally test on?
  • A vote…
    • Raise your hands as many times as you like
What data set should you generally test on?
  • The data set you trained your classifier on
  • A data set from a different tutor
  • Split your data set in half (by students), train on one half, test on the other half
  • Split your data set in ten (by actions). Train on nine of the sets, test on the tenth. Do this ten times.
  • Votes?
What data set should you generally test on?
  • The data set you trained your classifier on
  • A data set from a different tutor
  • Split your data set in half (by students), train on one half, test on the other half
  • Split your data set in ten (by actions). Train on nine of the sets, test on the tenth. Do this ten times.
  • What are the benefits and drawbacks of each?
The dangerous one (though still sometimes OK)
  • The data set you trained your classifier on
  • If you do this, there is serious danger of over-fitting
The dangerous one (though still sometimes OK)
  • You have ten thousand data points.
  • You fit a parameter for each data point.
  • “If data point 1, RIGHT. If data point 78, WRONG…”
  • Your accuracy is 100%
  • Your kappa is 1
  • Your model will neither work on new data nor tell you anything.
The dangerous one (though still sometimes OK)
  • The data set you trained your classifier on
  • When might this one still be OK?
The dangerous one (though still sometimes OK)
  • The data set you trained your classifier on
  • When might this one still be OK?
    • Computing complexity-based goodness metrics such as BIC
    • Determining the maximum possible performance of a modeling approach
K-fold cross validation (standard)
  • Split your data set in ten (by action). Train on nine of the sets, test on the tenth. Do this ten times.
  • What can you infer from this?
K-fold cross validation (standard)
  • Split your data set in ten (by action). Train on nine of the sets, test on the tenth. Do this ten times.
  • What can you infer from this?
    • Your detector will work with new data from the same students
K-fold cross validation (standard)
  • Split your data set in ten (by action). Train on nine of the sets, test on the tenth. Do this ten times.
  • What can you infer from this?
    • Your detector will work with new data from the same students
  • How often do we really care about this?
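In scikit-learn terms (one package among many), standard 10-fold cross-validation by action looks roughly like this, assuming X_num and y now hold a full data set rather than the seven-row toy table:

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Standard 10-fold cross-validation, splitting by action (row):
# train on nine folds, test on the tenth, ten times over.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X_num, y, cv=cv)
print(scores.mean())
```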
K-fold cross validation (student-level)
  • Split your data set in half (by student), train on one half, test on the other half
  • What can you infer from this?
K-fold cross validation (student-level)
  • Split your data set in half (by student), train on one half, test on the other half
  • What can you infer from this?
    • Your detector will work with data from new students from the same population (whatever it was)
    • Possible to do in RapidMiner
    • Not possible to do in Weka
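A sketch of the same idea in scikit-learn, where GroupKFold keeps all of a student's actions in the same fold, so test-set students are genuinely new; the "student" column is a hypothetical identifier for who produced each row:

```python
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Student-level cross-validation: no student's data is split
# across training and test. Assumes a hypothetical "student"
# column parallel to X_num and y.
cv = GroupKFold(n_splits=2)   # the two-halves split described above
scores = cross_val_score(DecisionTreeClassifier(), X_num, y,
                         groups=data["student"], cv=cv)
```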
K-fold or leave-one-out
  • It's really not clear which one is best (as discussed in a previous lecture)
  • Certain kinds of re-sampling/bootstrapping/etc. are easier to do with k-fold cross-validation
A data set from a different tutor
  • The most stringent test
  • When your model succeeds at this test, you know you have a good/general model
  • When it fails, it’s sometimes hard to know why
An interesting alternative
  • Leave-one-tutor-out cross-validation (cf. Baker, Corbett, & Koedinger, 2006)
    • Train on data from 3 or more tutors
    • Test on data from a different tutor
    • (Repeat for all possible combinations)
    • Good for giving a picture of how well your model will perform in new lessons
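A sketch of leave-one-tutor-out in scikit-learn terms; the "tutor" column is a hypothetical identifier for which lesson each row came from:

```python
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Leave-one-tutor-out: train on every tutor's data but one, test
# on the held-out tutor, repeated for each tutor in turn.
scores = cross_val_score(DecisionTreeClassifier(), X_num, y,
                         groups=data["tutor"], cv=LeaveOneGroupOut())
```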
Worth noting
  • If you want to know whether your model will work on new populations
  • Cross-validate at the population level rather than the student level
Homework 3
  • Let’s look at some of the homework 3 solutions
  • Please comment on what’s right and wrong, what’s clever, etc.
  • We’ll look at the approaches, the goodness, the final models
Homework 3
  • Now let’s take the best homework
  • Any other ideas for how to come up with a better model?
    • Let’s try them!
Feature Engineering
  • There are lots of fancy algorithms
  • But typically your detector is no better than your features
    • Features that have good construct validity are more likely to produce a good model
    • Particularly nice example of this in Sao Pedro et al. (under review)
  • In the next assignment, you’ll create your own features to try to produce a better model
Assignment 4
  • Let’s review Assignment 4
Next Class
  • Wednesday, February 15
  • 3pm-5pm
  • AK232
  • Feature engineering and feature distillation
  • SPECIAL GUEST LECTURER: SUJITH GOWDA
  • Assignments Due: 4. Feature Engineering
Bonus Slides
  • If there’s time
Conjunctive Model(Pardos et al., 2008)
  • The probability that a student can answer an item requiring skills A and B is
  • P(CORR|A^B) = P(CORR|A) * P(CORR|B)
  • But how should credit or blame be assigned to the various skills?
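A minimal worked sketch of the conjunctive formula above:

```python
# Conjunctive model (Pardos et al., 2008): the probability of answering
# an item correctly, given several required skills, is the product of
# the per-skill correctness probabilities.
def p_correct_conjunctive(p_corr_per_skill):
    p = 1.0
    for p_skill in p_corr_per_skill:
        p *= p_skill
    return p

# e.g. P(CORR|A) = 0.9 and P(CORR|B) = 0.7 give P(CORR|A^B) = 0.63
print(p_correct_conjunctive([0.9, 0.7]))
```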
Koedinger et al.’s (2011) Conjunctive Model
  • Handles the case where multiple skills apply to an item better than classical BKT does
Other BKT Extensions?
  • Additional parameters?
  • Additional states?
Many others
  • Compensatory Multiple Skills (Pardos et al., 2008)
  • Clustered Skills (Ritter et al., 2009)