
Classification: Hands-on Activity

Learn how to predict gaming the system using RapidMiner, build decision tree models, check model goodness, and improve cross-validation.


Presentation Transcript


  1. Classification: Hands-on Activity

  2. Hands-on activity • Predicting gaming the system in RapidMiner

  3. Open RapidMiner • And open classifier.xml • We currently have the model set up to build a decision tree with action-level cross-validation • Click the Run button (blue triangle)
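If you want to see roughly what the classifier.xml process is doing outside RapidMiner, the Python/scikit-learn sketch below builds a decision tree and evaluates it with plain (action-level) 10-fold cross-validation. The label column name "GAMING" and its G/N values are assumptions; adjust them to whatever the CSV actually uses.

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("WEKA-CTA1Z04-fordev.csv")
y = (data["GAMING"] == "G").astype(int)      # assumed label column and values
X = data.drop(columns=["GAMING"])

# Ordinal-encode the nominal columns (student, lesson, ...) so the tree can
# split on them, loosely mirroring how W-J48 handles nominal attributes.
encode_nominals = make_column_transformer(
    (OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
     make_column_selector(dtype_include=object)),
    remainder="passthrough",
)
model = make_pipeline(encode_nominals, DecisionTreeClassifier())

# Action-level cross-validation: rows are dealt into folds with no regard
# for which student produced them.
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=folds, scoring="roc_auc")
print("Cross-validated AUC (A'):", scores.mean())
```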

  4. Check model goodness • Go to AUC (AUC is another way of referring to A’) • Pretty good, eh? • What’s the problem?
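As a reminder of what that number means, A’ / AUC is the probability that a randomly chosen gaming action is ranked above a randomly chosen non-gaming action. A minimal sketch with made-up confidences:

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 0, 0, 1]                 # 1 = gaming (G), 0 = not gaming (N)
y_score = [0.9, 0.2, 0.3, 0.4, 0.1, 0.6]    # illustrative model confidences
print("AUC / A':", roc_auc_score(y_true, y_score))   # about 0.89 here
```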

  5. Check model goodness • This model uses data from the same student both to train and test • And more seriously…

  6. Re-run • Move W-J48 above XValidation • Disable XValidation (right-click XValidation and click Enable Operator to toggle it off) • Run the model • Click on text view • What do you see?
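In Python terms, running with XValidation disabled amounts to training one tree on all of the data and looking at it directly. A sketch, with the same assumed "GAMING" label column as above:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("WEKA-CTA1Z04-fordev.csv")
y = (data["GAMING"] == "G").astype(int)      # assumed label column and values
X = data.drop(columns=["GAMING"])

# Encode the nominal columns so the tree can split on them.
nominal = X.select_dtypes(include="object").columns
X[nominal] = OrdinalEncoder().fit_transform(X[nominal])

# No cross-validation: fit on everything, then inspect the tree, much like
# the text view in RapidMiner.
tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```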

  7. Check model goodness • W-J48 text output:

     J48 pruned tree
     ------------------
     pknow <= 0.071168
     |   student = N46z59pQP58: N (10.0)
     |   student = N5LMy832c47: N (5.0)
     |   student = N668lBbaKFE: N (16.0)
     |   student = N6O31vedZbI: N (1.0)
     |   student = tB4vqSxzqo: G (10.0)
     |   student = N3hSu07XfGd: N (6.0)
     |   student = N6bJ4auIa8L: N (10.0)
     …

  8. What’s wrong with this?

  9. What’s wrong with this? • It’s fitting to the student! • Definitely not a model that could be used with new students!

  10. Which features should we remove • For a model that has some hope of being generalizable • (Open the data set in Excel to take a look) • WEKA-CTA1Z04-fordev.csv on your USB flash drive

  11. Which features should we remove • For a model that has some hope of being generalizable • Num • Student • Lesson • Lspair • Skill • Cell • Leave group in, it’s a special case

  12. So… • Let’s quickly go into Excel, do that, and re-save (with a new file name) • Now go back to RapidMiner • Change the file name in CSVExampleSource to your new file name • Run again
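If you would rather not do the column removal by hand, a pandas sketch of the same step is below. The output file name is made up; check that the column names match the capitalization in your copy of the file.

```python
import pandas as pd

# Drop the identifier-style features listed on the previous slide; "group"
# stays in because it is needed later for batch cross-validation.
data = pd.read_csv("WEKA-CTA1Z04-fordev.csv")
data = data.drop(columns=["Num", "Student", "Lesson", "Lspair", "Skill", "Cell"])
data.to_csv("WEKA-CTA1Z04-fordev-trimmed.csv", index=False)  # hypothetical new name
```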

  13. So… • Anything wrong here?

  14. So… • Yup, no model. • So everything we had was over-fitting

  15. Let’s take • A more extensive data set • Specifically, one with additional distilled features • WEKA-CTA1Z04-allfeatures.csv

  16. What to do • Remove the same over-fitting features as before • Change the file name in CSVExampleSource to your new file name • Run! • What A’ do you get?

  17. But wait… • We are still using data from the same student in both the training set and the test set • Replace XValidation with BatchXValidation • Right-click on XValidation, click Replace Operator, go to Validation, go to Other • If you look at ChangeAttributeRole, you’ll see that we are using “group” to set up the cross-validation level (group refers to a pre-chosen group of students; the groups are approximately equal in number of students and number of actions) • Run! • What A’ do you get?
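The scikit-learn analogue of BatchXValidation is grouped cross-validation: folds are built from the “group” column, so no student contributes rows to both the training and the test side of a fold. A sketch, again assuming a "GAMING" label column and a cleaned-up copy of the all-features file (the file name here is hypothetical):

```python
import pandas as pd
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("WEKA-CTA1Z04-allfeatures-trimmed.csv")  # hypothetical cleaned file
groups = data["group"]                        # pre-chosen student groups
y = (data["GAMING"] == "G").astype(int)       # assumed label column and values
X = data.drop(columns=["GAMING", "group"])    # assumes remaining features are numeric

# One fold per group: every student's actions are either all in training
# or all in testing for a given fold.
cv = GroupKFold(n_splits=groups.nunique())
scores = cross_val_score(DecisionTreeClassifier(), X, y,
                         cv=cv, groups=groups, scoring="roc_auc")
print("Group-level cross-validated AUC (A'):", scores.mean())
```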

  18. A’ • What A’ do you get? • For me, the incorrect cross-validation did not actually make a difference… this time. • Did it make a difference for any of you? • (It does make a difference sometimes!)

  19. More things worth trying • Adding interaction features • F1 * F2… • You just need to enable a disabled operator…
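In the RapidMiner process this is handled by the disabled operator; for reference, a brute-force pandas sketch of the same idea is below (every pairwise product of the numeric predictors; the label and group column names are assumptions):

```python
from itertools import combinations

import pandas as pd

data = pd.read_csv("WEKA-CTA1Z04-allfeatures-trimmed.csv")  # hypothetical cleaned file
skip = {"GAMING", "group"}                                  # assumed label / group columns

# Add F1 * F2 for every pair of numeric predictors.
numeric = [c for c in data.select_dtypes(include="number").columns if c not in skip]
for f1, f2 in combinations(numeric, 2):
    data[f"{f1}*{f2}"] = data[f1] * data[f2]

print(len(data.columns), "columns after adding interaction features")
```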

  20. That takes a while! • Can you filter out the interaction parameters that are closely correlated with each other?
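One way to do that filtering, sketched below: compute the absolute correlation matrix and drop one column from every pair whose correlation exceeds a threshold. The 0.95 cutoff and the file name are illustrative, not from the slides.

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one column from every pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected exactly once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=redundant)

# Example: thin out the numeric features (including interactions) before re-running.
data = pd.read_csv("WEKA-CTA1Z04-allfeatures-trimmed.csv")  # hypothetical cleaned file
slimmed = drop_correlated(data.select_dtypes(include="number"))
print(len(data.columns), "->", len(slimmed.columns), "columns")
```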

  21. Trying out other algorithms • What if you want to use step regression instead of J48? • Or logistic regression? • Or decision stumps? • Can you replace the W-J48 operator with these?
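In scikit-learn terms, swapping the learner is a one-line change. The sketch below tries logistic regression and a decision stump (a one-split tree); step regression has no direct scikit-learn equivalent, so it is left out. Same assumed column and file names as before:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("WEKA-CTA1Z04-allfeatures-trimmed.csv")  # hypothetical cleaned file
groups = data["group"]
y = (data["GAMING"] == "G").astype(int)      # assumed label column and values
X = data.drop(columns=["GAMING", "group"])

learners = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision stump": DecisionTreeClassifier(max_depth=1),
}
cv = GroupKFold(n_splits=groups.nunique())
for name, learner in learners.items():
    auc = cross_val_score(learner, X, y, cv=cv, groups=groups,
                          scoring="roc_auc").mean()
    print(f"{name}: A' = {auc:.3f}")
```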

  22. Thanks!
