Evaluation Metrics

February 12, 2010

Presentation Transcript


  1. Evaluation Metrics February 12, 2010

  2. A break in the usual order of things… • Today’s Probing Question will be discussed later in the class rather than at the beginning • Your responses to this (those of you who responded) were the most thoughtful ones I’ve seen all semester • You really engaged with the implications, both at an educational level and a policy level

  3. Today’s Class • Evaluation Metrics • Last Wednesday’s Probing Question • Assignments

  4. Starting from the simplest metric… • Pre-test • Post-test • Of what the student (hopefully) learned during the learning intervention

  5. Post-test • What is “SQUIRREL” in Japanese? • People named Adam not allowed to answer

  6. Why would you want to do a post-test?

  7. Why would you want to do a pre-test?

  8. Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?)

  9. Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?) • Al Corbett did not use pre-tests for some research on the LISP tutor; instead, he screened out participants who had ever used LISP or Scheme, on the logic that LISP was so different from other programming paradigms that there would be essentially no overlap • What do you think?

  10. Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?) • A dangerous decision, in my opinion • Singley & Anderson (1989), and many others, find that there can be surprising and unexpected degrees of transfer

  11. Comments? Questions?

  12. How can you mess up your tests? • I’m not asking about ways to do a better test • E.g. Bransford & Schwartz would say PFL is better than a standard pre-test of knowledge • But things you could do that will result in useless data

  13. How can you mess up your tests? • Multiple choice with terrible alternatives • What is the capital of Tajikistan? • Raise your hand if you know the answer

  14. How can you mess up your tests? • Multiple choice with terrible alternatives • What is the capital of Tajikistan? • Boston • Worcester • Tokyo • Dushanbe

  15. How can you mess up your tests? • Using the same items for both pre-test and post-test for any given student • “Gee, this looks familiar…”

  16. How can you mess up your tests? • Using pre-tests and post-tests of different difficulty • Pre-test: What is the capital of Tajikistan? • Post-test: What is the capital of Japan? • Look how great my geography tutor is!

  17. How can you mess up your tests? • Using pre-tests and post-tests of different difficulty • (Even worse if you put the easy items on the pre-test and the hard items on the post-test!) • The most common approach is to counter-balance the tests • Half of students: Pre-test Form A, Post-test Form B • Half of students: Pre-test Form B, Post-test Form A
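
The counter-balancing scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `counterbalance`, the student IDs, and the fixed random seed are assumptions made for the example, not part of the original slides.

```python
import random

def counterbalance(student_ids, seed=0):
    """Randomly assign half of the students to Form A then B, the rest to B then A."""
    rng = random.Random(seed)      # fixed seed (an assumption) so the plan is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)          # randomize who lands in which half
    half = len(shuffled) // 2
    assignment = {}
    for i, sid in enumerate(shuffled):
        if i < half:
            assignment[sid] = {"pre": "A", "post": "B"}
        else:
            assignment[sid] = {"pre": "B", "post": "A"}
    return assignment

# Hypothetical roster of six students
plan = counterbalance(["s1", "s2", "s3", "s4", "s5", "s6"])
for sid in sorted(plan):
    print(sid, plan[sid])
```

Because every student sees a different form at pre-test and post-test, and each form appears equally often in each position, any difference in form difficulty averages out across the group rather than masquerading as learning.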

  18. How can you mess up your tests? • Letting students “help” each other during the tests • Raise your hand if you’ve ever seen this

  19. How can you mess up your tests? • Letting the teacher give a student the answer during the post-test • Raise your hand if you’ve ever seen this

  20. How can you mess up your tests? • Not communicating that an online test is not a tutor • “Hey, how come this tutor doesn’t have any feedback?”

  21. Comments? Questions?

  22. Pre-Post Comparison (4 ways) • t-test on Post-test – Pre-test for each group • Advantages? Disadvantages?

  23. Pre-Post Comparison (4 ways) • t-test on Post-test – Pre-test for each group • Advantages? Disadvantages? • Vulnerable to ceiling effects • [Figure: test score, from 0% to 100%, at pre-test and post-test]
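
A minimal sketch of this first comparison, using only the Python standard library. It computes the t statistic for the mean of (post-test minus pre-test) against zero within one group; turning that into a p-value would need a t distribution table or a statistics package. The scores are made up to show the ceiling problem: the second group starts near 100% and has almost no room to gain.

```python
import math
import statistics

def gain_t_statistic(pre, post):
    """t statistic for the mean of (post - pre) against zero, within one group."""
    gains = [b - a for a, b in zip(pre, post)]
    n = len(gains)
    mean_gain = statistics.mean(gains)
    sd_gain = statistics.stdev(gains)              # sample standard deviation (n - 1)
    t = mean_gain / (sd_gain / math.sqrt(n))
    return mean_gain, t, n - 1                     # mean gain, t, degrees of freedom

# Made-up scores as fractions of 1.0 (illustrative only)
low_pre = [0.20, 0.30, 0.25, 0.35, 0.40]
low_post = [0.60, 0.65, 0.70, 0.70, 0.85]
high_pre = [0.90, 0.95, 0.85, 0.92, 0.97]
high_post = [0.98, 1.00, 0.95, 0.99, 1.00]

for label, pre, post in [("low starters", low_pre, low_post),
                         ("high starters", high_pre, high_post)]:
    mean_gain, t, df = gain_t_statistic(pre, post)
    print(f"{label}: mean gain = {mean_gain:.3f}, t = {t:.2f}, df = {df}")
```

The high starters cannot gain more than the distance between their pre-test score and 100%, so their raw gain is capped no matter how well the intervention worked.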

  24. Pre-Post Comparison (4 ways) • t-test on (Post-test – Pre-test)/(1-Pre-test) for each group • Advantages? Disadvantages?

  25. Pre-Post Comparison (4 ways) • t-test on (Post-test – Pre-test)/(1-Pre-test) for each group • Accounts for high performers… • But has weird effects if anyone does worse on post-test than pre-test • Pre = 20%, Post = 10%, Res = -12.5% • Pre = 100%, Post = 90%, Res = -∞% (division by zero)
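
The formula (Post-test – Pre-test)/(1-Pre-test) can be checked with a short function. This is a sketch with scores written as fractions of 1.0; returning `None` for a perfect pre-test is a choice made for this example, since the ratio is undefined there.

```python
def normalized_gain(pre, post):
    """(post - pre) / (1 - pre), with scores as fractions between 0 and 1."""
    if pre >= 1.0:
        return None        # no room to improve: the denominator is zero
    return (post - pre) / (1.0 - pre)

# A student who improves from 20% to 60% closes half of the available gap
print(normalized_gain(0.20, 0.60))

# A student who drops from 20% to 10% gets a negative value
print(normalized_gain(0.20, 0.10))

# A student who starts at 100% and drops to 90% has no defined value
print(normalized_gain(1.00, 0.90))
```

Note how the same 10-point drop is scaled very differently depending on the pre-test score: the closer the pre-test is to 100%, the smaller the denominator and the more extreme the result.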

  26. Pre-Post Comparison (4 ways) • Regression set up as Post-test = a0 Pre-test + a1 Condition + a2 • allows you to find mean difference in conditions while controlling for each student’s pre-test score • Advantages? Disadvantages?

  27. Pre-Post Comparison (4 ways) • Regression set up as Post-test = a0 Pre-test + a1 Condition + a2 • allows you to find mean difference in conditions while controlling for each student’s pre-test score • You need to check that condition differences are not actually pre-test differences between conditions using Pre-test = a0 Condition + a1
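
Both regressions on this slide can be sketched with ordinary least squares. The version below uses only the standard library and solves the normal equations directly; in practice a statistics package would also report standard errors and p-values. The data are made up, Condition is coded 0 for control and 1 for experimental (an assumption), and the coefficients of the second model are named `b0` and `b1` here only to keep them apart from the first model's `a0`, `a1`, `a2`.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients for y as a linear function of each row."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data: four control students (0) and four experimental students (1)
pre = [0.20, 0.35, 0.50, 0.65, 0.25, 0.40, 0.55, 0.70]
condition = [0, 0, 0, 0, 1, 1, 1, 1]
post = [0.40, 0.50, 0.62, 0.70, 0.55, 0.63, 0.72, 0.83]

# Post-test = a0 * Pre-test + a1 * Condition + a2
a0, a1, a2 = fit_ols([[p, c, 1.0] for p, c in zip(pre, condition)], post)
print(f"a0 (pre-test) = {a0:.3f}, a1 (condition) = {a1:.3f}, a2 (intercept) = {a2:.3f}")

# Check: Pre-test = b0 * Condition + b1
b0, b1 = fit_ols([[c, 1.0] for c in condition], pre)
print(f"b0 (pre-test difference between conditions) = {b0:.3f}, b1 = {b1:.3f}")
```

Here `a1` estimates the difference between conditions at a given pre-test score, and `b0` is the difference in mean pre-test score between the two conditions. A `b0` far from zero is a warning that the groups were not equivalent to begin with.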

  28. Pre-Post Comparison (4 ways) • Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control • Advantages? Disadvantages?

  29. Pre-Post Comparison (4 ways) • Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control • How big is the difference between groups? (not just how likely it is, if chance were all there was)
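
The effect size formula translates directly into code. The slide does not say which standard deviation "St Dev in Control" refers to; this sketch assumes it is the sample standard deviation of the control group's gains. The gain values are made up for illustration.

```python
import statistics

def effect_size(experimental_gains, control_gains):
    """(mean gain in experimental - mean gain in control) / st dev of control gains."""
    difference = statistics.mean(experimental_gains) - statistics.mean(control_gains)
    return difference / statistics.stdev(control_gains)   # assumption: SD of control gains

# Made-up gains (post-test minus pre-test) as fractions of 1.0
experimental = [0.30, 0.40, 0.50]
control = [0.10, 0.20, 0.30]

# Difference in mean gain, expressed in control-group standard deviations
print(effect_size(experimental, control))
```

Unlike a t-test, the result does not shrink or grow with sample size; it reports how far apart the groups are in units of the control group's spread.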

  30. Comments? Questions?

  31. (Some Types of) Contents of Tests • Multiple-choice • Fill-in-the-blank • Essay • Complete Problem-solving • Decomposed Problem-solving

  32. Types I believe you already know • Multiple-choice • Fill-in-the-blank • Essay • Complete Problem-solving • Decomposed Problem-solving

  33. Complete Problem-Solving • Draw a scatterplot of this fake data

  34. Decomposed Problem-Solving • What variables would you use to draw a scatterplot of this data?

  35. Have them turn in their answer • (Or go to the next webpage)

  36. Decomposed Problem-Solving • What is a good scale for Population?

  37. Have them turn in their answer • (Or go to the next webpage)

  38. Decomposed Problem-Solving • What is a good upper and lower bound for Population?

  39. Have them turn in their answer • (Or go to the next webpage)

  40. Decomposed Problem-Solving • Label the axes with values (Have Population go from 0 to 700 with scale of 50, and Number of Restaurants go from 0 to 80 with scale of 10) • [Figure: blank axes labeled Number of Restaurants and Population]

  41. And so on…

  42. Advantages/Disadvantages? • Multiple-choice • Fill-in-the-blank • Essay • Complete Problem-solving • Decomposed Problem-solving

  47. “Contingent Correctness” Grading • Some researchers try to deal with the issue of partial correctness in complete problem-solving by grading contingent correctness • i.e. If step A is wrong, but step B is correct based on step A, count step B as correct • E.g. if the student used the wrong variable, but plotted the points correctly, the point plotting is contingently correct • Time-consuming and tricky to do
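
A rough sketch of how contingent correctness might be scored for the scatterplot example. Everything here is hypothetical: the data set, the variable names, and the two-step rubric are invented for illustration. The key idea is that the expected answer for the point-plotting step is recomputed from whatever variables the student actually chose.

```python
# Hypothetical data set with one distractor variable
DATA = {
    "population": [100, 250, 400],
    "restaurants": [10, 30, 55],
    "rainfall": [12, 8, 20],
}
CORRECT_VARIABLES = ("population", "restaurants")

def grade(chosen_variables, plotted_points):
    """Score step A (variable choice) and step B (plotting) contingently."""
    variables_ok = tuple(chosen_variables) == CORRECT_VARIABLES
    # Step B is judged against the points implied by the student's own step A
    x_name, y_name = chosen_variables
    expected_points = list(zip(DATA[x_name], DATA[y_name]))
    points_ok = sorted(plotted_points) == sorted(expected_points)
    return {"variables": variables_ok, "points": points_ok}

# The student picked the wrong y variable but plotted those values correctly
result = grade(("population", "rainfall"), [(100, 12), (250, 8), (400, 20)])
print(result)   # variables is False, points is True (contingently correct)
```

Even in this toy case the grader has to rebuild the answer key per student, which hints at why the approach is time-consuming and tricky once problems have many interdependent steps.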

  48. Comments? Questions?

  49. Other measures

  50. Learning Efficiency • Perhaps two conditions have equal learning, but one condition takes significantly more time than another condition • Advantages? Disadvantages?
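
The slide does not give a formula for learning efficiency. One possible way to operationalize it, shown here purely as an assumption, is gain per minute of instruction. The scores and times are made up.

```python
def learning_efficiency(pre, post, minutes):
    """Gain per minute of instruction (one possible definition, assumed here)."""
    if minutes <= 0:
        raise ValueError("time on task must be positive")
    return (post - pre) / minutes

# Two hypothetical conditions with the same gain but different time on task
fast = learning_efficiency(pre=0.40, post=0.70, minutes=30)
slow = learning_efficiency(pre=0.40, post=0.70, minutes=60)
print(f"30-minute condition: {fast:.4f} per minute")
print(f"60-minute condition: {slow:.4f} per minute")
```

With equal gains, the condition that takes half the time comes out twice as efficient under this definition, which is the situation the slide describes.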
