
LSP 121

LSP 121: Statistics That Deceive. Simpson’s Paradox. It is commonly accepted that the larger the data set, the better the results. Simpson’s Paradox demonstrates that great care must be taken when combining smaller data sets into a larger one.

ashlyn

Presentation Transcript


  1. LSP 121 Statistics That Deceive

  2. Simpson’s Paradox
     • It is commonly accepted that the larger the data set, the better the results
     • Simpson’s Paradox demonstrates that great care must be taken when combining smaller data sets into a larger one
     • Sometimes the conclusion drawn from the larger data set is the opposite of the conclusions drawn from the smaller data sets!

  3. Example: Simpson’s Paradox
     Baseball batting statistics for two players:
     How could Player A beat Player B in each half of the season individually, yet have a lower batting average for the season as a whole?

  4. Example continued
     We weren’t told how many at-bats each player had. Player A’s dismal second half and Player B’s great first half carried more weight than the other two values because they were based on more at-bats.
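The weighting effect in the batting example can be reproduced with made-up numbers. The hit and at-bat counts in this sketch are hypothetical, chosen only for illustration; they are not the figures from the original slide.

```python
# Hypothetical (hits, at_bats) per half-season; NOT the slide's data.
player_a = {"first_half": (4, 10), "second_half": (25, 100)}
player_b = {"first_half": (35, 100), "second_half": (2, 10)}


def average(hits, at_bats):
    """Batting average for a single period."""
    return hits / at_bats


def season_average(player):
    """Season average: total hits divided by total at-bats."""
    total_hits = sum(hits for hits, _ in player.values())
    total_at_bats = sum(at_bats for _, at_bats in player.values())
    return total_hits / total_at_bats


for half in ("first_half", "second_half"):
    a = average(*player_a[half])
    b = average(*player_b[half])
    print(f"{half}: A={a:.3f} B={b:.3f}")  # A is ahead in both halves

# Over the full season the order flips, because A's weak half and
# B's strong half each cover 100 of the 110 at-bats.
print(f"season: A={season_average(player_a):.3f} B={season_average(player_b):.3f}")
```

With these assumed counts, Player A leads in each half (0.400 vs 0.350, then 0.250 vs 0.200) but trails over the season (about 0.264 vs 0.336).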

  5. Another Example
     Average college physics grades for students in an engineering program:

     |                    | Took HS physics | No HS physics |
     |--------------------|-----------------|---------------|
     | Number of Students | 50              | 5             |
     | Average Grade      | 80              | 70            |

     Average college physics grades for students in a liberal arts program:

     |                    | Took HS physics | No HS physics |
     |--------------------|-----------------|---------------|
     | Number of Students | 5               | 50            |
     | Average Grade      | 95              | 85            |

     It appears that in both majors (Liberal Arts and Engineering), taking high school physics improves your college physics grade by 10 points.

  6. Example continued
     In order to get better results, let’s combine our datasets. In particular, let’s combine all the students who took high school physics. More precisely, let’s combine the Engineering majors who took high school physics with the Liberal Arts majors who took high school physics. Likewise, combine the Engineering majors who did not take high school physics with the Liberal Arts majors who did not. But be careful! You can’t just take the average of the two averages, because each dataset has a different number of values!
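The warning about averaging the averages can be checked directly. This is a minimal sketch using the counts and grades from the tables for students who took high school physics; the function names are illustrative only.

```python
def naive_mean_of_means(groups):
    """Unweighted average of the group averages (the mistake to avoid)."""
    return sum(avg for _, avg in groups) / len(groups)


def weighted_mean(groups):
    """Average weighted by group size: sum(n * avg) / sum(n)."""
    total_students = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_students


# (number of students, average grade) for students who took HS physics
took_hs_physics = [(50, 80), (5, 95)]  # engineering, liberal arts

print(naive_mean_of_means(took_hs_physics))      # 87.5
print(round(weighted_mean(took_hs_physics), 1))  # 81.4
```

The naive figure treats 5 liberal arts students as if they counted as much as 50 engineering students, which is why it lands well above the size-weighted average.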

  7. Example continued
     Average college physics grades for students who took high school physics:

     |             | # Students | Avg Grade | Weighted Grade     |
     |-------------|------------|-----------|--------------------|
     | Engineering | 50         | 80        | 50/55 × 80 = 72.73 |
     | Lib Arts    | 5          | 95        | 5/55 × 95 = 8.64   |
     | Total       | 55         |           |                    |

     Average = 72.73 + 8.64 = 81.36 ≈ 81.4

     Average college physics grades for students who did not take high school physics:

     |             | # Students | Avg Grade | Weighted Grade     |
     |-------------|------------|-----------|--------------------|
     | Engineering | 5          | 70        | 5/55 × 70 = 6.36   |
     | Lib Arts    | 50         | 85        | 50/55 × 85 = 77.27 |
     | Total       | 55         |           |                    |

     Average = 6.36 + 77.27 = 83.64 ≈ 83.6

     Did the students who did not take high school physics actually do better?

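  8. Same example calculated another way
     Average college physics grades for students who took high school physics:

     |             | # Students | Avg Grade | Grade Pts (students × avg) |
     |-------------|------------|-----------|----------------------------|
     | Engineering | 50         | 80        | 4000                       |
     | Lib Arts    | 5          | 95        | 475                        |
     | Total       | 55         |           | 4475                       |

     Average = total grade points / total students = 4475 / 55 ≈ 81.4

     Average college physics grades for students who did not take high school physics:

     |             | # Students | Avg Grade | Grade Pts (students × avg) |
     |-------------|------------|-----------|----------------------------|
     | Engineering | 5          | 70        | 350                        |
     | Lib Arts    | 50         | 85        | 4250                       |
     | Total       | 55         |           | 4600                       |

     Average = total grade points / total students = 4600 / 55 ≈ 83.6

     Did the students who did not take high school physics actually do better?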
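The grade-point version of the calculation can be sketched as below: sum the grade points (students × average grade) and divide by the total number of students. The variable and function names are illustrative; the numbers come from the tables.

```python
# (students, average grade) per major, taken from the tables
took = {"engineering": (50, 80), "liberal_arts": (5, 95)}
did_not_take = {"engineering": (5, 70), "liberal_arts": (50, 85)}


def pooled_average(table):
    """Total grade points divided by total students."""
    grade_points = sum(n * avg for n, avg in table.values())
    students = sum(n for n, _ in table.values())
    return grade_points / students


pooled_took = pooled_average(took)         # 4475 / 55
pooled_not = pooled_average(did_not_take)  # 4600 / 55
print(f"{pooled_took:.2f} {pooled_not:.2f}")  # 81.36 83.64
```

The pooled figures reverse the within-major result: the combined group without high school physics comes out about 2.3 points higher.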

  9. The Problem
     • Two problems with combining the data:
         • Each combined table was dominated by one type of student
         • The engineering students had a more rigorous physics class (e.g. “Physics for Engineers”) than the liberal arts students, so there is a hidden variable
     • In fact, this “lurking variable” that makes the subcategories different from one another is the most common cause of Simpson’s Paradox
     • Key Point: Be very careful when you combine data into a larger set
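Both problems can be made visible in a few lines. The sketch below uses the numbers from the physics example to show the gap within each major, the share of engineering students in each combined group, and the pooled gap. The last step, giving each major equal weight so both groups have the same mix, is an assumption added here for illustration and is not part of the original slides.

```python
# grades[major][took_hs_physics] = (students, average grade), from the tables
grades = {
    "engineering": {True: (50, 80), False: (5, 70)},
    "liberal_arts": {True: (5, 95), False: (50, 85)},
}


def pooled(took):
    """Size-weighted average over both majors for one group."""
    points = sum(n * avg for n, avg in (g[took] for g in grades.values()))
    students = sum(g[took][0] for g in grades.values())
    return points / students


def engineering_share(took):
    """Fraction of the combined group that is engineering majors."""
    students = sum(g[took][0] for g in grades.values())
    return grades["engineering"][took][0] / students


def same_mix(took):
    """Assumed equal weight per major, so both groups share one mix."""
    return sum(g[took][1] for g in grades.values()) / len(grades)


within_major_gap = {m: g[True][1] - g[False][1] for m, g in grades.items()}
print(within_major_gap)  # {'engineering': 10, 'liberal_arts': 10}
print(round(engineering_share(True), 2), round(engineering_share(False), 2))  # 0.91 0.09
print(round(pooled(True) - pooled(False), 2))  # -2.27
print(same_mix(True) - same_mix(False))        # 10.0
```

Once both groups are given the same mix of majors, the 10-point advantage seen within each major reappears, which suggests the reversal comes from the unequal mix rather than from high school physics itself.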
