Lecture 9 Chapter 22. Tests for two-way tables

todd-riggs

Presentation Transcript


  1. Lecture 9 Chapter 22. Tests for two-way tables

  2. Objectives The chi-square test for two-way tables (Award: NHST Test for Independence) • Two-way tables • Hypotheses for the chi-square test for two-way tables • Expected counts in a two-way table • Conditions for the chi-square test • Chi-square test for two-way tables • Simpson’s paradox

  3. Two-way tables An experiment has a two-way factorial design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables with any number of levels/treatments, obtained from a two-way factorial design or a two-way observational study. Example: high school students were asked whether they smoke and whether their parents smoke. First factor (rows): parent smoking status. Second factor (columns): student smoking status. Counts (student smokes, student does not smoke): both parents smoke 400, 1380; one parent smokes 416, 1823; neither parent smokes 188, 1168.

  4. Marginal distribution The marginal distributions (in the “margins” of the table) summarize each factor on its own, ignoring the other factor. Marginal distribution for parental smoking: P(both parents) = 1780/5375 = 33.1% P(one parent) = 2239/5375 = 41.7% P(neither parent) = 1356/5375 = 25.2%
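The marginal percentages can be checked with a few lines of Python. This is only an illustrative sketch: the counts are those of the smoking table in the slides, while the dictionary layout and the variable names are assumptions made for the example.

```python
# Observed counts from the smoking example: rows are parental smoking status,
# values are (student smokes, student does not smoke).
observed = {
    "both parents smoke": (400, 1380),
    "one parent smokes": (416, 1823),
    "neither parent smokes": (188, 1168),
}

grand_total = sum(sum(row) for row in observed.values())

# Marginal distribution of the row factor: row total / grand total.
parent_marginal = {
    status: sum(row) / grand_total for status, row in observed.items()
}

# Marginal distribution of the column factor: column total / grand total.
column_totals = [sum(col) for col in zip(*observed.values())]
student_marginal = {
    "student smokes": column_totals[0] / grand_total,
    "student does not smoke": column_totals[1] / grand_total,
}

for status, share in parent_marginal.items():
    print(f"P({status}) = {share:.1%}")
```

Each marginal distribution sums to 100%, because it only redistributes the grand total across the levels of one factor.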

  5. Conditional distribution The cells of the two-way table represent the intersection of a given level of one factor with a given level of the other factor. Dividing each cell count by its row (or column) total gives the conditional distributions. Conditional distribution of student smoking for different parental smoking statuses: P(student smokes | both parents) = 400/1780 = 22.5% P(student smokes | one parent) = 416/2239 = 18.6% P(student smokes | neither parent) = 188/1356 = 13.9%
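A minimal sketch of the same conditional calculation, assuming the smoking counts are stored by row as (student smokes, student does not smoke). The names are illustrative only.

```python
# Observed counts: (student smokes, student does not smoke) per parental status.
observed = {
    "both parents smoke": (400, 1380),
    "one parent smokes": (416, 1823),
    "neither parent smokes": (188, 1168),
}

# Condition on the row: divide the "student smokes" cell by its row total.
p_student_smokes_given_parents = {
    status: smokes / (smokes + does_not_smoke)
    for status, (smokes, does_not_smoke) in observed.items()
}

for status, share in p_student_smokes_given_parents.items():
    print(f"P(student smokes | {status}) = {share:.1%}")
```

If there were no association, these three conditional percentages would be roughly equal. The chi-square test asks whether the differences are larger than chance alone would explain.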

  6. Hypotheses A two-way table has r rows and c columns. Statistical hypotheses H0: There is no association between the row and column variables. Ha: There is an association/relationship between the two variables. We will compare the actual counts from the sample data with the counts we would expect if the null hypothesis of no relationship were true.

  7. Expected counts in a two-way table A two-way table has r rows and c columns. H0 states that there is no association between the row and column variables (factors) in the table. The expected count in any cell of a two-way table when H0 is true is: expected count = (row total × column total) / table total. The expected count is the average count you would get for that cell if the null hypothesis were true.
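As a sketch of the standard rule for expected counts (row total times column total, divided by the table total), here it is applied to the smoking table from the earlier slides. The function name `expected_counts` is hypothetical, chosen for this example.

```python
def expected_counts(table):
    """Expected cell counts under H0 (no association between rows and columns).

    Each expected count is row total * column total / table total.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [
        [row_total * column_total / table_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Smoking example: rows = both / one / neither parent smokes,
# columns = student smokes / student does not smoke.
observed = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]

expected = expected_counts(observed)
for row in expected:
    print([round(cell, 1) for cell in row])
```

The expected counts keep the same row and column totals as the observed table. Only the way each total is split across cells changes.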

  8. Cocaine addiction Cocaine produces short-term feelings of physical and mental well-being. To maintain the effect, the drug may have to be taken more frequently and at higher doses. After stopping use, users feel tired, sleepy, and depressed. A study compares the rates of successful rehabilitation for cocaine addicts following 1 of 3 treatment options: 1: antidepressant treatment (desipramine) 2: standard treatment (lithium) 3: placebo (“sugar pill”)

  9. Cocaine addiction Calculate the expected cell counts if relapse is independent of the treatment.

  10. Expected relapse counts [Table: observed % and expected % by treatment (Desipramine, Lithium, Placebo) and relapse outcome (No, Yes); the expected % is 35% for each of the three treatments.]

  11. Situations appropriate for the chi-square test The chi-square test for two-way tables looks for evidence of association between two categorical/nominal variables (factors) in sample data. The samples can be drawn either: • By randomly selecting SRSs from different populations (or from a population subjected to different treatments), for example girls vaccinated for HPV or not, among 8th graders and 12th graders, or remission or no remission for different treatments • Or by taking 1 SRS and classifying the individuals according to 2 categorical variables (factors), for example 11th graders’ smoking status and their parents’ status

  12. We can safely use the chi-square test when: • no more than 20% of expected counts are less than 5 (< 5) • all individual expected counts are 1 or more (≥ 1) What goes wrong? With small expected cell counts, the sampling distribution of the test statistic is not well approximated by a chi-square distribution. Statistician’s note: If one factor has many levels and too many expected counts are too low, you might be able to “collapse” some of the levels (regroup them) and thus have large-enough expected counts.
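The two conditions above can be written as a small check on a table of expected counts. This is a sketch only: the function name is hypothetical, and the two small tables are made-up illustrations, not data from the lecture.

```python
def chi_square_conditions_ok(expected):
    """Return True if the rule of thumb for the chi-square test is met:
    at most 20% of expected counts below 5, and every expected count >= 1."""
    cells = [count for row in expected for count in row]
    share_below_5 = sum(count < 5 for count in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1


# Hypothetical expected-count tables, for illustration only.
large_counts = [[332.5, 1447.5], [418.2, 1820.8], [253.3, 1102.7]]
one_cell_in_four_below_5 = [[4.0, 6.0], [8.0, 12.0]]   # 25% of cells < 5
cell_below_1 = [[0.8, 9.2], [7.2, 82.8]]               # one cell < 1

print(chi_square_conditions_ok(large_counts))              # True
print(chi_square_conditions_ok(one_cell_in_four_below_5))  # False
print(chi_square_conditions_ok(cell_below_1))              # False
```

Note that the check is applied to the expected counts, not the observed counts.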

  13. The chi-square test for two-way tables H0: there is no association between the row and column variables. Ha: H0 is not true. The χ² statistic sums over all r × c cells in the table: χ² = Σ (observed count − expected count)² / expected count. When H0 is true, the χ² statistic follows approximately a χ² distribution with (r − 1)(c − 1) degrees of freedom. P-value: P(χ² variable ≥ calculated χ² | H0 is true)
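To make the statistic concrete, here is a minimal sketch that computes the chi-square statistic (the sum over all cells of (observed − expected)² / expected) and the degrees of freedom for the smoking table from the earlier slides. The function name is an assumption made for this example.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * column_totals[j] / table_total
            statistic += (observed_count - expected_count) ** 2 / expected_count

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom


# Smoking example: rows = both / one / neither parent smokes,
# columns = student smokes / student does not smoke.
observed = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]

chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The statistic is large when observed counts sit far from the expected counts relative to their size, which is why only the upper tail of the distribution counts as evidence against H0.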

  14. Table A Ex: df = 6 If χ² = 15.9, the P-value is between 0.01 and 0.02.
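The table lookup can be cross-checked numerically. The sketch below relies on a closed form for the upper-tail probability of a chi-square distribution that holds only when the degrees of freedom are even. That restriction is a simplification chosen to keep the example within the standard library; the function name is hypothetical.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square variable with `df` degrees of freedom >= x).

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid for positive even df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    partial_sum = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * partial_sum


p_slide_example = chi_square_upper_tail(15.9, df=6)
print(f"df = 6, chi-square = 15.9: P = {p_slide_example:.4f}")
```

For odd degrees of freedom a statistical library or the printed table is needed, since this shortcut does not apply.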

  15. Table of counts: “actual/expected,” with three rows (Desipramine, Lithium, Placebo) and two columns (No relapse, Relapse): df = (3 − 1)(2 − 1) = 2 We compute the X² statistic: Using Table D: 10.60 < X² < 11.98, so 0.005 > P > 0.0025 The P-value is very small (JMP gives P = 0.0047) and we reject H0. There is a significant relationship between treatment type (desipramine, lithium, placebo) and outcome (relapse or not).

  16. Interpreting the X² output When the X² test is statistically significant: The largest components indicate which condition(s) are most different from H0. You can also compare the observed and expected counts, or compare the computed proportions in a graph. [Table of X² components: rows Desipramine, Lithium, Placebo; columns No relapse, Relapse.] The largest X² component, 4.41, is for desipramine/no relapse. Desipramine has the highest success rate (see graph).
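The same component-by-component reading can be sketched in code. Because the cocaine counts are not reproduced in this transcript, the example below uses the smoking table from the earlier slides instead; the labels and variable names are illustrative assumptions.

```python
row_labels = ["both parents smoke", "one parent smokes", "neither parent smokes"]
column_labels = ["student smokes", "student does not smoke"]
observed = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# One component per cell: (observed - expected)^2 / expected.
components = {}
for i, row_label in enumerate(row_labels):
    for j, column_label in enumerate(column_labels):
        expected_count = row_totals[i] * column_totals[j] / table_total
        difference = observed[i][j] - expected_count
        components[(row_label, column_label)] = difference ** 2 / expected_count

largest_cell = max(components, key=components.get)
for cell, value in components.items():
    print(f"{cell}: {value:.2f}")
print("largest component:", largest_cell)
```

A large component only says that a cell departs from H0. Comparing the observed and expected counts for that cell shows the direction of the departure.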

  17. Influence of parental smoking Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students’ smoking habits). What does it tell you? Is the sample size sufficient? What are the hypotheses? Are the data ok for a χ² test? What else should you ask? What is your interpretation?

  18. Caution with categorical data An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's paradox. Kidney stones A study compared the success rates of two different procedures for removing kidney stones: open surgery and percutaneous nephrolithotomy (PCNL), a minimally invasive technique. Overall results: open surgery 273 successes, 77 failures (22% failure); PCNL 289 successes, 61 failures (17% failure). Yet it turns out that, for any given patient, PCNL is more likely to result in failure. Can you think of a reason why?

  19. The procedures are not chosen randomly by surgeons! In fact, the minimally invasive procedure is more likely to be used for smaller stones (with a good chance of success), whereas open surgery is more likely to be used for more problematic conditions. For both small stones and large stones, open surgery has a lower failure rate. This is Simpson’s paradox. The more challenging cases with large stones tend to be treated more often with open surgery, making it appear as if the procedure were less reliable overall. Beware of lurking variables!
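The reversal can be reproduced numerically. The slides give only the combined counts, so the breakdown by stone size below is an assumption: it uses the figures commonly quoted for this kidney-stone study, which add up to the combined totals of 273/77 and 289/61. Treat it as an illustrative sketch rather than the lecture's own table.

```python
# (successes, failures) per procedure within each stone-size group.
# ASSUMPTION: this stratified breakdown is not in the slides; it uses the
# commonly quoted figures for the study and matches the combined totals.
strata = {
    "small stones": {"open surgery": (81, 6), "PCNL": (234, 36)},
    "large stones": {"open surgery": (192, 71), "PCNL": (55, 25)},
}


def failure_rate(successes, failures):
    return failures / (successes + failures)


# Failure rates within each stratum.
for size, procedures in strata.items():
    for name, (successes, failures) in procedures.items():
        print(f"{size}, {name}: {failure_rate(successes, failures):.1%} failure")

# Combine the strata into a single table per procedure.
overall = {}
for procedures in strata.values():
    for name, (successes, failures) in procedures.items():
        so_far_successes, so_far_failures = overall.get(name, (0, 0))
        overall[name] = (so_far_successes + successes, so_far_failures + failures)

for name, (successes, failures) in overall.items():
    print(f"overall, {name}: {failure_rate(successes, failures):.1%} failure")
```

Within each stone size, open surgery fails less often, yet it fails more often overall, because most open-surgery cases are large stones while most PCNL cases are small stones.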
