Introduction to Biostatistics (ZJU 2008)

Introduction to Biostatistics (ZJU 2008). Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University East Lansing, Michigan 48824, USA Email: fuw@msu.edu www: http://www.msu.edu/~fuw. Categorical Data Analysis.

Presentation Transcript


  2. Categorical Data Analysis • Examples of Categorical Data: Qualitative Random Variables Yield Responses That Can Be Put in Categories. Example: Gender (Male, Female), Disease (+, -). • Measurement Is a Count Reflecting the Number (#) in Each Category • Nominal (No Order) or Ordinal Scale (Ordered) • Data Can Be Collected as Continuous but Recoded as Categorical. Example: Systolic Blood Pressure (Hypotension, Normotension, Hypertension) • Counts of Incident Episodes (Events)

  3. Categorical Data Analysis • Responses are not continuous, so linear regression or ANOVA models cannot be used. • Transformations cannot make the response variable meaningful on a new scale either. • New methods are needed to analyze such data.

  4. Test for Two Proportions

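  5. Z Test for Two Proportions 1. Assumptions • Populations Are Independent • Populations Follow Binomial Distribution • Normal Approximation Can Be Used for Large Samples (All Expected Counts ≥ 5) • Z-Test Statistic for Two Proportions: $Z = \dfrac{(\hat p_1 - \hat p_2) - (p_1 - p_2)_0}{\sqrt{\bar p(1-\bar p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\bar p = \dfrac{X_1 + X_2}{n_1 + n_2}$ is the pooled sample proportion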

  6. Sampling Distribution of the Difference Between Proportions

  7. Z Test for Two Proportions: Thinking Challenge • You’re an epidemiologist for the US Department of Health and Human Services. You’re studying the prevalence of disease X in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased, and in CA, 129 of 1500 were diseased. At the .05 level, does MA have a lower prevalence rate?

  12. Z Test for Two Proportions Solution* H0: pMA - pCA = 0 Ha: pMA - pCA < 0 α = .05 nMA = 1500 nCA = 1500 Critical Value: -1.645 (lower tail) Test Statistic: Z = -4.00 Decision: Reject H0 at α = .05 Conclusion: There is evidence that the prevalence in MA is lower than in CA
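The Z = -4.00 on this slide can be checked with a short Python sketch. It assumes the pooled-variance form of the two-proportion Z statistic (pooling under H0: pMA - pCA = 0) and computes the lower-tail p-value from the standard normal CDF via `math.erfc`; the function names are illustrative, not from the slides.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p under H0
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# MA: 74 of 1500 diseased; CA: 129 of 1500 diseased
z = pooled_two_proportion_z(74, 1500, 129, 1500)
p_lower = standard_normal_cdf(z)  # Ha: pMA - pCA < 0, so use the lower tail
critical = -1.645                  # one-sided critical value at alpha = .05

print(f"Z = {z:.2f}, one-sided p-value = {p_lower:.2e}")
print("Reject H0" if z < critical else "Do not reject H0")
```

Running it prints Z = -4.00 followed by "Reject H0", in line with the decision above.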

  13. Chi-Square (2) Test for k Proportions 1. Tests Equality (=) of Proportions Only Example: p1 = .2, p2=.3, p3 = .5 2. One Variable With Several Levels 3. Assumptions Multinomial Experiment Large Sample Size (All Expected Counts  5) 4. Uses One-Way Contingency Table

  14. Multinomial Experiment • n Identical Trials • k Outcomes to Each Trial • Constant Outcome Probabilities p1, …, pk • Independent Trials • Random Variables Are the Counts n1, …, nk • Example: In health services research, ask 100 workers which of 3 health insurance plans (k = 3) they prefer

  15. One-Way Contingency Table 1. Shows # Observations in k Independent Groups (Outcomes or Variable Levels), e.g., the number of responses for each of k = 3 outcomes

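  16. χ² Test for k Proportions: Hypotheses & Statistic 1. Hypotheses H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0 Ha: Not all pi equal their hypothesized values pi,0 2. Test Statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(n_i - E(n_i))^2}{E(n_i)}$, where $n_i$ is the observed count, $E(n_i) = n\,p_{i,0}$ is the expected count, $p_{i,0}$ is the hypothesized probability, and k is the number of outcomes 3. Degrees of Freedom: k - 1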

  17. χ² Test Basic Idea 1. Compares Observed Count to Expected Count under the Null Hypothesis 2. The Closer the Observed Counts Are to the Expected Counts, the Less Evidence Against H0 • Discrepancy Measured by Squared Difference Relative to Expected Count • Reject H0 for Large Values of χ²

  18. Finding Critical Value Example: What is the critical χ² value if k = 3 and α = .05? • df = k - 1 = 2; from the χ² table, the upper-tail critical value at α = .05 is 5.991, so reject H0 if χ² > 5.991 • If ni = E(ni) for every outcome, χ² = 0: do not reject H0
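For df = 2 the χ² critical value has a closed form, which gives a quick check on the table lookup. This Python sketch assumes df = 2 only (with 2 df, the upper-tail probability is P(X > x) = exp(-x/2)); for other df, a χ² table or a library routine such as SciPy's `scipy.stats.chi2.ppf` would be needed instead.

```python
import math


def chi2_critical_df2(alpha):
    """Upper-tail critical value of the chi-square distribution with 2 df.

    With 2 df, P(X > x) = exp(-x / 2); setting this equal to alpha
    and solving gives x = -2 * ln(alpha).
    """
    return -2 * math.log(alpha)


k, alpha = 3, 0.05
df = k - 1
crit = chi2_critical_df2(alpha)
print(f"df = {df}, critical chi-square = {crit:.3f}")  # df = 2, critical chi-square = 5.991
```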

  19. χ² Test for k Proportions Example • As an MD epidemiologist, you want to test for differences in prevention measures among patients with a history (Hx) of CVD. Of 180 patients, 63 were following strict prevention measures regularly (taking beta-blockers, aspirin, etc.), 45 were taking some prevention measures but not on a regular basis, and 72 were not following any prevention measures. At the .05 level, is there a difference in prevention measures?

  22. χ² Test for k Proportions Solution H0: p1 = p2 = p3 = 1/3 Ha: At least 1 is different α = .05 n1 = 63 n2 = 45 n3 = 72 (expected counts E(ni) = 180 × 1/3 = 60) Critical Value: 5.991 (df = 2) Test Statistic: χ² = (63 - 60)²/60 + (45 - 60)²/60 + (72 - 60)²/60 = 6.3 Decision: Reject H0 at α = .05 Conclusion: There is evidence of a difference in proportions
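The arithmetic on this slide can be reproduced with a small Python sketch of the goodness-of-fit statistic. It uses the df = 2 closed form P(χ² > x) = exp(-x/2) for the p-value, so it is only valid for k = 3; the helper name `chi2_goodness_of_fit` is illustrative.

```python
import math


def chi2_goodness_of_fit(observed, probs):
    """Pearson chi-square statistic comparing observed counts to n * p_i0."""
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [63, 45, 72]  # strict, irregular, no prevention measures
stat = chi2_goodness_of_fit(observed, [1 / 3, 1 / 3, 1 / 3])
df = len(observed) - 1
p_value = math.exp(-stat / 2)  # chi-square upper tail, valid for df = 2 only

print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.5f}")
# chi-square = 6.3000, df = 2, p-value = 0.04285

# The rounded test proportions 0.333 (as in the SAS code) give
# expected counts of 59.94, which moves the statistic slightly.
stat_sas = chi2_goodness_of_fit(observed, [0.333, 0.333, 0.333])
print(f"chi-square with 0.333 = {stat_sas:.4f}")  # 6.3065
```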

  23. χ² Test for k Proportions: SAS Code
      data CVD;
        input method count;
        datalines;
      1 63
      2 45
      3 72
      ;
      run;
      proc freq data=CVD order=data;
        weight count;
        tables method / nocum testp=(0.333 0.333 0.333);
      run;

  24. χ² Test for k Proportions: SAS Output
      The FREQ Procedure
                                     Test
      method   Frequency   Percent   Percent
      ---------------------------------------
      1               63     35.00     33.30
      2               45     25.00     33.30
      3               72     40.00     33.30
      Chi-Square Test for Specified Proportions
      -----------------------------------------
      Chi-Square      6.3065
      DF                   2
      Pr > ChiSq      0.0427
      Sample Size = 180
      (χ² = 6.3065 rather than 6.3 because the test proportions 0.333 sum to 0.999, so each expected count is 180 × 0.333 = 59.94 instead of 60.)

  25. R Program for chisq.test
      > chisq.test(c(63, 45, 72), p = c(1/3, 1/3, 1/3))
              Chi-squared test for given probabilities
      data:  c(63, 45, 72)
      X-squared = 6.3, df = 2, p-value = 0.04285

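  26. Relation to the Z Test for Comparison of Two Proportions? • When k = 2, the χ² test has 1 df and is equivalent to the two-sided Z test of a single proportion against its hypothesized value (e.g., p1 = p2 = 1/2). • Likewise, the two-sample Z test for two proportions is equivalent to the χ² test on the corresponding 2 × 2 table, which also has 1 df. • In both cases χ² = Z², and a χ² variable with 1 df is distributed as the square of a standard normal variable.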
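The χ² = Z² relationship can be checked numerically. This Python sketch (helper names are illustrative) reuses the MA/CA data from the earlier Z-test example: it computes the pooled two-proportion Z and the Pearson χ² for the corresponding 2 × 2 table (state by diseased / not diseased) and confirms that they agree up to floating-point error.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Two-sample Z statistic with the pooled proportion under H0: p1 = p2."""
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def pearson_chi2(table):
    """Pearson chi-square for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


z = pooled_two_proportion_z(74, 1500, 129, 1500)
# rows: MA, CA; columns: diseased, not diseased
chi2 = pearson_chi2([[74, 1500 - 74], [129, 1500 - 129]])
print(f"Z^2 = {z ** 2:.4f}, chi-square = {chi2:.4f}")
```

Because χ² squares the difference, it matches the two-sided form of the Z test; the one-sided alternative used in the MA/CA example has no direct χ² counterpart.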

  27. χ² Test of Homogeneity 1. Shows If a Relationship Exists Between 2 Qualitative Variables, but Does Not Show Causality (No Specification of Dependency) 2. Assumptions: Multinomial Experiment; All Expected Counts ≥ 5 3. Uses Two-Way Contingency Table

  28. χ² Test of Homogeneity: Contingency Table 1. Shows # Observations From 1 Sample Jointly Classified by 2 Qualitative Variables (rows: levels of variable 1; columns: levels of variable 2)

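  29. χ² Test of Homogeneity: Hypotheses & Statistic 1. Hypotheses H0: Homogeneity (Equal Proportions) Ha: Heterogeneity (Unequal Proportions) 2. Test Statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - E(n_{ij}))^2}{E(n_{ij})}$, where $n_{ij}$ is the observed count, $E(n_{ij})$ is the expected count, r is the number of rows, and c is the number of columns. Degrees of Freedom: (r - 1)(c - 1)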

  30. χ² Test of Homogeneity: Expected Counts 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities 2. Compute Marginal Probabilities & Multiply for Joint Probability 3. Expected Count Is Sample Size Times Joint Probability: $E(n_{ij}) = n \cdot \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot j}}{n} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$

  33. Expected Count Example • Marginal probability = 112/160 • Marginal probability = 78/160 • Joint probability = (112/160)(78/160) • Expected count = 160 · (112/160)(78/160) = 54.6

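  34. Expected Count Calculation (n = 160; margins 112 and 48 for one variable, 78 and 82 for the other) • 112×78/160 = 54.6 • 112×82/160 = 57.4 • 48×78/160 = 23.4 • 48×82/160 = 24.6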
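The four expected counts follow from E(nij) = (row total × column total) / n. A minimal Python sketch, assuming the margins read off the slide (112 and 48 for one variable, 78 and 82 for the other, n = 160); `expected_counts` is an illustrative name.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts([112, 48], [78, 82])
print(expected)  # [[54.6, 57.4], [23.4, 24.6]]
```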

  35. χ² Test of Homogeneity Example on HIV • You randomly sample 286 sexually active individuals and collect information on their HIV status and history of STDs. At the .05 level, is there evidence of a relationship?

  37. χ² Test of Homogeneity Solution: E(nij) ≥ 5 in All Cells • 116×132/286 = 53.5 • 116×154/286 = 62.5 • 170×132/286 = 78.5 • 170×154/286 = 91.5

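  39. χ² Test of Homogeneity Solution H0: No Relationship Ha: Relationship α = .05 df = (2 - 1)(2 - 1) = 1 Critical Value: 3.841 Test Statistic: χ² = 54.29 (with expected counts rounded to one decimal; unrounded, χ² = 54.15, as in the SAS output) Decision: Reject H0 at α = .05 Conclusion: There is evidence of a relationship between history of STDs and HIV status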

  40. χ² Test of Homogeneity: SAS Code
      data dis;
        input STDs HIV count;
        cards;
      1 1 84
      1 2 32
      2 1 48
      2 2 122
      ;
      run;
      proc freq data=dis order=data;
        weight count;
        tables STDs*HIV / chisq;
      run;

  41. χ² Test of Homogeneity: SAS Output
      Statistics for Table of STDs by HIV
      Statistic                      DF      Value      Prob
      -------------------------------------------------------
      Chi-Square                      1    54.1502    <.0001
      Likelihood Ratio Chi-Square     1    55.7826    <.0001
      Continuity Adj. Chi-Square      1    52.3871    <.0001
      Mantel-Haenszel Chi-Square      1    53.9609    <.0001
      Phi Coefficient                       0.4351
      Contingency Coefficient               0.3990
      Cramer's V                            0.4351
      Continuity correction: $\chi^2_{\text{adj}} = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$
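The SAS statistics for this table can be reproduced from the four cell counts in the SAS code. This Python sketch is an illustrative reimplementation, not SAS's own code, and assumes the usual definitions: Pearson χ²; likelihood ratio G² = 2 Σ O ln(O/E); continuity-adjusted χ² with max(0, |O - E| - 0.5) so the correction never overshoots; Mantel-Haenszel = (n - 1)/n × Pearson for a 2 × 2 table; and phi = (ad - bc)/√(r1 r2 c1 c2). The 1-df p-value uses P(χ² > x) = erfc(√(x/2)).

```python
import math


def two_by_two_statistics(a, b, c, d):
    """Chi-square family statistics for the 2 x 2 table [[a, b], [c, d]]."""
    table = [[a, b], [c, d]]
    rows, cols = [a + b, c + d], [a + c, b + d]
    n = a + b + c + d
    # (observed, expected) pairs, expected = row total * column total / n
    cells = [(table[i][j], rows[i] * cols[j] / n) for i in range(2) for j in range(2)]

    pearson = sum((o - e) ** 2 / e for o, e in cells)
    lr = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    yates = sum(max(0.0, abs(o - e) - 0.5) ** 2 / e for o, e in cells)
    mantel_haenszel = (n - 1) / n * pearson
    phi = (a * d - b * c) / math.sqrt(rows[0] * rows[1] * cols[0] * cols[1])
    p_pearson = math.erfc(math.sqrt(pearson / 2))  # upper tail of chi-square, 1 df
    return {
        "Chi-Square": pearson,
        "Likelihood Ratio Chi-Square": lr,
        "Continuity Adj. Chi-Square": yates,
        "Mantel-Haenszel Chi-Square": mantel_haenszel,
        "Phi Coefficient": phi,
        "p-value (Pearson)": p_pearson,
    }


# STDs x HIV counts from the SAS code: (1,1)=84, (1,2)=32, (2,1)=48, (2,2)=122
stats = two_by_two_statistics(84, 32, 48, 122)
for name, value in stats.items():
    print(f"{name:30s} {value:.4f}")
```

The printed statistics agree with the SAS output above up to rounding; the p-value prints as 0.0000, consistent with "<.0001".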

  42. Fisher’s Exact Test • Fisher’s Exact Test is a test for independence in a 2 × 2 table. It is most useful when the total sample size and the expected counts are small. • The test holds the marginal totals fixed and computes, from the hypergeometric distribution, the probability that n11 is at least as large as the observed value (the right-sided p-value; the left-sided and two-sided p-values are defined analogously).

  43. Fisher’s Exact Test Example: HIV Infection • Is HIV infection related to a history (Hx) of STDs in Sub-Saharan African countries? Test at the 5% level.

  44. Fisher’s Exact Test: SAS Code
      data dis;
        input STDs $ HIV $ count;
        cards;
      no no 10
      no yes 5
      yes no 7
      yes yes 3
      ;
      run;
      proc freq data=dis order=data;
        weight count;
        tables STDs*HIV / chisq;
      run;

  45. Fisher’s Exact Test: SAS Output
      Statistics for Table of STDs by HIV
      Statistic                      DF      Value      Prob
      -------------------------------------------------------
      Chi-Square                      1     0.0306    0.8611
      Likelihood Ratio Chi-Square     1     0.0308    0.8608
      Continuity Adj. Chi-Square      1     0.0000    1.0000
      Mantel-Haenszel Chi-Square      1     0.0294    0.8638
      Phi Coefficient                      -0.0350
      Contingency Coefficient               0.0350
      Cramer's V                           -0.0350
      WARNING: 50% of the cells have expected counts less
               than 5. Chi-Square may not be a valid test.
      Fisher's Exact Test
      ----------------------------------
      Cell (1,1) Frequency (F)        10
      Left-sided Pr <= F          0.6069
      Right-sided Pr >= F         0.7263
      Table Probability (P)       0.3332
      Two-sided Pr <= P           1.0000

  46. Fisher’s Exact Test • The output consists of three p-values and one table probability. • Left-sided: use this when the alternative to independence is a negative association between the variables, i.e., the observations tend to lie in the lower-left and upper-right cells. • Right-sided: use this when the alternative is a positive association, i.e., the observations tend to lie in the upper-left and lower-right cells. • Two-sided: use this when there is no prior directional alternative. • Table probability: the probability of observing exactly this table (given the fixed margins) under H0; it is neither a tail probability nor a p-value.
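The three p-values and the table probability can be reproduced by enumerating every table with the observed margins. A minimal Python sketch using exact integer arithmetic with `math.comb` (Python 3.8+); the two-sided p-value follows the rule printed in the SAS output, summing the probabilities of all tables no more likely than the observed one. The function name is illustrative.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2 x 2 table [[a, b], [c, d]] (margins fixed)."""
    r1, c1 = a + b, a + c                       # first row and first column totals
    n = a + b + c + d
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)   # feasible values of n11
    # Hypergeometric numerator for n11 = k: C(c1, k) * C(n - c1, r1 - k)
    weight = {k: comb(c1, k) * comb(n - c1, r1 - k) for k in range(lo, hi + 1)}
    total = comb(n, r1)                         # equals sum(weight.values())
    left = sum(w for k, w in weight.items() if k <= a) / total
    right = sum(w for k, w in weight.items() if k >= a) / total
    table_prob = weight[a] / total
    two_sided = sum(w for w in weight.values() if w <= weight[a]) / total
    return left, right, table_prob, two_sided


# STDs x HIV table from the SAS code: no/no 10, no/yes 5, yes/no 7, yes/yes 3
left, right, table_prob, two_sided = fisher_exact_2x2(10, 5, 7, 3)
print(f"Left-sided  Pr <= F  {left:.4f}")
print(f"Right-sided Pr >= F  {right:.4f}")
print(f"Table probability    {table_prob:.4f}")
print(f"Two-sided   Pr <= P  {two_sided:.4f}")
```

With these counts it prints 0.6069, 0.7263, 0.3332 and 1.0000, matching the SAS output. Comparing the integer weights directly avoids floating-point ties when deciding which tables are "no more likely" than the observed one.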

  47. McNemar’s Test for Correlated (Dependent) Proportions: Basis / Rationale for the Test • The approximate test presented earlier for assessing a difference in proportions assumes that the two samples are independent. • Suppose, however, that this is not true. Suppose we randomly select 100 people and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all sampled people, and on a post-test we find that 20% have flu.

  48. McNemar’s Test for Correlated (Dependent) Proportions • We might be tempted to conclude that no hypothesis test is required under these conditions, since the ‘Before’ and ‘After’ proportions are identical and would surely result in a test statistic value of 0.00. • The problem with this thinking, however, is that the two sample proportions are dependent, because each person was assessed twice. It is possible that the 20 people who had flu originally still had flu. It is also possible that the 20 people who had flu on the second test were a completely different set of 20 people!

  49. McNemar’s Test for Correlated (Dependent) Proportions • It is for precisely this type of situation that McNemar’s Test for Correlated (Dependent) Proportions is applicable. • McNemar’s Test employs two unique features for testing the two proportions: * a special fourfold contingency table; with a * special-purpose chi-square (χ²) test statistic (the approximate test).
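McNemar’s approximate test is usually written in terms of the two discordant cells of the fourfold table: with b = number of subjects positive before and negative after, and c = number negative before and positive after, χ² = (b - c)² / (b + c) on 1 df, and the concordant cells do not enter. The slides’ own formula is not shown in this section, so the Python sketch below assumes this standard form; the optional continuity correction uses (|b - c| - 1)². The counts in the call are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def mcnemar(b, c, continuity=False):
    """McNemar chi-square (1 df) from the two discordant cell counts b and c.

    b: positive before, negative after; c: negative before, positive after.
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(0, diff - 1)  # continuity-corrected version
    stat = diff ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df
    return stat, p_value


# Hypothetical discordant counts, for illustration only
stat, p_value = mcnemar(25, 10)
stat_cc, p_value_cc = mcnemar(25, 10, continuity=True)
print(f"chi-square = {stat:.4f} (p = {p_value:.4f})")
print(f"continuity-corrected = {stat_cc:.4f} (p = {p_value_cc:.4f})")
```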

  50. McNemar’s Test for Correlated (Dependent) Proportions: Sample Problem A randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company that specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session and retake the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.
