Stats 330: Lecture 28

Presentation Transcript


  1. Stats 330: Lecture 28 Contingency tables having 3 and 4 dimensions

  2. Plan of the day In today’s lecture we apply Poisson regression to the analysis of contingency tables having 3 or 4 dimensions. Topics • Types of independence • Connection between independence and interactions for 3 and 4 dimensional tables • Hierarchical models • Graphical models • Examples Reference: Coursebook, sections 5.3, 5.3.1

  3. Example: the Florida murder data • 326 convicted murderers in Florida, 1976-1977 • Classified by • Death penalty (n/y) • Victim’s race (black/white) • Defendant’s race (black/white)

  4. Contingency table

  5. Getting the data into R
     murder.df<-data.frame(expand.grid(defendant=c("b","w"), dp=c("y","n"), victim=c("b","w")), counts=c(13,1,195,19,23,39,105,265))
     Note the use of the function expand.grid:
     > expand.grid(defendant=c("b","w"), dp=c("y","n"), victim=c("b","w"))
       defendant dp victim
     1         b  y      b
     2         w  y      b
     3         b  n      b
     4         w  n      b
     5         b  y      w
     6         w  y      w
     7         b  n      w
     8         w  n      w

  6. The data frame
       defendant dp victim counts
     1         b  y      b     13
     2         w  y      b      1
     3         b  n      b    195
     4         w  n      b     19
     5         b  y      w     23
     6         w  y      w     39
     7         b  n      w    105
     8         w  n      w    265
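The contingency table on slide 4 did not survive in the transcript. As a rough sketch, the counts from slide 5 can be cross-tabulated in R as follows. The object name `murder.tab` and the choice of row and column layout are assumptions made for illustration, not taken from the lecture.

```r
# Rebuild the data frame from slide 5 (counts copied from the slide)
murder.df <- data.frame(
  expand.grid(defendant = c("b", "w"), dp = c("y", "n"), victim = c("b", "w")),
  counts = c(13, 1, 195, 19, 23, 39, 105, 265)
)

# Cross-tabulate the counts into a 2 x 2 x 2 array (murder.tab is a hypothetical name)
murder.tab <- xtabs(counts ~ defendant + dp + victim, data = murder.df)

# Flat display: one row per (victim, defendant) pair, one column per death penalty level
print(ftable(murder.tab, row.vars = c("victim", "defendant"), col.vars = "dp"))

# Marginal table of defendant's race by death penalty, summing over victim's race
print(margin.table(murder.tab, margin = c(1, 2)))
```

The marginal table ignores the victim's race, which is why it can tell a different story from the two conditional tables shown by `ftable`.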

  7. Questions • Is death penalty independent of race? • What is the role of victim’s race? • What does “independent” mean when we have 3 factors?

  8. Types of independence • Suppose we have 3 criteria (factors) A, B and C • In a multinomial sampling context, let pijk = Pr(A=i, B=j, C=k) • Various forms of independence may be of interest. These can be expressed in terms of the probabilities pijk

  9. Marginal probabilities

  10. Marginal probabilities (ii)
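The formulas on slides 9 and 10 are missing from the transcript. Assuming the usual convention that a "+" subscript means summation over that index (an assumption, since the slides' own notation is not visible), the marginal probabilities would be written as:

```latex
\begin{align*}
p_{ij+} &= \sum_k p_{ijk}, & p_{i+k} &= \sum_j p_{ijk}, & p_{+jk} &= \sum_i p_{ijk}, \\
p_{i++} &= \sum_j \sum_k p_{ijk}, & p_{+j+} &= \sum_i \sum_k p_{ijk}, & p_{++k} &= \sum_i \sum_j p_{ijk}.
\end{align*}
```

For example, $p_{ij+} = \Pr(A=i, B=j)$ and $p_{i++} = \Pr(A=i)$.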

  11. All three factors independent • This can be expressed as

  12. One factor independent of the other two • This can be expressed as

  13. Two factors conditionally independent, given a third This can be expressed as
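The expressions on slides 11 to 13 were lost in extraction. Under the "+" summation convention for marginal probabilities (assumed here, as the slides' notation is not visible), the three standard forms of independence for all $i, j, k$ are likely the following:

```latex
\begin{align*}
\text{A, B, C mutually independent:} \quad & p_{ijk} = p_{i++}\, p_{+j+}\, p_{++k} \\
\text{A independent of (B, C):} \quad & p_{ijk} = p_{i++}\, p_{+jk} \\
\text{A and B conditionally independent given C:} \quad & p_{ijk} = \frac{p_{i+k}\, p_{+jk}}{p_{++k}}
\end{align*}
```

The last line is the statement $\Pr(A=i, B=j \mid C=k) = \Pr(A=i \mid C=k)\Pr(B=j \mid C=k)$ multiplied through by $p_{++k}$.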

  14. Parameterising tables of Poisson means with main effects and interactions • Recall (see Slides 22 and 23 of lecture 19) that in ordinary 3-way ANOVA we split up the table of means into main effects and interactions: $\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk}$ • We can do exactly the same thing with the logs of the Poisson means in a 3-way table: $\log(\mu_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk}$

  15. Parameterising tables of probabilities with main effects and interactions • Corresponding multinomial probabilities are given by the model $\log(p_{ijk} / p_{111}) = \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk}$
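The step from Poisson means to multinomial probabilities is not spelled out on the slide. A short derivation, assuming corner-point (treatment) constraints in which every parameter with an index equal to 1 is zero (R's default for factors, but an assumption about the lecture), runs as follows:

```latex
\begin{align*}
p_{ijk} &= \frac{\mu_{ijk}}{\sum_{r,s,t} \mu_{rst}}
  \quad\Longrightarrow\quad
  \frac{p_{ijk}}{p_{111}} = \frac{\mu_{ijk}}{\mu_{111}}, \\
\log \mu_{111} &= \mu
  \quad \text{(all other terms vanish at the baseline cell)}, \\
\log\frac{p_{ijk}}{p_{111}} &= \log \mu_{ijk} - \mu
  = \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk}.
\end{align*}
```

The intercept $\mu$ drops out, which is why it carries no information about the pattern of association.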

  16. Parameterising tables with main effects and interactions (cont) • The interactions are related to the different forms of independence: • If all the interactions are zero, then the 3 factors are mutually independent • If the ABC and AB interactions are zero, then A and B are independent, given C • If the ABC, AB and AC interactions are zero, then A is independent of B and C • Using the connection between multinomial and Poisson sampling, we can test for the various types of independence by fitting a Poisson regression with A, B and C as explanatory variables, and testing for interactions.

  17. Summary of independence models • All 3 factors independent in multinomial model • Equivalent to all interactions zero in Poisson model • Poisson Model is count ~ A + B + C • A independent of B and C • Equivalent to all interactions between A and the others zero • Poisson Model is count ~ A + B + C + B:C or count ~ A + B*C

  18. Summary of independence models (cont) • Factors A and B conditionally independent given C • Equivalent to all interactions containing both A and B zero • Poisson Model is count ~ A + B + C + A:C + B:C • Equivalent to count ~ A*C + B*C
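As an illustration of the three model formulas summarised above, the sketch below fits each one to the Florida data from slide 5. The labelling A = dp, B = defendant, C = victim and the object names are choices made here for illustration only.

```r
# Florida data as entered on slide 5
murder.df <- data.frame(
  expand.grid(defendant = c("b", "w"), dp = c("y", "n"), victim = c("b", "w")),
  counts = c(13, 1, 195, 19, 23, 39, 105, 265)
)

# Mutual independence: count ~ A + B + C
mutual.glm <- glm(counts ~ dp + defendant + victim,
                  family = poisson, data = murder.df)

# A independent of (B, C): count ~ A + B*C
joint.glm <- glm(counts ~ dp + defendant * victim,
                 family = poisson, data = murder.df)

# A and B conditionally independent given C: count ~ A*C + B*C
cond.glm <- glm(counts ~ dp * victim + defendant * victim,
                family = poisson, data = murder.df)

# Residual deviance and degrees of freedom for each model
fits <- list(mutual = mutual.glm, joint = joint.glm, conditional = cond.glm)
print(data.frame(deviance = sapply(fits, deviance),
                 df = sapply(fits, df.residual)))
```

A large residual deviance relative to its degrees of freedom suggests that the corresponding form of independence does not hold.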

  19. Analysis strategy: Florida data • We will fit some models to this data and investigate the pattern of independence/ dependence between the factors. • We fit a maximal model, and then investigate suitable submodels.

  20. The analysis
      > maximal.glm<-glm(counts~victim*defendant*dp, family=poisson, data=murder.df)
      > anova(maximal.glm, test="Chisq")
                          Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
      NULL                                    7     774.73
      victim               1    64.10         6     710.63  1.183e-15
      defendant            1     0.22         5     710.42       0.64
      dp                   1   443.51         4     266.90  1.861e-98
      victim:defendant     1   254.15         3      12.75  3.230e-57
      victim:dp            1    10.83         2       1.92  9.999e-04
      defendant:dp         1     1.90         1       0.02       0.17
      victim:defendant:dp  1     0.02         0 -2.442e-15       0.89
      • No interaction between defendant’s race and death penalty
      • It seems that the model victim*dp + victim*defendant is appropriate: “given the victim’s race, defendant’s race and death penalty are independent”, i.e. the conditional independence model
      • This model is also selected by stepwise and by other anovas

  21. Testing if the model is adequate
      > submodel.glm<-glm(counts~victim*defendant + victim*dp, family=poisson, data=murder.df)
      > summary(submodel.glm)
      Deviance Residuals:
            1        2        3        4        5        6        7        8
       0.0636  -0.2127  -0.0163   0.0525   1.0389  -0.7138  -0.4453   0.2860
      Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
      (Intercept)          2.5472     0.2680   9.503  < 2e-16 ***
      victimw              0.3635     0.3057   1.189  0.23449
      defendantw          -2.3418     0.2341 -10.003  < 2e-16 ***
      dpn                  2.7269     0.2759   9.885  < 2e-16 ***
      victimw:defendantw   3.2068     0.2567  12.491  < 2e-16 ***
      victimw:dpn         -0.9406     0.3081  -3.053  0.00227 **
      ---

  22. Testing if the model is adequate
      Null deviance: 774.7325 on 7 degrees of freedom
      Residual deviance: 1.9216 on 2 degrees of freedom
      AIC: 56.638
      Number of Fisher Scoring iterations: 4
      > 1-pchisq(1.9216,2)
      [1] 0.3825867
      • Model seems OK
      • Some hint cell 8 not well fitted
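The adequacy check above can also be written without typing the deviance by hand. This is a minimal sketch using the Florida data from slide 5; the name `p.value` is introduced here for illustration.

```r
# Florida data as entered on slide 5
murder.df <- data.frame(
  expand.grid(defendant = c("b", "w"), dp = c("y", "n"), victim = c("b", "w")),
  counts = c(13, 1, 195, 19, 23, 39, 105, 265)
)

# Conditional independence model and the maximal (saturated) model
submodel.glm <- glm(counts ~ victim * defendant + victim * dp,
                    family = poisson, data = murder.df)
maximal.glm <- glm(counts ~ victim * defendant * dp,
                   family = poisson, data = murder.df)

# Goodness of fit: compare the residual deviance with a chi-squared distribution
p.value <- pchisq(deviance(submodel.glm), df.residual(submodel.glm),
                  lower.tail = FALSE)
print(p.value)

# Equivalent comparison of the submodel against the maximal model
print(anova(submodel.glm, maximal.glm, test = "Chisq"))

# Deviance residuals, to look for individual cells that are poorly fitted
print(round(residuals(submodel.glm, type = "deviance"), 4))
```

Using `lower.tail = FALSE` avoids the loss of precision that `1 - pchisq(...)` can suffer when the p-value is very small.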

  23. Example: the Copenhagen housing data • 317 apartment residents in Copenhagen were surveyed on their housing. 3 variables were measured: • sat: satisfaction with housing (Low, Medium, High) • cont: amount of contact with other residents (Low, High) • infl: influence on management decisions (Low, Medium, High)

  24. Copenhagen housing data
      sat     infl    cont  count
      Low     Low     Low      61
      Medium  Low     Low      23
      High    Low     Low      17
      Low     Medium  Low      43
      Medium  Medium  Low      35
      High    Medium  Low      40
      Low     High    Low      26
      Medium  High    Low      18
      High    High    Low      54
      9 more lines…

  25. Copenhagen housing data infl = Low infl = Med infl = High

  26. The analysis
      > housing.glm<-glm(count~sat*infl*cont, family=poisson, data=housing.df)
      > anova(housing.glm, test="Chisq")
                    Df Deviance Resid. Df Resid. Dev P(>|Chi|)
      NULL                             17    166.757
      sat            2   26.191        15    140.566 2.054e-06
      infl           2   20.040        13    120.526 4.451e-05
      cont           1   22.544        12     97.983 2.054e-06
      sat:infl       4   75.577         8     22.406 1.504e-15
      sat:cont       2    7.745         6     14.661     0.021
      infl:cont      2   11.986         4      2.675     0.002
      sat:infl:cont  4    2.675         0  9.546e-15     0.614
      • This time, only the 3-factor interaction is not significant
      • This is the “homogeneous association model”

  27. Homogeneous association model • Association between two factors measured by sets of odds ratios

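  28. Copenhagen housing ORs [table of conditional odds ratios for infl = Low, infl = Med and infl = High not preserved; surviving entries, apparently for infl = High: 26*25/(18*15) and 26*62/(54*15)]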
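The two surviving expressions on this slide are sample odds ratios from 2 x 2 sub-tables. The sketch below evaluates them in R. The helper `odds_ratio` is hypothetical, and the reading of the counts as sat by cont cells at infl = High is an assumption based on the counts listed on slide 24.

```r
# Hypothetical helper: odds ratio of a 2 x 2 table with cells
#   n11 n12
#   n21 n22
odds_ratio <- function(n11, n12, n21, n22) {
  (n11 * n22) / (n12 * n21)
}

# 26*25/(18*15): assumed to compare sat Low vs Medium across cont Low vs High
or.medium <- odds_ratio(n11 = 26, n12 = 18, n21 = 15, n22 = 25)

# 26*62/(54*15): assumed to compare sat Low vs High across cont Low vs High
or.high <- odds_ratio(n11 = 26, n12 = 54, n21 = 15, n22 = 62)

print(c(medium = or.medium, high = or.high))
print(log(c(medium = or.medium, high = or.high)))  # log odds ratios
```

These are raw odds ratios for a single level of infl, so they need not match the common values estimated from a fitted model.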

  29. Homogeneous association model (2) • For the homogeneous association model, the conditional odds ratios for A and B (i.e. using the conditional distributions of A and B given C=k) do not depend on k. That is, the pattern of association between A and B is the same for all levels of C. • The common values of the conditional AB log ORs are estimated by the AB interactions
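To see why the AB interactions estimate the common log odds ratios, here is a short derivation. It assumes corner-point (treatment) constraints with level 1 as the baseline, which is R's default but is not stated on the slide, and the symbol $\theta_{ij(k)}$ is notation introduced here.

```latex
\begin{align*}
\theta_{ij(k)} &= \frac{\mu_{11k}\,\mu_{ijk}}{\mu_{i1k}\,\mu_{1jk}}
  \qquad \text{(conditional AB odds ratio at } C = k\text{)}, \\
\log \theta_{ij(k)} &= \log\mu_{11k} + \log\mu_{ijk} - \log\mu_{i1k} - \log\mu_{1jk} \\
  &= (\mu + \gamma_k)
   + \bigl(\mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik}\bigr) \\
  &\quad - \bigl(\mu + \alpha_i + \gamma_k + (\alpha\gamma)_{ik}\bigr)
   - \bigl(\mu + \beta_j + \gamma_k + (\beta\gamma)_{jk}\bigr) \\
  &= (\alpha\beta)_{ij}.
\end{align*}
```

The result does not involve $k$, which is the homogeneity property. If the three-factor term $(\alpha\beta\gamma)_{ijk}$ were present it would survive the cancellation and the odds ratio would vary with $k$.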

  30. Estimating conditional OR for housing data • The sat-cont conditional log ORs are estimated by the sat:cont interactions in the homogeneous association model
      > homogen.glm<-glm(count~sat*infl*cont-sat:infl:cont, family=poisson, data=housing.df)
      > summary(homogen.glm)
      Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
      satMedium:contHigh   0.4157     0.1948   2.134 0.032818 *
      satHigh:contHigh     0.6496     0.1823   3.563 0.000367 ***

  31. Estimating conditional OR for housing data (2)
      > 0.4157+c(-1,1)*1.96*0.1948
      [1] 0.033892 0.797508
      > exp(0.4157+c(-1,1)*1.96*0.1948)
      [1] 1.034473 2.220002
      > exp(0.4157)
      [1] 1.515431
      > exp(0.6496)
      [1] 1.914775
      • For med/High, the estimated log OR is 0.4157 and its standard error is 0.1948
      • 95% CI for the OR is exp(0.4157 ± 1.96 × 0.1948), i.e. (1.034, 2.220)
      • Estimated OR for high/high is exp(0.6496) = 1.915
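The interval calculation above can be wrapped in a small function so that it is applied the same way to each interaction. `wald_or_ci` is a hypothetical helper written for this sketch; it uses `qnorm` for the normal quantile rather than the rounded value 1.96, so results may differ slightly in the later decimal places.

```r
# Hypothetical helper: Wald confidence interval for an odds ratio,
# given a log odds ratio estimate and its standard error
wald_or_ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 when level = 0.95
  c(OR    = exp(estimate),
    lower = exp(estimate - z * se),
    upper = exp(estimate + z * se))
}

# satMedium:contHigh, estimate and standard error from the summary output
ci.medium <- wald_or_ci(0.4157, 0.1948)
print(round(ci.medium, 3))

# satHigh:contHigh, estimate and standard error from the summary output
ci.high <- wald_or_ci(0.6496, 0.1823)
print(round(ci.high, 3))
```

The interval is built on the log scale and then exponentiated, so it is not symmetric about the estimated odds ratio.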

  32. 4 dimensional tables • Similar results apply for 4-dimensional tables • For example, models for 4 factors A, B, C and D • A, B, C and D all independent: A + B + C +D • A, B independent of C and D: A*B + C*D • D conditionally independent of C, given A and B: A*B*C + A*B*D
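One way to check which interactions a model formula contains is to expand it with `terms()`. This is only a sketch; `expand_terms` is a helper name invented here, and the factors A to D are placeholders that need no data.

```r
# Hypothetical helper: list the main effects and interactions in a formula
expand_terms <- function(formula) {
  attr(terms(formula), "term.labels")
}

# D conditionally independent of C, given A and B
cond.terms <- expand_terms(~ A * B * C + A * B * D)
print(cond.terms)

# No term should contain both C and D
has.CD <- grepl("C", cond.terms) & grepl("D", cond.terms)
print(any(has.CD))

# (A, B) independent of (C, D): no term should mix the two groups
print(expand_terms(~ A * B + C * D))
```

Reading the missing interactions off the expanded list is the same logic as on the slide: the independence statement corresponds to the terms that are absent.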

  33. Hierarchical models • We will assume that all models are hierarchical: if the model includes an interaction with factors A1, … Ak, then it includes all main effects and interactions that can be formed from A1, … Ak • We can represent hierarchical models by listing these “maximal interactions”

  34. Examples • 2 factor model A + B + A:B is hierarchical • Hierarchical notation: [AB] • 3 factor model A + B + C + A:B is hierarchical • Hierarchical notation: [AB][C] • 3 factor model A + B + C + A:B + A:C is hierarchical • Hierarchical notation: [AB][AC]
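The definition of a hierarchical model can be checked mechanically. The function below is a sketch written for this purpose, not something from the coursebook: `is_hierarchical` is a hypothetical name, and it simply tests that every lower-order term implied by each interaction is present in the formula.

```r
# Hypothetical helper: TRUE if every interaction in the formula is accompanied
# by all main effects and lower-order interactions of the same factors
is_hierarchical <- function(formula) {
  labels <- attr(terms(formula), "term.labels")
  # Canonical key for a term: its factor names, sorted and joined by ":"
  key <- function(v) paste(sort(v), collapse = ":")
  term.sets <- strsplit(labels, ":", fixed = TRUE)
  present <- vapply(term.sets, key, character(1))
  for (s in term.sets) {
    if (length(s) < 2) next                     # main effects imply nothing
    for (m in seq_len(length(s) - 1)) {         # all proper subsets of size m
      needed <- vapply(combn(s, m, simplify = FALSE), key, character(1))
      if (!all(needed %in% present)) return(FALSE)
    }
  }
  TRUE
}

print(is_hierarchical(~ A + B + A:B))              # [AB]
print(is_hierarchical(~ A + B + C + A:B + A:C))    # [AB][AC]
print(is_hierarchical(~ A + B + C + A:B:C))        # lacks A:B, A:C and B:C
```

The first two formulas are the slide's own examples; the third is an extra, non-hierarchical formula added here for contrast.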

  35. Graphical Models A way of visualising independence patterns: • A subset of hierarchical models • Each factor represented by the vertex of an “association” graph • Two vertices connected by edges if they have a non-zero interaction • Then • A vertex not connected to any other vertex is independent of the other vertices • two vertices not directly connected are conditionally independent, given the connecting vertices

  36. Examples • 2 factor model A + B + A:B or [AB] • 3 factor model A + B + C + A:B or [AB][C]: C independent of A and B • 3 factor model A + B + C + A:B + A:C + B:C or [AB][AC][BC]

  37. More Examples • 3 factor model A + B + C + A:B + A:C or [AB][AC] • 4 factor model A + B + C + D + A:B + A:C + B:C or [AB][AC][BC][D]

  38. Another Example • 4 factor model A + B + C + D + A:B + B:C + A:D or [AB][BC][AD] • C and D are conditionally independent given A and B
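The association graphs drawn on these slides are not preserved in the transcript. As a sketch, the edges can be recovered from a model formula by joining every pair of factors that appear together in some term. `association_graph` is a hypothetical helper, and the adjacency-matrix representation is a choice made here.

```r
# Hypothetical helper: adjacency matrix of the association graph of a formula.
# Two factors are joined if they appear together in at least one model term.
association_graph <- function(formula) {
  fac <- attr(terms(formula), "factors")   # rows = factors, columns = terms
  vars <- rownames(fac)
  adj <- matrix(FALSE, length(vars), length(vars), dimnames = list(vars, vars))
  for (j in seq_len(ncol(fac))) {
    members <- vars[fac[, j] > 0]          # factors involved in term j
    if (length(members) >= 2) adj[members, members] <- TRUE
  }
  diag(adj) <- FALSE                       # no self-loops
  adj
}

# The model [AB][BC][AD]: edges A-B, B-C and A-D only
g <- association_graph(~ A + B + C + D + A:B + B:C + A:D)
print(g)
print(g["C", "D"])   # FALSE: C and D are not directly connected
```

The absence of a C-D edge is what the slide reads as conditional independence of C and D given the vertices that connect them.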

  39. Another Example • 4 factor model A + B + C + D + A:B:C + A:B:D or [ABC][ABD] • C and D are conditionally independent given A and B
