
Lecture 21: Poisson regression and log-linear regression


Presentation Transcript


  1. Lecture 21: Poisson regression and log-linear regression. BMTRY 701 Biostatistical Methods II

  2. Poisson distribution • Used for count data • generally, rare events • in space or time • upper limit is theoretically infinite • Examples: • earthquakes, hurricanes • cancer incidence (spatial) • absences in school year • AIDS deaths in a region • Assessing disease in different groups: • Probability, Risk, Rate, Incidence, Prevalence

  3. The Poisson distribution • Probability mass function: $P(Y = y) = \lambda^{y} e^{-\lambda} / y!$ for $y = 0, 1, 2, \ldots$ • Approximates a binomial for rare events • Notice it has only ONE parameter: λ • Mean = variance = λ
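The two properties on this slide (the binomial approximation and mean = variance) can be checked numerically. The sketch below is only an illustration: the values n = 1000 and p = 0.003 are hypothetical, and `dbinom`, `dpois` and `rpois` are base R functions.

```r
# Hypothetical rare-event setting: many trials, small event probability
n <- 1000
p <- 0.003
lambda <- n * p   # Poisson mean chosen to match the binomial mean

# Compare the binomial and Poisson mass functions at y = 0, ..., 10
y <- 0:10
binom.pmf <- dbinom(y, size = n, prob = p)
pois.pmf  <- dpois(y, lambda = lambda)
max.diff  <- max(abs(binom.pmf - pois.pmf))
print(round(cbind(y, binom.pmf, pois.pmf), 4))
print(max.diff)

# Simulated Poisson counts: sample mean and variance should both be near lambda
set.seed(701)
draws <- rpois(100000, lambda)
print(c(mean = mean(draws), variance = var(draws)))
```

The approximation gets better as p shrinks with n*p held fixed, which is why the Poisson is a natural model for rare events.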

  4. Simple Poisson distribution example • The infection rate at a Neonatal Intensive Care Unit (NICU) is typically expressed as a number of infections per patient-day. This counts events across both time and patients. • Assume that the probability of getting an infection over a short time period is proportional to the length of the time period. In other words, a patient who stays one hour in the NICU has twice the risk of a single infection as a patient who stays 30 minutes. • Assume that for a small enough interval, the probability of getting two infections is negligible. • Assume that the probability of infection does not change over time or over infants. • Assume independence. • The probability of seeing an infection in one child does not increase or decrease the probability of seeing an infection in another child. • If an infant gets an infection during one time interval, it doesn't change the probability that he or she will get another infection during a later time interval.
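A small numerical sketch of the first two assumptions, under a Poisson model with a constant rate. The rate of 5 infections per 1000 patient-days is a hypothetical value chosen for illustration; it does not come from the lecture.

```r
# Hypothetical infection rate: 5 per 1000 patient-days
rate <- 5 / 1000

# Stay lengths of 30 minutes and one hour, expressed in days
t.days <- c(0.5, 1) / 24
mu <- rate * t.days   # expected number of infections during each stay

# Probability of exactly one infection, and of two or more
p.one      <- dpois(1, mu)
p.two.plus <- ppois(1, mu, lower.tail = FALSE)

# Risk of a single infection: one-hour stay relative to 30-minute stay
ratio <- p.one[2] / p.one[1]
print(ratio)
print(p.two.plus)
```

For intervals this short the ratio is essentially 2, and the probability of two or more infections is negligible compared with the probability of one.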

  5. Poisson regression • Based on the idea that the log of the probability (rate) of disease is a linear function of risk factors • The rate ratio (“relative risk”) is modeled • Interpretation of slope: exponentiating a slope gives the rate ratio for a one-unit increase in that risk factor
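The slide's formulas did not survive in the transcript. A minimal version with a single risk factor can be written as follows; the notation ($r_i$ for the rate, $x_i$ for the risk factor, $\beta_0$ and $\beta_1$ for the coefficients) is an assumption made for this sketch.

```latex
\begin{align*}
\log(r_i) &= \beta_0 + \beta_1 x_i \\
\frac{r(x+1)}{r(x)} &= \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}
\end{align*}
```

So a one-unit increase in $x$ multiplies the rate by $e^{\beta_1}$, which is the rate ratio.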

  6. Implementation • $r_i$ is the rate • Often we observe • a number of events • over a geographic region, time period, or number of person-years • Need to account for these differences • rates based on smaller “exposure” are less precise • an adjustment is made

  7. Implementation • Unless there is uniform time, space, etc., the following is generally implemented: “OFFSET”
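The formula on this slide was lost in the transcript. A standard way to write a Poisson model with an offset is sketched below, assuming $Y_i$ is the observed count and $n_i$ is the exposure (person-years, population size, catheter days); these symbols are assumptions made for this sketch.

```latex
\begin{align*}
\log\left(\frac{E[Y_i]}{n_i}\right) &= \beta_0 + \beta_1 x_i \\
\log(E[Y_i]) &= \log(n_i) + \beta_0 + \beta_1 x_i
\end{align*}
```

The term $\log(n_i)$ enters the linear predictor with its coefficient fixed at 1, so the model for the count is equivalent to a model for the rate $E[Y_i]/n_i$.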

  8. Offset term • Notice: NO COEFFICIENT on offset • Adjusts for population size or space • Example: breast cancer incidence per county in South Carolina • cases are the number of women (& men) diagnosed within a county in SC in one year. • the offset would be the population size in the county in that year (probably estimated)
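The role of the offset can be illustrated with simulated data. Everything in this sketch is hypothetical (county populations, the binary covariate `x`, the true coefficients); it is not the South Carolina data. It shows the two equivalent ways of supplying an offset to `glm` and that no coefficient is estimated for it.

```r
set.seed(21)

# Hypothetical counties with very different population sizes
n.county <- 2000
popn <- round(runif(n.county, 1000, 50000))
x <- rbinom(n.county, 1, 0.5)   # hypothetical binary county-level covariate

# True model on the rate scale: log(rate) = -6 + 0.5 * x
true.beta <- c(-6, 0.5)
cases <- rpois(n.county, popn * exp(true.beta[1] + true.beta[2] * x))

# Offset supplied as an argument, and equivalently inside the formula
fit.arg     <- glm(cases ~ x, family = poisson, offset = log(popn))
fit.formula <- glm(cases ~ x + offset(log(popn)), family = poisson)

# Ignoring exposure models raw counts, not rates
fit.none <- glm(cases ~ x, family = poisson)

print(rbind(with.offset = coef(fit.arg), no.offset = coef(fit.none)))
```

With the offset, the intercept is on the log-rate scale (per person); without it, the intercept absorbs the average population size and is no longer interpretable as a rate.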

  9. Caveat • Standard Poisson regression relies on the Poisson assumption about the variance (variance = mean) • If events tend to occur in clusters, then there is “overdispersion” • This leads to a more general form of model: log-linear model (later)
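One common informal check for overdispersion is the Pearson chi-square statistic divided by the residual degrees of freedom, which should be near 1 under the Poisson assumption. The sketch below uses simulated negative binomial counts (hypothetical data), and it uses the `quasipoisson` family as one possible remedy; that choice is an assumption of this example, not necessarily the approach taken later in the lecture.

```r
set.seed(9)

# Hypothetical clustered counts: negative binomial, so variance = mu + mu^2/2 > mu
n  <- 500
x  <- rnorm(n)
mu <- exp(1 + 0.3 * x)
y.over <- rnbinom(n, size = 2, mu = mu)

# Fit an ordinary Poisson regression anyway
fit <- glm(y.over ~ x, family = poisson)

# Pearson dispersion estimate: values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Quasi-Poisson keeps the same coefficients but inflates the standard errors
fit.quasi <- glm(y.over ~ x, family = quasipoisson)
se.ratio <- summary(fit.quasi)$coefficients[, "Std. Error"] /
            summary(fit)$coefficients[, "Std. Error"]
print(se.ratio)
```

The standard errors are inflated by the square root of the estimated dispersion, so tests based on the plain Poisson fit would be too optimistic for data like these.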

  10. Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004). • Objective: To determine whether a multi-faceted systems intervention would eliminate catheter-related bloodstream infections (CR-BSIs) • Design: prospective cohort in surgical ICU at JHU including all patients with a central venous catheter in the ICU. • Two ICUs • Interventions: • educating staff • creating a catheter insertion cart • asking providers daily if catheters could be removed • implementing a checklist to ensure adherence to guidelines • empowering nurses to stop catheter insertion if a violation of guidelines was observed.

  11. Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004). • Analysis • Poisson regression • Outcome is rate of CR-BSIs • Data structure • number of infections per quarter in ICU • number of catheter days (counting every patient who has catheter at 12am each day). Patients each counted only once • indicator of control vs. intervention ICU • Intervention not implemented until 1st quarter 1999.

  12. Dataset

      . list

           +---------------------------------------------------------+
           | quarter   ncase   cathdays    rate   dataset   quartern |
           |---------------------------------------------------------|
        1. | Qtr1-98       6       1057    5.68         1          1 |
        2. | Qtr2-98       4       1018    3.93         1          2 |
        3. | Qtr3-98      10        899   11.12         1          3 |
        4. | Qtr4-98       8        952     8.4         1          4 |
        5. | Qtr1-99       3        952    3.15         1          5 |
           |---------------------------------------------------------|
        6. | Qtr2-99      10        939   10.65         1          6 |
        7. | Qtr3-99       5       1045    4.78         1          7 |
        8. | Qtr4-99       9        927    9.71         1          8 |
        9. | Qtr1-00       7       1060     6.6         1          9 |
       10. | Qtr2-00       7       1094     6.4         1         10 |
           |---------------------------------------------------------|
       11. | Qtr3-00       5        850    5.88         1         11 |
       12. | Qtr4-00      10        822   12.17         1         12 |
       13. | Qtr1-01      11        868   12.67         1         13 |
       14. | Qtr2-01       4        830    4.82         1         14 |
       15. | Qtr3-01       4        603    6.63         1         15 |
           |---------------------------------------------------------|
       16. | Qtr4-01       5        551    9.07         1         16 |
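The `rate` column is the number of infections per 1000 catheter days. A quick check on the 16 rows shown in the listing (values typed in from the slide; the vector names are chosen for this sketch):

```r
# Values copied from the 16 rows listed on the slide
ncase    <- c(6, 4, 10, 8, 3, 10, 5, 9, 7, 7, 5, 10, 11, 4, 4, 5)
cathdays <- c(1057, 1018, 899, 952, 952, 939, 1045, 927,
              1060, 1094, 850, 822, 868, 830, 603, 551)
listed.rate <- c(5.68, 3.93, 11.12, 8.4, 3.15, 10.65, 4.78, 9.71,
                 6.6, 6.4, 5.88, 12.17, 12.67, 4.82, 6.63, 9.07)

# Infections per 1000 catheter days
rate <- 1000 * ncase / cathdays

# Largest discrepancy from the listed (rounded) rates
max.abs.diff <- max(abs(rate - listed.rate))
print(round(rate, 2))
print(max.abs.diff)
```

The recomputed rates agree with the listing up to rounding, which confirms that `cathdays` is the exposure that belongs in the offset.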

  13. Observed Data

  14. R code

      data <- read.csv("csicu7.csv")

      # all observed rates, then control ICU (dataset==1) overplotted in red
      plot(data$quartern, data$rate, xlab="Quarter",
           ylab="Rate of Infection per 1000 catheter days", pch=16)
      points(data$quartern[data$dataset==1], data$rate[data$dataset==1],
             pch=16, col=2)

      # connect the points within each ICU
      lines(data$quartern[data$dataset==0], data$rate[data$dataset==0], col=1)
      lines(data$quartern[data$dataset==1], data$rate[data$dataset==1], col=2)

      legend(12,22, c("Intervention ICU","Control ICU"), col=c(1,2),
             pch=c(16,16))

      # intervention began in quarter 5 (1st quarter 1999)
      abline(v=5, lty=3)

  15. Estimating the Poisson regression • Want to model change in rates • However, in the first 4 quarters there was no intervention. • Based on the observed data and on the data structure, what model is appropriate?

  16. Poisson regression model What is the model for • IV=0 and quarter<5? • IV=0 and quarter≥5? • IV=1 and quarter<5? • IV=1 and quarter≥5?
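The model equation on this slide was lost in the transcript. The version below is reconstructed from the R call on the following slide and should be read as an assumption: $q$ is the quarter number, $IV$ is the intervention indicator, $n$ is the number of catheter days, $(q-5)_+ = \max(q-5, 0)$ is the knot term, and the coefficients are numbered $\beta_1, \ldots, \beta_6$ in the order R reports them.

```latex
\begin{align*}
\log(E[Y]) &= \log(n) + \beta_1 + \beta_2\, IV + \beta_3\, q + \beta_4\, (q-5)_+
             + \beta_5\, IV \cdot q + \beta_6\, IV \cdot (q-5)_+ \\[4pt]
IV=0,\ q<5:\quad    \log(r) &= \beta_1 + \beta_3\, q \\
IV=0,\ q\ge 5:\quad \log(r) &= (\beta_1 - 5\beta_4) + (\beta_3 + \beta_4)\, q \\
IV=1,\ q<5:\quad    \log(r) &= (\beta_1 + \beta_2) + (\beta_3 + \beta_5)\, q \\
IV=1,\ q\ge 5:\quad \log(r) &= (\beta_1 + \beta_2 - 5\beta_4 - 5\beta_6)
                              + (\beta_3 + \beta_4 + \beta_5 + \beta_6)\, q
\end{align*}
```

Here $r = E[Y]/n$ is the infection rate per catheter day. At $q = 5$ the early and late lines for each ICU meet, so the fitted curves are continuous with a change of slope at the knot.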

  17. R code

      ncase <- data$ncase
      cathdays <- data$cathdays
      control <- data$dataset
      intervention <- 1 - control
      quartern <- data$quartern

      # create knot for spline model
      k1 <- ifelse(quartern>5, quartern-5, 0)

      # FIT MODEL WITH INTERACTIONS WITH TIME FOR BOTH GROUPS
      reg <- glm(ncase ~ intervention*quartern + intervention*k1,
                 family=poisson, offset=log(cathdays))
      summary(reg)

  18. Results

      Call:
      glm(formula = ncase ~ intervention * quartern + intervention *
          k1, family = poisson, offset = log(cathdays))

      Deviance Residuals:
          Min       1Q   Median       3Q      Max
      -3.6005  -0.8439  -0.2368   0.6349   2.4233

      Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
      (Intercept)           -5.20386    0.37944 -13.715   <2e-16 ***
      intervention           0.73339    0.45986   1.595    0.111
      quartern               0.07517    0.09148   0.822    0.411
      k1                    -0.08774    0.10365  -0.847    0.397
      intervention:quartern -0.02874    0.11302  -0.254    0.799
      intervention:k1       -0.08355    0.13080  -0.639    0.523
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      (Dispersion parameter for poisson family taken to be 1)

          Null deviance: 108.489  on 39  degrees of freedom
      Residual deviance:  61.317  on 34  degrees of freedom
      AIC: 213.76
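The deviance lines in this output can be used for two quick checks, sketched below with the numbers typed in from the slide. The ratio of residual deviance to its degrees of freedom is a rough indicator of overdispersion (values near 1 are consistent with the Poisson variance assumption), and the drop from the null deviance gives a likelihood-ratio test of the five covariate terms together. These checks are an addition for illustration and are not part of the original analysis.

```r
# Deviances and degrees of freedom as reported in the summary output
null.dev  <- 108.489; null.df  <- 39
resid.dev <- 61.317;  resid.df <- 34

# Rough overdispersion indicator: residual deviance per degree of freedom
dev.ratio <- resid.dev / resid.df
print(dev.ratio)

# Deviance goodness-of-fit p-value (chi-square approximation)
p.gof <- pchisq(resid.dev, df = resid.df, lower.tail = FALSE)
print(p.gof)

# Likelihood-ratio test of all five covariate terms against the null model
lr.stat <- null.dev - resid.dev
lr.df   <- null.df - resid.df
p.lr <- pchisq(lr.stat, df = lr.df, lower.tail = FALSE)
print(c(lr.stat = lr.stat, lr.df = lr.df, p.lr = p.lr))
```

A ratio of about 1.8 hints at some overdispersion, which connects back to the caveat on slide 9.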

  19. Fitted model, rate scale

  20. R code

      # coefficient vector from the fitted model
      # (b is not defined on the slide; coef(reg) is assumed here)
      b <- coef(reg)

      # linear predictors (log rate per catheter day), before and after the knot
      fit.early.0 <- b[1] + b[3]*seq(1,5,1)
      fit.late.0 <- (b[1]-b[4]*5) + (b[3]+b[4])*seq(5,20,1)
      fit.early.1 <- (b[1]+b[2]) + (b[3]+b[5])*seq(1,5,1)
      fit.late.1 <- (b[1]+b[2]-b[4]*5-b[6]*5) +
                    (b[3]+b[4]+b[5]+b[6])*seq(5,20,1)
      fit.early.0

      # convert to rates per 1000 catheter days
      rate.early.0 <- exp(fit.early.0)*1000
      rate.early.0
      rate.early.1 <- exp(fit.early.1)*1000
      rate.late.0 <- exp(fit.late.0)*1000
      rate.late.1 <- exp(fit.late.1)*1000

      # add lines to plot for fitted control ICU
      lines(seq(1,5,1), rate.early.0, col=2)
      lines(seq(5,20,1), rate.late.0, col=2)

      # add lines to plot for fitted intervention ICU
      lines(seq(1,5,1), rate.early.1, col=1)
      lines(seq(5,20,1), rate.late.1, col=1)
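The piecewise formulas above can be cross-checked against a single linear-predictor function. This sketch does not need the data file: it uses the coefficient estimates typed in from slide 18, and the helper `lin.pred` is a hypothetical name introduced only for this example. It assumes quarters run from 1 to 20, as the `seq(5,20,1)` calls suggest.

```r
# Coefficient estimates copied from the summary output on slide 18
b <- c(-5.20386, 0.73339, 0.07517, -0.08774, -0.02874, -0.08355)

# Hypothetical helper: log rate per catheter day for quarter q and group iv
lin.pred <- function(q, iv) {
  k1 <- pmax(q - 5, 0)   # same knot term as ifelse(q > 5, q - 5, 0)
  b[1] + b[2]*iv + b[3]*q + b[4]*k1 + b[5]*iv*q + b[6]*iv*k1
}

q <- 1:20
rate.control <- 1000 * exp(lin.pred(q, iv = 0))   # per 1000 catheter days
rate.interv  <- 1000 * exp(lin.pred(q, iv = 1))

# The slide's "late" formulas, evaluated on quarters 5 to 20
fit.late.0 <- (b[1] - b[4]*5) + (b[3] + b[4]) * (5:20)
fit.late.1 <- (b[1] + b[2] - b[4]*5 - b[6]*5) +
              (b[3] + b[4] + b[5] + b[6]) * (5:20)

print(round(rbind(quarter = q, control = rate.control,
                  intervention = rate.interv), 2))
```

Both routes give the same fitted values, and because the knot term is zero at quarter 5 the early and late segments join without a jump.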

  21. Fitted model, linear predictor scale

  22. Real question • Is the change in infection rates different in the two ICUs? • That is, are the slopes after Q5 different? • How to test that: • slope in control ICU: β3 + β4 • slope in intervention ICU: β3 + β4 + β5 + β6 • What is the hypothesis test?
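One way to write the hypothesis, using the coefficient numbering on this slide: the difference between the two post-Q5 slopes is the sum of the two interaction coefficients, so the test is a test of a single linear combination. This is a sketch of the standard formulation and assumes a two-sided alternative.

```latex
\begin{align*}
(\beta_3 + \beta_4 + \beta_5 + \beta_6) - (\beta_3 + \beta_4) &= \beta_5 + \beta_6 \\
H_0:\ \beta_5 + \beta_6 = 0 \quad &\text{vs.} \quad H_1:\ \beta_5 + \beta_6 \neq 0
\end{align*}
```

In terms of the coefficient vector, this is the contrast $(0, 0, 0, 0, 1, 1)$.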

  23. Linear Combination of Coefficients

      > estimable(reg, c(0,0,0,0,1,1))
                      Estimate Std. Error X^2 value DF   Pr(>|X^2|)
      (0 0 0 0 1 1) -0.1122858 0.03091206  13.19452  1 0.0002807688
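The reported estimate is on the log scale. The sketch below reproduces the chi-square statistic and p-value from the estimate and standard error, then exponentiates to get a ratio with a Wald 95% interval. The numbers are typed in from the output above; the interval is an addition for illustration and was not shown on the slide. (`estimable()` is provided by the gmodels package.)

```r
# Estimate and standard error of beta5 + beta6, copied from the output above
est <- -0.1122858
se  <- 0.03091206

# Wald chi-square statistic (1 df) and its p-value
x2 <- (est / se)^2
p.value <- pchisq(x2, df = 1, lower.tail = FALSE)

# Ratio of per-quarter rate changes after Q5: intervention ICU vs control ICU
ratio <- exp(est)

# Wald 95% confidence interval, computed on the log scale then exponentiated
ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(c(x2 = x2, p.value = p.value))
print(c(ratio = ratio, lower = ci[1], upper = ci[2]))
```

A ratio of about 0.89 means that, after the intervention began, the per-quarter multiplicative change in the infection rate was roughly 11% lower in the intervention ICU than in the control ICU, and the interval excludes 1.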

  24. Example: Breast Cancer Incidence in SC • Cunningham et al. • Hypothesize that there are differences in subtypes of breast cancer by race • ER + vs. ER- • Grades 1, 2, 3 • Stage 1, 2, 3, 4 • Incidence of breast cancer varies by age • Data: • Tumor registry data for SC (and Ohio) • Census data for SC

  25. Poisson modeling • Rate of incidence per cancer type • Modeled as a function of ER, grade and race

      > summary(reg1)

      Call:
      glm(formula = nc ~ age + age2 + age3 + bl + er + gr + age * bl +
          age2 * bl + age3 * bl + age * er + age2 * er + age3 * er +
          age * gr + age2 * gr + age3 * gr + bl * er + bl * gr + er * gr,
          family = poisson, offset = log(9 * popn))

  26. Results

  27. Confidence Intervals

  28. Incidence Ratio for AA vs. EA
