
IV Analysis

IV Analysis. Stefan Walter, Dept of Epidemiology and Biostatistics, UCSF, swalter@psg.ucsf.edu. Causality from IV analysis: IV methods can consistently estimate the average causal effect of an exposure on an outcome even in the presence of unmeasured confounding!


Presentation Transcript


  1. IV Analysis Stefan Walter Dept of Epidemiology and Biostatistics UCSF swalter@psg.ucsf.edu

  2. Causality from IV analysis [Diagram: nodes U, X, Y] IV methods can consistently estimate the average causal effect of an exposure on an outcome even in the presence of unmeasured confounding! Instrumental variable estimation uses the unconfounded component of the variance in the exposure X (the component determined by the instrument Z) to estimate the effect of X on the outcome Y. It estimates the effect of treatment among those who receive treatment because of the instrument. … if the instrument is valid …

  3. IV Analyses: leverage (pseudo-)randomization Use pseudo-randomization as an instrument to estimate the effect of a phenotype on the outcome. Example instruments: • Randomization in an RCT • Before/after policy change, e.g., labeling rules, pharmacy rules, especially if not implemented universally • Physician preference • Distance to service provider • Any characteristic that makes a patient ineligible for treatment but does not otherwise affect the outcome [Diagram: Natural experiment (Z), Phenotype (X), Disease (Y), Unmeasured Confounders]

  4. Instrumental Variable Analysis • Causal diagram representing the assumptions for genetic IV analyses to estimate the effect of BMI on cognition. The causal diagram follows the rules for directed acyclic graphs (DAGs): • 1) the genotype affects BMI; • 2) the genetic instrumental variables do not influence cognition except via BMI; and • 3) there are no common causes of genotype and cognition. [Diagram: Gene, BMI, Cognition, Unmeasured Confounder]

  5. Link to RCTs Z – Randomization X – Treatment U – Unmeasured Confounding / Selection Y – Outcome [Diagram: nodes Z, X, Y, U] We value RCTs so much because we are relatively confident that randomization fulfills the assumptions for a valid IV.

  6. Estimation: Many options $\beta_{IV} = \dfrac{\text{Relation Instrument} \rightarrow \text{Outcome (ITT)}}{\text{Relation Instrument} \rightarrow \text{Treatment (Adherence)}}$
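The ratio on this slide can be sketched in a few lines of R. This is a minimal illustration on simulated data, not the author's code: the variable names, the binary instrument, and all coefficients below are assumptions chosen for the example.

```r
# Hypothetical simulated data: binary instrument z, confounder u,
# exposure x, outcome y. The true effect of x on y is set to 1.5.
set.seed(1)
n <- 10000
z <- rbinom(n, 1, 0.5)
u <- rnorm(n)
x <- 1 + 0.8 * z + u + rnorm(n)
y <- 2 + 1.5 * x + 2 * u + rnorm(n)

itt       <- coef(lm(y ~ z))["z"]        # instrument -> outcome (ITT)
adherence <- coef(lm(x ~ z))["z"]        # instrument -> treatment
beta_iv   <- unname(itt / adherence)     # ratio (Wald) estimate
beta_ols  <- unname(coef(lm(y ~ x))["x"])  # confounded comparison

print(c(iv = beta_iv, ols = beta_ols))
```

In a simulation like this one, the ratio estimate should land near the simulated effect while the ordinary regression of y on x is pulled away from it by the confounder u.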

  7. 2 Stage Least Squares (2SLS) • Calculate the predicted values of the exposure (Z → X), e.g., linear regression of BMI on the IV. 1st Stage: $E(X \mid Z) = \widehat{\text{phenotype}} = b_0 + b_1 Z + b_k\,\text{Other Covariates}$ • Use the predicted value to explain the outcome (X(Z) → Y), e.g., linear regression of cognition on predicted BMI. 2nd Stage: $E(Y \mid E(X \mid Z)) = g_0 + g_1 E(X \mid Z) + g_k\,\text{Other Covariates}$ $g_1$ is the IV 2SLS estimate (Local Average Treatment Effect - LATE) (Angrist, Imbens, Rubin, 1993, p.19)

  8. Control Function Approach (Tchetgen Tchetgen and Vansteelandt, 2013)

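  9. Practice Session with simulated data • Generate the universe:
     # Three instruments coded 0/1/2 (allele counts)
     z1 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
     z2 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
     z3 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
     e1 <- rnorm(10000, sd = 3)    # noise in the exposure
     e2 <- rnorm(10000, sd = 3)    # noise in the outcome
     U  <- rnorm(10000, sd = 1)    # unmeasured confounder
     A  <- 27 + 0.6*z1 + 0.3*z2 + 0.1*z3 + 2*U + e1    # exposure
     Y  <- 30 + 1.5*A + 4*U + e2                       # outcome; true effect of A is 1.5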

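  10. 2SLS
     # Data-generating model for the outcome: Y <- 30 + 1.5*A + 4*U + e2
     summary(lm(A ~ z1))            # beta = 0.6
     summary(lm(A ~ z2))            # beta = 0.3
     summary(lm(A ~ z3))            # beta = 0.1
     summary(lm(A ~ z1 + z2 + z3))
     summary(lm(Y ~ A))             # beta is about 2.1, biased upward by confounding through U
     # Two-step: regress Y on the first-stage prediction of A
     pred1 <- predict(lm(A ~ z1))
     summary(lm(Y ~ pred1))         # beta is about 1.5; the standard error printed here is not the 2SLS standard error
     # Control function IV: add the first-stage residual as a covariate
     res1 <- resid(lm(A ~ z1))
     summary(lm(Y ~ A + res1))      # coefficient on A is about 1.5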
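The two-step code gives the right point estimate, but the second-stage `lm` computes its residual variance from the predicted exposure rather than the observed one. The sketch below shows one way to recompute the standard error by hand in base R. It regenerates a simplified version of the simulated data with a single instrument (an assumption made to keep the example short), and the object names are illustrative.

```r
# Simplified version of the practice data: one instrument only (assumption).
set.seed(2)
n  <- 10000
z1 <- sample(c(0,0,0,0,0,1,1,1,2,2), n, replace = TRUE)
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 2 * U + rnorm(n, sd = 3)
Y  <- 30 + 1.5 * A + 4 * U + rnorm(n, sd = 3)

A_hat  <- fitted(lm(A ~ z1))      # first stage
stage2 <- lm(Y ~ A_hat)           # second stage
b      <- coef(stage2)

# Standard error as printed by the second-stage lm
naive_se <- summary(stage2)$coefficients["A_hat", "Std. Error"]

# 2SLS standard error: residuals use the OBSERVED exposure A,
# while the design matrix uses the predicted exposure A_hat.
res_iv  <- Y - (b[1] + b[2] * A)
sigma2  <- sum(res_iv^2) / (n - 2)
Xhat    <- cbind(1, A_hat)
se_2sls <- sqrt(sigma2 * solve(crossprod(Xhat))[2, 2])

print(c(estimate = unname(b[2]), naive_se = naive_se, se_2sls = se_2sls))
```

Dedicated 2SLS routines apply this kind of correction automatically, which is one reason to prefer them over the manual two-step for inference.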

  11. 2 Stage Least Squares (2SLS): in SAS, Stata, R • PROC SYSLIN (SAS) • ivreg2 (Stata) • tsls (R) (Local Average Treatment Effect - LATE) (Angrist, Imbens, Rubin, 1993, p.19)

  12. IV assumptions • Assumption IV.1: Z and exposure A are associated, either because • Z has a causal effect on A, or • Z and A share common causes • Assumption IV.2: Z affects the outcome Y only through A • no direct effect of Z on Y (“exclusion restriction”) • Assumption IV.3: Z does not share common causes with the outcome Y, or all common causes are controlled • no confounding for the effect of Z on Y • Assumption IV.4 (most popular option): There are no defiers • This assumption is sometimes described as a “monotonicity assumption” • A defier is an individual who would be exposed (A = 1) under Z = 0 but unexposed (A = 0) under Z = 1; the assumption is that no such individual exists in the population • In an RCT, this would be a person who would do the exact opposite of what he/she is told to do.
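Of these assumptions, only IV.1 can be checked directly in the data. A minimal sketch of that check, using the first-stage F statistic and R² on simulated data shaped like the practice session (the single-instrument setup and object names are assumptions for illustration):

```r
# Simulated exposure with one instrument, as in the practice data (assumption).
set.seed(3)
n  <- 10000
z1 <- sample(c(0,0,0,0,0,1,1,1,2,2), n, replace = TRUE)
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 2 * U + rnorm(n, sd = 3)

first  <- lm(A ~ z1)                                     # first-stage regression
f_stat <- unname(summary(first)$fstatistic["value"])     # strength of the Z-A association
r2     <- summary(first)$r.squared                       # share of variance in A explained by Z

print(c(F = f_stat, R2 = r2))
```

A commonly used heuristic treats a first-stage F below about 10 as a sign of a weak instrument. Assumptions IV.2 to IV.4 cannot be verified this way and have to be argued from subject-matter knowledge.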

  13. The IV estimate is not necessarily the population average causal effect. Whose Causal Effect is it? Classify people based on their treatment under either value of the IV/randomly assigned treatment. [2×2 table crossing what the person will do if assigned to Experimental Treatment B (Take B / Take A) with what the person will do if assigned to Control Treatment A (Take A / Take B)]

  14. Whose Causal Effect? Classify people based on their treatment under either value of the IV/randomly assigned treatment. [2×2 table crossing what the person will do if assigned to experimental treatment (Take experimental / Take control) with what the person will do if assigned to control treatment (Take control / Take experimental)] Never-takers do not contribute to any outcome differences between the IV=0 and IV=1 groups. Always-takers do not contribute to any outcome differences between the IV=0 and IV=1 groups.
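The point that never-takers and always-takers drop out of the comparison can be illustrated with a small simulation. Everything below is hypothetical: the type proportions, the type-specific effects, and the variable names are assumptions, and defiers are excluded by construction.

```r
# Hypothetical population with three compliance types and no defiers.
set.seed(4)
n    <- 20000
type <- sample(c("complier", "always-taker", "never-taker"), n,
               replace = TRUE, prob = c(0.5, 0.2, 0.3))
z <- rbinom(n, 1, 0.5)                          # random assignment (the instrument)

# Treatment actually taken: compliers follow z, the other types ignore it
a <- ifelse(type == "always-taker", 1,
     ifelse(type == "never-taker", 0, z))

# Treatment effect differs by type (assumed values): 1 for compliers, 4 otherwise
effect <- unname(c("complier" = 1, "always-taker" = 4, "never-taker" = 4)[type])
y <- 1 + effect * a + rnorm(n)

# Ratio of the ITT effect on the outcome to the ITT effect on treatment
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(a[z == 1]) - mean(a[z == 0]))
ace  <- mean(effect)                            # effect if everyone were treated

print(c(wald = wald, population_average = ace))
```

In this setup the ratio estimate should sit near the compliers' effect of 1 rather than the population average of about 2.5, because only compliers change treatment when the instrument changes.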

  15. Overview • IV analysis with outcome • IV analysis in case-control studies • IV analysis with survival outcomes • IV analysis in R

  16. IV Analysis with binary outcome • Traditionally: use 2SLS with a linear probability model • Problem: predicted values are not restricted to the range of a valid probability (0 ≤ P ≤ 1) • … this might not be a problem when using genetic variants as instruments, given that they explain so little variation that the estimate will hardly ever be out of bounds • Solution: use a link function: log, logit, probit
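A rough sketch of the two options, on hypothetical simulated data with a deliberately strong instrument (all names and coefficients are assumptions). The second stage simply swaps the linear model for a logistic one, a "predictor substitution" approach; with a non-linear link this does not in general recover the conditional odds ratio, so it should be read as an illustration only.

```r
# Hypothetical data: binary outcome Y, continuous exposure A, instrument z.
set.seed(5)
n <- 10000
z <- sample(c(0,0,0,0,0,1,1,1,2,2), n, replace = TRUE)
U <- rnorm(n)
A <- 27 + 2 * z + U + rnorm(n)                 # strong instrument (assumption)
Y <- rbinom(n, 1, plogis(-9 + 0.3 * A + U))    # binary outcome

A_hat <- fitted(lm(A ~ z))                     # first stage

lpm   <- lm(Y ~ A_hat)                                      # linear probability model
logit <- glm(Y ~ A_hat, family = binomial(link = "logit"))  # logit link

# Fitted values from the logit model always stay inside (0, 1);
# the linear probability model carries no such guarantee.
print(range(fitted(lpm)))
print(range(fitted(logit)))
print(coef(logit)["A_hat"])
```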

  17. IV Analysis with binary outcome: IV Analysis with logit link

  18. IV for survival outcome: IV for Cox Proportional Hazards Model

  19. Two Sample IV designs • Using published data only: • The effect of BMI on Late Onset Alzheimer's Disease • The effect of Type 2 Diabetes on Late Onset Alzheimer's Disease

  20. Inverse Variance Weighted IV of separate samples (Burgess et al. 2013) Burgess, Stephen, Adam Butterworth, and Simon G. Thompson. "Mendelian randomization analysis with multiple genetic variants using summarized data." Genetic Epidemiology 37.7 (2013): 658-665. Genetic variant $k$, $k = 1, \ldots, K$, is associated with an observed mean change $X_k$ in the risk factor per additional variant allele, with standard error $\sigma_{X_k}$, and an observed mean change $Y_k$ in the outcome per allele, with standard error $\sigma_{Y_k}$.
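Written out with the notation of this slide, the inverse variance weighted estimate and its standard error, as computed in the R code on the later "Inverse Variance Weighted IV (external data)" slide, take the following form. The symbol $\hat\beta_{IVW}$ is a label introduced here for convenience.

```latex
\hat\beta_{IVW} = \frac{\sum_{k=1}^{K} X_k \, Y_k \, \sigma_{Y_k}^{-2}}
                       {\sum_{k=1}^{K} X_k^{2} \, \sigma_{Y_k}^{-2}},
\qquad
\operatorname{se}\!\left(\hat\beta_{IVW}\right) =
  \sqrt{\frac{1}{\sum_{k=1}^{K} X_k^{2} \, \sigma_{Y_k}^{-2}}}
```

With a single variant ($K = 1$) the estimate reduces to the ratio $Y_1 / X_1$ with standard error $\sigma_{Y_1} / X_1$.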

  21. Inverse Variance Weighted IV: Effect of BMI on Dementia

  22. The Model Dementia Related Phenotypes

  23. BMI on Dementia Mukherjee, Shubhabrata, et al. "Genetically predicted body mass index and Alzheimer's disease–related phenotypes in three large samples: Mendelian randomization analyses." Alzheimer's & Dementia 11.12 (2015): 1439-1451.

  24. Split Sample IV
     mrozb <- data.frame(Y, A, U, z1, z2, z3)
     pred1 <- predict(lm(A ~ z1))
     summary(lm(Y ~ pred1))                          # beta = 1.5
     # Draw a subsample of 500 and build a genetic risk score (GRS) from first-stage weights
     mrozvs <- mrozb[sample(1:10000, 500), ]
     a <- coef(lm(A ~ z1 + z2 + z3, data = mrozb))   # weights estimated in the full sample
     mrozvs$GRS <- drop(as.matrix(mrozvs[c("z1", "z2", "z3")]) %*% a[2:4])
     summary(lm(Y ~ GRS, data = mrozvs))             # beta = 0.77 in this draw; imprecise with only 500 observations

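  25. Inverse Variance Weighted IV (external data)
     ### Burgess approach
     library(lmtest)                  # provides coeftest()
     coeftest(lm(A ~ z1))             # estimate 0.516322, SE 0.046800
     coeftest(lm(A ~ z2))             # estimate 0.342791, SE 0.046807
     coeftest(lm(Y ~ z1))             # estimate 0.84012, SE 0.11495
     coeftest(lm(Y ~ z2))             # estimate 0.66891, SE 0.11469
     Xk   <- c(0.516, 0.343)          # instrument-exposure associations
     Xkse <- c(0.047, 0.047)
     Yk   <- c(0.840, 0.669)          # instrument-outcome associations
     Ykse <- c(0.115, 0.115)
     sum(Xk * Yk * Ykse^-2) / sum(Xk^2 * Ykse^-2)    # inverse variance weighted IV estimate
     (1 / sum(Xk^2 * Ykse^-2))^0.5                   # its standard error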

  26. Inverse Variance Weighted IV (external data)
     library(meta)                    # provides metagen()
     # Per-variant ratio estimates and standard errors (Xk, Yk, Ykse as defined on the previous slide)
     Yk[1] / Xk[1]                    # about 1.63
     Ykse[1] / Xk[1]                  # 0.223
     Yk[2] / Xk[2]                    # 1.950
     Ykse[2] / Xk[2]                  # 0.335
     metagen(c(1.626, 1.950), c(0.223, 0.335))
     # the two estimates are similar --> no evidence of heterogeneity, instrument OK

  27. Doubting Instruments: major biases similar to critiques of RCTs • Do they have other pathways to the outcome? • Unblinded trials: controls become demoralized • Is there a common cause of the instrument and the outcome? • Trials: unfair random assignment • Do they actually affect anyone’s exposure? • Trials: nobody adheres to assignment [Diagrams: DAGs over Z, X, Y, U, with additional nodes U2 and G]

  28. Evaluating the assumptions • Constraints implied by theory • Over-identification tests • Stratification-based tests (similar to over-identification) • IV inequality constraints • Negative controls • Independent from known confounders • Egger tests
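As one concrete instance of an over-identification test, the sketch below computes a Sargan-type statistic by hand in base R: fit 2SLS with three instruments, regress the 2SLS residuals on all instruments, and compare n·R² with a chi-square distribution whose degrees of freedom equal the number of instruments minus the number of endogenous exposures. The data are simulated to mirror the practice session and the object names are assumptions. The reference distribution relies on homoskedastic errors.

```r
# Simulated data with three valid instruments (mirrors the practice session).
set.seed(6)
n  <- 10000
z1 <- sample(c(0,0,0,0,0,1,1,1,2,2), n, replace = TRUE)
z2 <- sample(c(0,0,0,0,0,1,1,1,2,2), n, replace = TRUE)
z3 <- sample(c(0,0,0,0,0,1,1,1,2,2), n, replace = TRUE)
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)
Y  <- 30 + 1.5 * A + 4 * U + rnorm(n, sd = 3)

Z    <- cbind(1, z1, z2, z3)                          # instruments (with intercept)
X    <- cbind(1, A)                                   # regressors (with intercept)
Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))    # first-stage fitted values
b    <- solve(crossprod(Xhat, X), crossprod(Xhat, Y)) # 2SLS coefficients
res  <- drop(Y - X %*% b)                             # residuals use observed A

aux     <- lm(res ~ z1 + z2 + z3)                     # residuals on all instruments
sargan  <- n * summary(aux)$r.squared
df      <- 3 - 1                                      # instruments minus endogenous exposures
p_value <- pchisq(sargan, df = df, lower.tail = FALSE)

print(c(beta_2sls = b[2], sargan = sargan, p = p_value))
```

A small p-value would suggest that the instruments disagree about the causal effect, which can indicate that at least one of them is invalid. A large p-value does not prove validity.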

  29. The End …. • 

  30. What to do with a binary exposure? • In genetic IV, convert the binary exposure to the probability scale by reweighting the predicted probability from a first stage model

  31. Two Stage Least Squares

  32. IV Analysis with binary outcome: IV Analysis with log link

  33. IV Analysis with binary outcome: IV Analysis with log link

  34. IV Analysis with binary outcome: IV Analysis with logit link

  35. IV Analysis with binary outcome

  36. IV for survival outcome: IV for Cox Proportional Hazards Model

  37. IV for survival outcome: IV for Cox Proportional Hazards Model

  38. IV for survival outcome: IV for Cox Proportional Hazards Model

  39. IV for survival outcome: IV for Aalen Additive Hazards Models

  40. IV for survival outcome: IV for Aalen Additive Hazards Models

  41. IV for survival outcome: IV for Aalen Additive Hazards Models

  42. IV for survival outcome: IV for Aalen Additive Hazards Models

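  43. Practice Session with simulated data • Generate the universe:
     # Three instruments coded 0/1/2 (allele counts)
     z1 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
     z2 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
     z3 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
     e1 <- rnorm(10000, sd = 3)    # noise in the exposure
     e2 <- rnorm(10000, sd = 3)    # noise in the outcome
     U  <- rnorm(10000, sd = 1)    # unmeasured confounder
     A  <- 27 + 0.6*z1 + 0.3*z2 + 0.1*z3 + 2*U + e1    # exposure
     Y  <- 30 + 1.5*A + 4*U + e2                       # outcome; true effect of A is 1.5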

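  44. 2SLS
     summary(lm(A ~ z1))            # beta = 0.6
     summary(lm(A ~ z2))            # beta = 0.3
     summary(lm(A ~ z3))            # beta = 0.1
     summary(lm(A ~ z1 + z2 + z3))  # betas are about 0.6, 0.3 and 0.1
     summary(lm(Y ~ A))             # beta is about 2.1, biased upward by confounding through U
     # Two-step: regress Y on the first-stage prediction of A
     pred1 <- predict(lm(A ~ z1))
     summary(lm(Y ~ pred1))         # beta is about 1.5; the standard error printed here is not the 2SLS standard error
     # Control function IV: add the first-stage residual as a covariate
     res1 <- resid(lm(A ~ z1))
     summary(lm(Y ~ A + res1))      # coefficient on A is about 1.5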

  45. 2SLS
     # ivreg2 script: http://diffuseprior.wordpress.com/tag/over-identification/
     # Usage: ivreg2(form, endog, iv, data, digits); the function is defined in that script, not in base R
     mroz <- data.frame(Y, A, z1, z2, z3)
     ivreg2(form = Y ~ A, endog = "A", iv = c("z1", "z2", "z3"), data = mroz)
     # Control function IV on the same data frame
     mroz$res1 <- resid(lm(A ~ z1, data = mroz))
     summary(lm(Y ~ A + res1, data = mroz))   # coefficient on A is about 1.5

  46. With another assumption we can say: Whose Causal Effect? Classify people based on their treatment under either value of the IV/randomly assigned treatment. [2×2 table crossing what the person will do if assigned to Experimental Treatment B (Take B / Take A) with what the person will do if assigned to Control Treatment A (Take A / Take B)]

  47. Whose Causal Effect? Classify people based on their treatment under either value of the IV/randomly assigned treatment. [2×2 table crossing what the person will do if assigned to experimental treatment (Take experimental / Take control) with what the person will do if assigned to control treatment (Take control / Take experimental)] Never-takers do not contribute to any outcome differences between the IV=0 and IV=1 groups. Always-takers do not contribute to any outcome differences between the IV=0 and IV=1 groups.

  48. Egger Regression: treat each IV estimate as an element in a meta-analysis • Under the InSIDE (instrument strength independent of direct effects) assumption, bias converges to zero. • Regress the Z-Y associations on the Z-X associations. The intercept is an estimate of average pleiotropy and the slope is an estimate of the true causal effect under InSIDE. Bowden, Davey Smith, and Burgess, Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression, IJE 2015
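The regression described on this slide can be sketched with simulated summary statistics. The number of variants, the size of the pleiotropic effects, and all names below are assumptions for illustration; the direct effects are drawn independently of the Z-X associations so that InSIDE holds by construction.

```r
# Hypothetical summary statistics for K variants.
set.seed(7)
K     <- 50
bx    <- runif(K, 0.1, 0.5)                   # Z-X associations (all positive)
alpha <- rnorm(K, mean = 0.1, sd = 0.02)      # direct (pleiotropic) effects, independent of bx
se_y  <- rep(0.02, K)                         # standard errors of the Z-Y associations
by    <- 1.5 * bx + alpha + rnorm(K, sd = se_y)   # Z-Y associations; true causal effect 1.5

# Egger regression: weighted regression of Z-Y on Z-X associations WITH an intercept
egger <- lm(by ~ bx, weights = se_y^-2)
# Inverse variance weighted estimate: the same regression forced through the origin
ivw   <- lm(by ~ 0 + bx, weights = se_y^-2)

print(coef(egger))   # intercept: average pleiotropy; slope: causal effect under InSIDE
print(coef(ivw))
```

When the average direct effect is not zero, as simulated here, the regression through the origin absorbs it into the slope, while the Egger intercept captures it separately.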
