Peter Kraft pkraft@hsph.harvard Bldg 2 Rm 207 2-4271

EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 5: Gene-environment, gene-gene interaction and “pathway” analyses. Peter Kraft pkraft@hsph.harvard.edu Bldg 2 Rm 207 2-4271. Terwilliger & Weiss (2000) Nat Genet 26:151-157. DNA synthesis. dTMP. GCP2.

Presentation Transcript


  1. EPI293: Design and analysis of gene association studies. Winter Term 2008. Lecture 5: Gene-environment, gene-gene interaction and “pathway” analyses. Peter Kraft, pkraft@hsph.harvard.edu, Bldg 2 Rm 207, 2-4271

  2. Terwilliger & Weiss (2000) Nat Genet 26:151-157

  3. [Figure: folate / one-carbon metabolism pathway, running from dietary folate and dietary choline through THF, 5,10-MTHF and 5-MTHF (plasma folate) to DNA synthesis (dUMP to dTMP) and DNA methylation (methionine, SAM, SAH, homocysteine). Genes shown include GCP2, RFC1, TYMS, DHFR, MTHFD, SHMT, MTHFR, MTR, MTRR, BHMT, CBS, MAT1A and DNMT1; cofactors B2, B6 and cobalamin.] Slide courtesy of Stephanie Chiuve

  4. Thomas (2005) CEBP 14:557

  5. Guiding Principle: KISS • “Keeping it Simple is Stupid”: For every complex problem there is a simple, easy to understand, incorrect answer. --Quoted in Ulrich (2006) CEBP 15:827 • “Keep it Simple, Stupid”: Looking at genes and environmental factors marginally is informative, and is also often all we can do reliably. Cf. Haldane’s “Defense of Beanbag Genetics.” See letter by Ambrosone and response by Pharoah in JNCI (2007) 99:487-9

  6. “One of the important functions of beanbag genetics is to show what kinds of numerical data are needed [to test hypotheses about genetic effects]. Their collection will be expensive [Haldane cites an example where 25,000 subjects or more would be needed]. Insofar as Professor Mayr succeeds in convincing politicians and business executives who control research funding, we will not get the data.” --Haldane, “Defense…” Or worse, we’ll get a lot of underpowered studies analyzed using poorly understood, ad-hoc methods, which will only contribute noise and confusion. Better to know we don’t know than to think we know when we don’t.

  7. Outline • Gene-environment interaction background • Study designs • Analytic methods (case-control data) • “Basic Table” • Testing for “interaction” • Testing for association, incorporating possible G-E interaction • Gene-gene interaction background • Analytic methods • Simple pairwise-interactions • Empirical Bayes hierarchical models • Toxicokinetic models • Combining information across multiple SNPs • Machine learning methods

  9. “The influences of diet and diseases” might “mask” some “inborn errors of metabolism.” Furthermore, “idiosyncrasies as regards drugs” may be due to “inborn errors of metabolism.” Garrod (1902) Lancet 2:1616

  10. “There are exactly four possibilities, shown in Table 3. The enumeration is so simple that no one has ever troubled to make it.” X Y JBS Haldane (1938) Heredity and Politics

  11. Examples: xeroderma pigmentosum; PKU; G6PD and fava beans; alpha-1 antitrypsin; sickle cell. Figure from Khoury et al. (1988) AJHG 42:89

  12. 110 cases / 110 controls Roberts-Thomson (1996) Lancet 347:1372

  13. [Figure: RR by red meat servings.] 212 cases / 221 controls; PHS; men 60 years or older. Chen (1998) Cancer Res 58:3307

  14. DEFINITION and NOTA BENE: “By interaction or effect modification we mean a variation in some measure of the effect of an exposure on disease risks across the levels of [...] a modifier. [...] The definition of interaction depends on the measure of association used.” From Thomas (2004) Oxford University Press. Emphasis added.

  15. Absolute risks; relative risks • Supra- (sub-) multiplicative interaction • $RR_{11} / (RR_{01} \times RR_{10}) > (<)\ 1$ • Supra- (sub-) additive interaction • $I_{11} - I_{00} - [(I_{01} - I_{00}) + (I_{10} - I_{00})] > (<)\ 0$ • When the null model has no obvious biological interpretation, testing for interaction may not be helpful. Thompson (1991) J Clin Epidemiol 44:221

  16. Simple example: G = 1 if carrier, 0 if non-carrier; E = 1 if exposed, 0 if unexposed. Risk of disease: $p_{GE} = b_0 + b_g G + b_e E + b_{ge} GE$. Log odds of disease: $\log\frac{p_{GE}}{1-p_{GE}} = \beta_0 + \beta_g G + \beta_e E + \beta_{ge} GE$

  17. [Figure: carriers and noncarriers plotted for unexposed and exposed, on two scales. Risk of disease (axis 0 to 0.5): $b_{ge} = 0$. Log odds of disease (axis -3 to 0): $\beta_{ge} \neq 0$.]

  18. [Figure: the same two scales with the roles reversed. Risk of disease (axis 0 to 0.5): $b_{ge} \neq 0$. Log odds of disease (axis -3 to 0): $\beta_{ge} = 0$.]
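
The scale dependence shown in slides 16 to 18 can be checked numerically. The sketch below is a minimal illustration, not part of the lecture: it takes four risks for binary G and E and returns the interaction term on the risk scale and on the log odds scale. The function name and all risk values are hypothetical, and the first index of each risk refers to G, the second to E.

```python
import math


def interaction_terms(p00, p10, p01, p11):
    """Return (b_ge, beta_ge) for binary G and E.

    p_ge is the risk of disease for genotype g and exposure e
    (first index G, second index E). Both models are saturated, so the
    interaction terms are simple contrasts of the four cells.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    b_ge = p11 - p10 - p01 + p00                                  # risk scale
    beta_ge = logit(p11) - logit(p10) - logit(p01) + logit(p00)   # log odds scale
    return b_ge, beta_ge


# Hypothetical risks that are exactly additive on the risk scale.
additive = interaction_terms(p00=0.05, p10=0.15, p01=0.20, p11=0.30)


def risk_from_odds(odds):
    return odds / (1.0 + odds)


# Hypothetical risks that are exactly multiplicative on the odds scale
# (OR_g = 2, OR_e = 3, joint OR = 6).
base_odds = 0.05 / 0.95
multiplicative = interaction_terms(
    p00=risk_from_odds(base_odds),
    p10=risk_from_odds(2 * base_odds),
    p01=risk_from_odds(3 * base_odds),
    p11=risk_from_odds(6 * base_odds),
)

print("additive risks:      b_ge=%.4f beta_ge=%.4f" % additive)
print("multiplicative odds: b_ge=%.4f beta_ge=%.4f" % multiplicative)
```

In the first scenario the risk-scale term is zero while the log-odds term is not; in the second the pattern reverses, which is the point of the two figures.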

  19. “It can be useful to note that the relation between individual and joint [genetic and environmental] effects can take different forms, which can depend on the biologic mechanism underlying the interaction. However [...] predicting the biologic mechanism from such epidemiologic data is difficult and perhaps not productive.” Botto and Khoury in: Khoury et al. (2004) Oxford University Press. The standard “test for interaction” ($H_0$: $\beta_{ge} = 0$) is in fact a test for departure from a specific model of interaction (additive on the log odds scale).

  20. [Diagram: G and E act on an intermediate X, which determines Y; accompanying plot of Y against X with points for G=1,E=1; G=0,E=1; G=1,E=0; G=0,E=0.] Unless the relationship between G, E & X or X & Y is well known, the model is unidentifiable: a given relationship between G, E & Y could be due to a non-additive relationship between G, E & X or a non-linear relationship between X & Y. After Thompson (1991)

  21. “Crossover” effects are “non-removable,” i.e. monotonic transformations of scale will not eliminate the “interaction.” However, we do not have appropriate statistical methods for testing the specific null hypothesis of “no crossover interaction.” [Figure: risk of disease (axis 0 to 0.5) for unexposed and exposed, $b_{ge} \neq 0$.]

  23. Study designs *Via matching on ethnicity, “genomic control”

  24. Weinberg and Umbach (2000) Am J Epidemiol

  25. Case-Only Analysis. Based on the genotype-exposure table in CASES. Genotypic odds ratios for exposure from this table are equal to interaction relative risks only if genotypes and exposure are not correlated in the general population, i.e. if P(G,E)=P(G)P(E). (Also have to assume a log-linear risk model: $\Pr(D \mid G,E) = a\,B_G\,C_E\,D_{G,E}$, where B, C and D for reference genotypes or exposures are 1.)
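
The case-only claim can be illustrated with expected cell probabilities rather than data. The sketch below assumes binary G and E and the log-linear risk model from the slide; the function name, the population tables and every parameter value are hypothetical and chosen only for illustration.

```python
def case_only_odds_ratio(pop, a, B, C, D):
    """Expected G-E odds ratio among cases.

    pop[(g, e)] is the population probability of genotype g and exposure e.
    Risk follows the log-linear model Pr(D | G, E) = a * B^G * C^E * D^(G*E),
    so B, C and D equal 1 in the reference categories.
    """
    def risk(g, e):
        return a * (B ** g) * (C ** e) * (D ** (g * e))

    # Probability of being a case in each cell, up to a constant that
    # cancels in the odds ratio.
    cases = {cell: pop[cell] * risk(*cell) for cell in pop}
    return (cases[1, 1] * cases[0, 0]) / (cases[1, 0] * cases[0, 1])


q_g, q_e = 0.35, 0.30  # hypothetical carrier and exposure prevalences

# G and E independent in the population: P(G,E) = P(G)P(E).
independent = {
    (g, e): (q_g if g else 1 - q_g) * (q_e if e else 1 - q_e)
    for g in (0, 1) for e in (0, 1)
}

# G and E correlated in the population (population G-E odds ratio 2.5).
dependent = {(0, 0): 0.50, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.15}

or_indep = case_only_odds_ratio(independent, a=0.01, B=1.5, C=2.0, D=3.0)
or_dep = case_only_odds_ratio(dependent, a=0.01, B=1.5, C=2.0, D=3.0)
print(or_indep, or_dep)
```

Under independence the case-only odds ratio recovers the interaction parameter D; with population G-E correlation it is D multiplied by the population G-E odds ratio, so the correlation is mistaken for interaction.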

  27. Basic 6x2 Table Like 2x3 disease-genotype table, this presentation is “closest to the data” and makes no assumption about genetic model or how the gene and exposure jointly influence risk

  28. Testing “interaction” (standard) • Compare “main effects only” model to “main effects plus interaction” model • Usually called the “test for interaction,” this is actually a test of departure from a specified model for interaction (additive on the log odds scale for logistic regression). Say E is dichotomous (0,1) and G is also 0,1 (e.g. dominant coding). Then in SAS speak, we want to compare model caco=g e; to model caco=g e g*e; This tests: $OR_{11} / (OR_{10} \times OR_{01}) = 1$
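
For binary G and E the hypothesis $OR_{11}/(OR_{10} \times OR_{01}) = 1$ can also be examined directly from the eight cell counts. The sketch below uses a Wald test on the log of that ratio, which corresponds to the interaction coefficient in the saturated logistic model; it is a stand-in for, not a copy of, the likelihood-ratio comparison of the two SAS models. The function name and the counts are hypothetical.

```python
import math


def multiplicative_interaction_test(cases, controls):
    """Wald test of OR11 / (OR10 * OR01) = 1 from a 2x4 table.

    cases and controls are dicts of counts keyed by (g, e), all non-zero.
    Returns the ratio of odds ratios, the z statistic and a two-sided p-value.
    """
    def odds_ratio(g, e):
        # Odds ratio for cell (g, e) against the (0, 0) reference cell.
        return (cases[g, e] * controls[0, 0]) / (controls[g, e] * cases[0, 0])

    ratio = odds_ratio(1, 1) / (odds_ratio(1, 0) * odds_ratio(0, 1))
    # Standard error of log(ratio): square root of the summed reciprocal counts.
    se = math.sqrt(sum(1.0 / n for table in (cases, controls)
                       for n in table.values()))
    z = math.log(ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return ratio, z, p_value


# Hypothetical counts, keyed by (g, e).
cases = {(0, 0): 100, (1, 0): 50, (0, 1): 60, (1, 1): 90}
controls = {(0, 0): 150, (1, 0): 75, (0, 1): 45, (1, 1): 30}

ratio, z, p_value = multiplicative_interaction_test(cases, controls)
print(ratio, z, p_value)
```

With continuous or multi-level E, or with covariates, the closed form no longer applies and the model comparison in the slide is the route to take.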

  29. Testing “interaction” (a little fancier) • Often researchers are interested in departures from an additive (on the incidence scale) interaction • Somehow, this scale has become identified with “biologically independent effects,” although there are biologically realistic scenarios of “independent effects” that lead to a multiplicative interaction; for discussion, see Rothman & Greenland “Modern Epidemiology” and VanderWeele and Robins (2007) Epidemiology 18:329 • This scale has direct public health relevance • We can use a clever trick to test for non-additivity • $I_{11} - I_{00} - [(I_{01} - I_{00}) + (I_{10} - I_{00})] = 0 \iff RR_{11} = RR_{10} + RR_{01} - 1$ • This is no longer a generalized linear model • Can’t fit using, e.g., standard logistic regression software • Have to use custom code (e.g. PROC NLMIXED)

  30. Testing “interaction” (a little fancier)
      Null model (interaction constrained to be additive on risk scale):
        proc nlmixed data=twosnp;
          if (g eq 0) and (e eq 0) then eta=a;
          if (g eq 0) and (e eq 1) then eta=a+b2;
          if (g eq 1) and (e eq 0) then eta=a+b1;
          /* additivity constraint: OR11 = OR10 + OR01 - 1 */
          if (g eq 1) and (e eq 1) then eta=a+log(exp(b1)+exp(b2)-1);
          /* Bernoulli log-likelihood on the logit scale */
          ll = caco*eta - log(1+exp(eta));
          model caco ~ general(ll);
          parms a b1 b2=0;
        run;
      Alternative model (interaction not constrained):
        proc nlmixed data=twosnp;
          if (g eq 0) and (e eq 0) then eta=a;
          if (g eq 0) and (e eq 1) then eta=a+b2;
          if (g eq 1) and (e eq 0) then eta=a+b1;
          /* b3 is the unconstrained log odds ratio for the joint cell */
          if (g eq 1) and (e eq 1) then eta=a+b3;
          ll = caco*eta - log(1+exp(eta));
          model caco ~ general(ll);
          parms a b1 b2 b3=0;
        run;
      Compare -2 log Lnull + 2 log Lalt to chi-square with 1 d.f.
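
The constrained linear predictor in the null model follows from the additivity condition once odds ratios stand in for relative risks. That substitution is an assumption here (it holds approximately for a rare disease); the derivation below is a sketch under that assumption, using the parameter names from the NLMIXED code.

```latex
\begin{align*}
\text{Additivity on the risk scale:}\quad
  & RR_{11} = RR_{10} + RR_{01} - 1 \\
\text{With } OR \approx RR:\quad
  & OR_{11} = OR_{10} + OR_{01} - 1 \\
\text{In the code, } OR_{10} = e^{b_1},\ OR_{01} = e^{b_2}:\quad
  & OR_{11} = e^{b_1} + e^{b_2} - 1 \\
\text{Linear predictor for } G = 1,\ E = 1:\quad
  & \eta_{11} = a + \log\!\left(e^{b_1} + e^{b_2} - 1\right)
\end{align*}
```

The alternative model replaces the constrained term with a free parameter $b_3 = \log OR_{11}$, so the two models differ by one parameter, which matches the 1 d.f. comparison on the slide.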

  31. Screening for stratum-specific effects • Is this gene associated with risk of disease in any exposure subgroup? • Can also ask: Is this exposure associated with risk of disease among individuals with any genotype? Compare two models. Null: $\log\frac{p_{GE}}{1-p_{GE}} = \beta_0 + \beta_e E$. Alternative: $\log\frac{p_{GE}}{1-p_{GE}} = \beta_0 + \beta_e E + \beta_g G + \beta_{ge} GE$
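
When G and E are both binary, each of the two models is saturated within its own strata (E alone for the null, the four G-E cells for the alternative), so the maximized likelihoods have closed forms and the 2 d.f. likelihood-ratio test needs no iterative fitting. The sketch below relies on that special case; the function names and counts are hypothetical.

```python
import math


def stratified_loglik(cases, controls, stratum_of):
    """Maximized Bernoulli log-likelihood with one free risk per stratum.

    cases and controls are dicts of counts keyed by (g, e); stratum_of maps
    a (g, e) cell to the stratum it belongs to under the model.
    """
    totals = {}
    for cell in cases:
        key = stratum_of(cell)
        n_case, n_ctrl = totals.get(key, (0, 0))
        totals[key] = (n_case + cases[cell], n_ctrl + controls[cell])

    loglik = 0.0
    for n_case, n_ctrl in totals.values():
        p = n_case / (n_case + n_ctrl)  # fitted case fraction in the stratum
        loglik += n_case * math.log(p) + n_ctrl * math.log(1.0 - p)
    return loglik


def g_ge_test(cases, controls):
    """2 d.f. likelihood-ratio test of beta_g = beta_ge = 0 for binary G, E."""
    ll_null = stratified_loglik(cases, controls, lambda cell: cell[1])  # E only
    ll_alt = stratified_loglik(cases, controls, lambda cell: cell)      # G, E, GE
    lrt = 2.0 * (ll_alt - ll_null)
    p_value = math.exp(-lrt / 2.0)  # chi-square survival function with 2 d.f.
    return lrt, p_value


# Hypothetical counts, keyed by (g, e).
cases = {(0, 0): 100, (1, 0): 50, (0, 1): 60, (1, 1): 90}
controls = {(0, 0): 150, (1, 0): 75, (0, 1): 45, (1, 1): 30}

lrt, p_value = g_ge_test(cases, controls)
print(lrt, p_value)
```

In these made-up counts the gene has no effect among the unexposed and a clear effect among the exposed, which is the kind of pattern this joint test is meant to pick up.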

  32. “True” alternative model: Pr(G,E)=Pr(G)Pr(E), Pr(G)=$q_g$, Pr(E)=$q_e$. The assumption of G-E independence is not required for validity of the G-GE test, as long as E is measured accurately!

  33. Power and sample size calculations The Test Statistic has an asymptotic χ2(δ) distribution, where Conditional on ascertainment scheme

  34. G N=900 pg=0.35 pe=0.30 ORe=2

  35. G-GE N=900 pg=0.35 pe=0.30 ORe=2

  36. GE N=900 pg=0.35 pe=0.30 ORe=2

  37. diff(G-GE,G) N=900 pg=0.35 pe=0.30 ORe=2

  38. diff(G-GE,G) N=1050 pg=0.35 pe=0.30 ORe=2

  39. diff(G-GE,G) N=1200 pg=0.35 pe=0.30 ORe=2

  40. diff(G-GE,G) N=1350 pg=0.35 pe=0.30 ORe=2

  41. diff(G-GE,G) N=1350 pg=0.35 pe=0.20 ORe=2

  42. diff(G-GE,G) N=1350 pg=0.35 pe=0.10 ORe=2

  43. What about misclassified E? $f_{b,q,s,d}(D,G,E) = \sum_X P_b(D \mid G,X)\,P_s(E \mid X)\,P_{d,q_e}(X \mid G)\,P_{q_g}(G)$, where b: penetrance parameters; s: sensitivity, specificity; q: exposure prevalence, allele frequency; d: exposure odds ratio(s) by genotype. See also: Garcia-Closas, Thompson et al. 1998; Garcia-Closas, Rothman et al. 1999
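
The sum over the true exposure X can be written out directly for binary D, G, E and X. The sketch below makes one simplifying assumption that the slide does not: G and X are independent in the population (exposure odds ratio by genotype equal to 1), so P(X|G) reduces to the exposure prevalence. The function name and all parameter values are hypothetical.

```python
def observed_joint(q_g, q_e, sens, spec, risk):
    """Joint distribution of (D, G, E) when E is a misclassified copy of X.

    q_g: carrier prevalence; q_e: prevalence of true exposure X;
    sens, spec: sensitivity and specificity of measured E for true X;
    risk[(g, x)]: Pr(D = 1 | G = g, X = x).
    Assumes G and X are independent in the population.
    """
    joint = {}
    for d in (0, 1):
        for g in (0, 1):
            p_g = q_g if g else 1.0 - q_g
            for e in (0, 1):
                total = 0.0
                for x in (0, 1):  # sum over the unobserved true exposure
                    p_x = q_e if x else 1.0 - q_e
                    p_d = risk[g, x] if d else 1.0 - risk[g, x]
                    if x:
                        p_e = sens if e else 1.0 - sens
                    else:
                        p_e = 1.0 - spec if e else spec
                    total += p_d * p_e * p_x * p_g
                joint[d, g, e] = total
    return joint


# Hypothetical penetrances, keyed by (g, x).
risk = {(0, 0): 0.010, (1, 0): 0.015, (0, 1): 0.020, (1, 1): 0.060}

joint = observed_joint(q_g=0.10, q_e=0.25, sens=0.8, spec=0.9, risk=risk)
perfect = observed_joint(q_g=0.10, q_e=0.25, sens=1.0, spec=1.0, risk=risk)
print(sum(joint.values()))
```

Expected cell probabilities like these can be fed into power calculations for the G, GE and G-GE tests to see how each behaves as sensitivity and specificity fall.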

  44. G-GE GE G n=1200,qg=10%,qe=25%,ORe=2

  45. n=1200,qg=10%,qe=25%,ORe=2 G-GE GE G

  46. E as an intermediate. [Three causal diagrams relating G, E and D.] So far we've discussed this ... But if: ... or ... Then conditioning on E can reduce or eliminate power to detect G

  47. Take home message • Which test/analysis is most appropriate will depend on goals of analysis • Are you screening for genetic (environmental) factors, allowing for possible effect modification by environmental (genetic) factors? Is E on the causal pathway from G to D? • Are you trying to describe the risk pattern across G-E strata? What scale is most relevant? E.g. departures from additivity on absolute risk scale are relevant as they provide support for targeted interventions. • It is extremely difficult to argue from observational data back to biologic mechanism… we are epidemiologists, not cell biologists
