This workshop presentation, given by Shaun Purcell at the Boulder Twin Workshop in March 2004, covers the role of continuous moderators in behavioral genetics. Topics include raw data versus summary statistics, definition variables, heritability, G × E interactions, and model fitting.
Continuous heterogeneity Shaun Purcell Boulder Twin Workshop March 2004
Raw data VS summary statistics
Summary statistics:
  MZ   1.03   0.87   0.98
  DZ   0.95   0.57   1.08
Raw data:
  Zyg    T1     T2
  1      1.2    0.8
  1     -1.3   -2.2
  2      0.7    1.9
  2      0.2   -0.8
  ..     ...    ...
Raw data VS summary statistics
  Zyg    T1     T2     age
  1      1.2    0.8    12.3
  1     -1.3   -2.2    10.3
  2      0.7    1.9     8.7
  2      0.2   -0.8    14.5
  ..     ...    ...    ...
Bivariate normal distribution
[Figure: the data summarised by a bivariate normal distribution, labelled with its mean and variance]
Introducing Definition variables
• Zygosity as a definition variable
• “Rectangular” file
data.raw
  1  1   0.361769   -0.35641
  2  1   0.888986    1.46342
  3  1   0.535161    0.636073
  ...
  1  2   0.234099    0.0848318
  2  2  -0.547252   -0.22976
  3  2  -0.307926   -0.253692
  ...
Annotations to the script:
• M full 1 1 free: M, necessary for the means model
• H Full 1 1: H will be specified as a definition variable
• Options MX%P=rawfit.txt: optional, request individual fit statistics for each pair
• Group2: a single group for both MZ & DZ twins
• NObservations=0: no need to specify number of pairs
• RE file=data.raw: points to a “REctangular” data file
• Definition zyg /: zygosity is a “Definition” variable
• Means M | M /: a model for the means [ twin 1 | twin 2 ]
• (H~)@A: multiply A component by 1/H
• Specify H -1: 1 x 1 matrix H represents each pair’s zygosity

!Using definition variables
Group1: Defines Matrices
Calc NGroups=2
Begin Matrices;
 X Lower 1 1 free
 Y Lower 1 1 free
 Z Lower 1 1 free
 M full 1 1 free
 H Full 1 1
End Matrices;
Begin Algebra;
 A= X*X';
 C = Y*Y';
 E = Z*Z';
End Algebra;
Ma X 0
Ma Y 0
Ma Z 1
Ma M 0
Options MX%P=rawfit.txt
End

Group2: MZ & DZ twin pairs
Data NInput_vars=4 NObservations=0
RE file=data.raw
Labels id zyg t1 t2
Select t1 t2 zyg /
Definition zyg /
Matrices = Group 1
Means M | M /
Covariances
 A + C + E  | (H~)@A + C _
 (H~)@A + C | A + C + E /
Specify H -1
End
Output from zyg.mx
RE FILE=DATA.RAW
Rectangular continuous data read initiated
NOTE: Rectangular file contained 500 records with data
 that contained a total of 2000 observations
LABELS ID ZYG T1 T2
SELECT T1 T2 ZYG /
DEFINITION ZYG /
NOTE: Selection yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1500 observations
NOTE: Definition yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1000 observations
Output from zyg.mx
Summary of VL file data for group 2
               ZYG         T1         T2
  Code      -1.0000     1.0000     2.0000
  Number   500.0000   500.0000   500.0000
  Mean       1.5000    -0.0140     0.0240
  Variance   0.2500     0.5601     0.5211
  Minimum    1.0000    -2.1941    -1.9823
  Maximum    2.0000     2.1218     2.7670
Output from zyg.mx
MATRIX H
This is a FULL matrix of order 1 by 1
     1
 1  -1
MATRIX M
This is a FULL matrix of order 1 by 1
     1
 1   4
MATRIX X
This is a LOWER TRIANGULAR matrix of order 1 by 1
     1
 1   1
MATRIX Y
This is a LOWER TRIANGULAR matrix of order 1 by 1
     1
 1   2
MATRIX Z
This is a LOWER TRIANGULAR matrix of order 1 by 1
     1
 1   3
• The -1 in matrix H comes from: Specify H -1
Output from zyg.mx
Your model has 4 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> 2134.998
Degrees of freedom >>>>>>>>>>>>>>>> 996
• Fixing X to zero
Your model has 3 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> 2154.626
Degrees of freedom >>>>>>>>>>>>>>>> 997
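The two fits above are nested, so they can be compared with a likelihood-ratio test. The following is a minimal Python sketch, not part of the original workshop materials; the function name `lrt_1df` is hypothetical, and it assumes the difference in -2LL can be referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def lrt_1df(minus2ll_full: float, minus2ll_reduced: float) -> tuple[float, float]:
    """Likelihood-ratio test for dropping a single parameter (1 df)."""
    chi_sq = minus2ll_reduced - minus2ll_full
    # For 1 df, the chi-square upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


# -2LL values taken from the zyg.mx output: full model, then X fixed to zero.
chi_sq, p_value = lrt_1df(2134.998, 2154.626)
print(f"chi-square(1) = {chi_sq:.3f}, p = {p_value:.2g}")
```

The degrees of freedom of the test equal the difference in the models' degrees of freedom (997 - 996 = 1), which is why the 1 df shortcut applies here.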
Continuous moderators
• Traits often best defined continuously
• Many environmental moderators also likely to be continuous in nature
  • Age
  • Gestational age
  • Socio-economic status
  • Educational level
  • Consumption of food / alcohol / drugs
• How to test for G x E interaction in this case?
Continuous moderators
[Figure: heritability (0% to 100%) plotted against age (4, 6, 8, 10 yrs)]
• Problems?
  • Stratifying the sample reduces the sample size in each stratum
  • Modelling proportions of variance
    • implicitly assumes equality of variance w.r.t. the moderator
  • Logical to assume a linear G × E interaction
    • linearity at the level of effect, not variance
  • No obvious statistical test for heterogeneity
Biometrical G × E model
• At a hypothetical single locus
  • additive genetic value $a$
  • allele frequency $p$
  • QTL variance $2p(1-p)a^2$
• Assuming a linear interaction
  • additive genetic value $a + \beta M$
  • allele frequency $p$
  • QTL variance $2p(1-p)(a + \beta M)^2$
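The single-locus formula can be evaluated directly to see how a linear interaction changes the QTL variance across the moderator. This is an illustrative Python sketch only: the function name `qtl_variance` and the values of `p`, `a` and `beta` are hypothetical and do not come from the workshop.

```python
def qtl_variance(p: float, a: float, beta: float = 0.0, m: float = 0.0) -> float:
    """Additive QTL variance 2p(1-p)(a + beta*M)^2.

    With beta = 0 this reduces to the no-interaction case 2p(1-p)a^2.
    """
    return 2.0 * p * (1.0 - p) * (a + beta * m) ** 2


# Hypothetical values chosen only for illustration.
p, a, beta = 0.5, 1.0, 0.5
for m in (-2, -1, 0, 1, 2):
    print(f"M = {m:+d}: QTL variance = {qtl_variance(p, a, beta, m):.3f}")
```

Because the moderated effect enters squared, an effect that is linear in M gives a variance that is quadratic in M.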
Biometrical G × E model
[Figure: genotypic values for AA (a), Aa (0) and aa (-a) plotted against the moderator M; panels: “No interaction”, “Interaction”, “Equivalently…”]
Model-fitting approach to GxE
[Path diagram: standard ACE twin model; latent factors A, C and E with paths a, c and e to the trait of Twin 1 and of Twin 2]
Model-fitting approach to GxE
[Path diagram: ACE twin model in which the additive genetic path for each twin is a+XM; the c and e paths are unmoderated]
• Continuous moderator variable M
• Can be coded 0 / 1 in the dichotomous case
Individual specific moderators
[Path diagram: ACE twin model with twin-specific moderator values; the additive genetic path is a+XM1 for Twin 1 and a+XM2 for Twin 2]
E x E interactions
[Path diagram: ACE twin model with all paths moderated; a+XM1, c+YM1, e+ZM1 for Twin 1 and a+XM2, c+YM2, e+ZM2 for Twin 2]
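ACE - XYZ - M
[Path diagram: moderated paths a+XM1, c+YM1, e+ZM1 for Twin 1 and a+XM2, c+YM2, e+ZM2 for Twin 2, plus moderated means m+MM1 and m+MM2 with the moderator M shown for each twin]
• Main effects and moderating effects statistically and conceptually distinct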
Model-fitting approach to GxE
[Figure: components of variance (A, C, E) plotted against the moderator variable]
Turkheimer et al (2003)
• 320 twin pairs recruited at birth from urban hospitals
• G : additive genetic variance
• E : SES
  • parental education, occupation, income
• X : IQ
  • Wechsler; Verbal, Performance, Full
[Figure: A, C and E components shown in three panels: Full scale IQ, Verbal IQ, Non-Verbal IQ]
Standard model
• Means vector
• Covariance matrix
Allowing for a main effect of X
• Means vector
• Covariance matrix
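As a hedged reconstruction, the means vector and covariance matrices for the main-effect model can be read off the Mx script for this model (`Means M + B*R | M + B*S` and the `Covariance` statements). The symbols are notation assumed here: $\mu$ for matrix M, $\beta_M$ for matrix B, $M_1$ and $M_2$ for the twins' moderator values (matrices R and S), and $a$, $c$, $e$ for the path coefficients in A, C and E. Setting $\beta_M = 0$ gives the standard model.

$$
\boldsymbol{\mu} = \begin{pmatrix} \mu + \beta_M M_1 & \mu + \beta_M M_2 \end{pmatrix}
$$

$$
\Sigma_{MZ} = \begin{pmatrix} a^2 + c^2 + e^2 & a^2 + c^2 \\ a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix},
\qquad
\Sigma_{DZ} = \begin{pmatrix} a^2 + c^2 + e^2 & \tfrac{1}{2}a^2 + c^2 \\ \tfrac{1}{2}a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}
$$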
! Basic model + main effect of a definition variable
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
 A full 1 1 free
 C full 1 1 free
 E full 1 1 free
 M full 1 1 free ! grand mean
 B full 1 1 free ! moderator-linked means model
 H full 1 1
 R full 1 1 ! twin 1 moderator (definition variable)
 S full 1 1 ! twin 2 moderator (definition variable)
End Matrices;
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End
G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 A*A' + C*C' + E*E' | A*A' + C*C' _
 A*A' + C*C'        | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End
G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 A*A' + C*C' + E*E' | H@A*A' + C*C' _
 H@A*A' + C*C'      | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End
MATRIX A
This is a FULL matrix of order 1 by 1
     1
 1   1.3228
MATRIX B
This is a FULL matrix of order 1 by 1
     1
 1   0.3381
MATRIX C
This is a FULL matrix of order 1 by 1
     1
 1   1.1051
MATRIX E
This is a FULL matrix of order 1 by 1
     1
 1   0.9728
MATRIX M
This is a FULL matrix of order 1 by 1
     1
 1   0.1035
Your model has 5 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3123.925
Degrees of freedom >>>>>>>>>>>>>>>> 795
MATRIX A
This is a FULL matrix of order 1 by 1
     1
 1   1.3078
MATRIX B
This is a FULL matrix of order 1 by 1
     1
 1   0.0000
MATRIX C
This is a FULL matrix of order 1 by 1
     1
 1   1.1733
MATRIX E
This is a FULL matrix of order 1 by 1
     1
 1   0.9749
MATRIX M
This is a FULL matrix of order 1 by 1
     1
 1   0.1069
Your model has 4 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3138.157
Degrees of freedom >>>>>>>>>>>>>>>> 796
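The two outputs above differ only in whether B (the main effect of the moderator on the mean) is estimated or fixed at zero, so the models are nested. As a worked sketch, assuming the usual likelihood-ratio comparison of -2LL values applies:

$$
\Delta\chi^2 = 3138.157 - 3123.925 = 14.232, \qquad \Delta df = 796 - 795 = 1
$$

This exceeds 3.84, the 5% critical value of a chi-square distribution with 1 degree of freedom, so under that assumption dropping the main effect gives a significantly worse fit.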
Continuous heterogeneity model
• Means vector
• Covariance matrix
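As a hedged reconstruction, the means and covariances for this model can be read off the Mx script for the GxE model. The symbols are notation assumed here: $\mu$ and $\beta_M$ for matrices M and B, $\beta_X$, $\beta_Y$, $\beta_Z$ for the moderator-linked coefficients held in matrices T, U and V, and $M_1$, $M_2$ for the twins' moderator values (matrices R and S).

$$
\boldsymbol{\mu} = \begin{pmatrix} \mu + \beta_M M_1 & \mu + \beta_M M_2 \end{pmatrix}
$$

$$
\operatorname{Var}(T_i) = (a + \beta_X M_i)^2 + (c + \beta_Y M_i)^2 + (e + \beta_Z M_i)^2, \qquad i = 1, 2
$$

$$
\operatorname{Cov}_{MZ}(T_1, T_2) = (a + \beta_X M_1)(a + \beta_X M_2) + (c + \beta_Y M_1)(c + \beta_Y M_2)
$$

$$
\operatorname{Cov}_{DZ}(T_1, T_2) = \tfrac{1}{2}(a + \beta_X M_1)(a + \beta_X M_2) + (c + \beta_Y M_1)(c + \beta_Y M_2)
$$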
! GxE - Basic model
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
 A full 1 1 free
 C full 1 1 free
 E full 1 1 free
 T full 1 1 free ! moderator-linked A component
 U full 1 1 free ! moderator-linked C component
 V full 1 1 free ! moderator-linked E component
 M full 1 1 free ! grand mean
 B full 1 1 free ! moderator-linked means model
 H full 1 1
 R full 1 1 ! twin 1 moderator (definition variable)
 S full 1 1 ! twin 2 moderator (definition variable)
End Matrices;
Ma T 0
Ma U 0
Ma V 0
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End
G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
 (A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
 (A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
 (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End
G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
 H@(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
 H@(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
 (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End
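The covariance statements in the MZ and DZ groups can be mirrored outside Mx to check what a given set of parameter values implies for one twin pair. The Python sketch below is an illustration only; the function name `expected_cov` and the numeric values are hypothetical, and `r_a` plays the role of the genetic correlation (1 for MZ, matrix H = .5 for DZ).

```python
def expected_cov(a, c, e, t, u, v, m1, m2, r_a):
    """Expected 2x2 covariance matrix for one twin pair.

    a, c, e : unmoderated path coefficients (Mx matrices A, C, E)
    t, u, v : moderator-linked coefficients (Mx matrices T, U, V)
    m1, m2  : moderator values for twin 1 and twin 2 (Mx matrices R, S)
    r_a     : 1.0 for MZ pairs, 0.5 for DZ pairs
    """
    a1, a2 = a + t * m1, a + t * m2
    c1, c2 = c + u * m1, c + u * m2
    e1, e2 = e + v * m1, e + v * m2
    var1 = a1 * a1 + c1 * c1 + e1 * e1
    var2 = a2 * a2 + c2 * c2 + e2 * e2
    cov12 = r_a * a1 * a2 + c1 * c2  # E is unshared, so it adds no covariance
    return [[var1, cov12], [cov12, var2]]


# Hypothetical values: with no moderation this is the standard ACE model.
mz = expected_cov(1.0, 1.0, 1.0, 0.0, 0.0, 0.0, m1=0.3, m2=-1.2, r_a=1.0)
dz = expected_cov(1.0, 1.0, 1.0, 0.0, 0.0, 0.0, m1=0.3, m2=-1.2, r_a=0.5)
print(mz, dz)
```

Because the moderator values differ between pairs, and possibly between twins within a pair, each pair has its own expected covariance matrix, which is why the model is fitted to raw data with definition variables.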
Practical 1
• The script: mod.mx
• The data: f1.dat
  ID zygosity trait_twin_1 trait_twin_2 mod_twin_1 mod_twin_2
• Any evidence for G × E for this trait?
  • i.e. does the A latent variable show heterogeneity with respect to the moderator variable?
• If so, in what way?
  • i.e. how would you interpret/describe the effect?
Practical 1 : f1.dat
[Figure, four panels: MZ pairs (trait); DZ pairs (trait); Moderator distribution; All twin 1’s vs. moderator]
nomod.mx
  a   1.3078    $a^2 \approx 1.7$
  c   1.1733    $c^2 \approx 1.4$
  e   0.9749    $e^2 \approx 0.95$
$a^2 + c^2 + e^2 = 4.05$
i.e. % variance is 42%, 35% and 23% for A, C and E respectively
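The arithmetic behind these percentages can be reproduced in a few lines. This is a small Python sketch using the unrounded path estimates from nomod.mx; the variable names are hypothetical, and small differences from the slide's figures are expected because the slide works from rounded variance components.

```python
# Path coefficient estimates from the nomod.mx output.
a, c, e = 1.3078, 1.1733, 0.9749

# Variance components are the squared path coefficients.
components = {"A": a ** 2, "C": c ** 2, "E": e ** 2}
total = sum(components.values())

# Proportion of the total variance attributable to each component.
proportions = {name: value / total for name, value in components.items()}

for name in ("A", "C", "E"):
    print(f"{name}: variance = {components[name]:.3f}, share = {proportions[name]:.1%}")
print(f"total variance = {total:.3f}")
```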
Plotting VCs
• For the additive genetic VC, for example
  • Given $a$, $\beta$ and a range of values for the moderator variable $M$, plot $(a + \beta M)^2$ against $M$
  • For example, $a = 0.5$, $\beta = -0.2$ and $M$ ranges from -2 to +2
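The values to plot for this example can be tabulated as follows. This is a minimal Python sketch: the function name `moderated_vc` is hypothetical, and the grid of integer moderator values is an assumption (the slide gives only the range -2 to +2).

```python
def moderated_vc(a: float, beta: float, m: float) -> float:
    """Moderated variance component (a + beta*M)^2 at moderator value M."""
    return (a + beta * m) ** 2


a, beta = 0.5, -0.2                 # example values from the slide
moderator_values = [-2, -1, 0, 1, 2]  # assumed grid over the stated range

curve = [(m, moderated_vc(a, beta, m)) for m in moderator_values]
for m, vc in curve:
    print(f"M = {m:+d}: additive genetic variance = {vc:.2f}")
```

With a negative β the moderated path a + βM shrinks as M increases, so the additive genetic variance falls across this range.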
Other tests
• All made against the full model ACE-XYZ-M, -2LL = 3024.689
Confidence intervals
• Easy to get CIs for individual parameters
• Additionally, CIs on the moderated VCs are useful for interpretation
  • e.g. a 95% CI for $(a + \beta M)^2$, for a specific $M$
• Define two extra vectors in Group 1
 P full 1 13
 O Unit 1 13
 Matrix P -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3
• Add a 4th group to calculate the CIs
CIs
Calc
Matrices = Group 1
Begin Algebra;
 F= ( A@O + T@P ) . ( A@O + T@P ) /
 G= ( C@O + U@P ) . ( C@O + U@P ) /
 I= ( E@O + V@P ) . ( E@O + V@P ) /
End Algebra;
Interval @ 95 F 1 1 to F 1 13
Interval @ 95 G 1 1 to G 1 13
Interval @ 95 I 1 1 to I 1 13
End;
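Calculation of CIs
F= ( A@O + T@P ) . ( A@O + T@P ) /
• A@O copies the 1 x 1 matrix A across every column of the unit vector O; T@P multiplies every element of P by T
• So for a given row vector P, ( A@O + T@P ) is the row vector whose j-th element is A + T*P[j]
• Finally, the dot-product squares all elements to give (A + T*P[j])^2, one moderated VC per moderator value in P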
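The matrix algebra for F can be traced step by step outside Mx. The Python sketch below is illustrative only: `kron_row` is a hypothetical helper that handles just the special case of a 1 x 1 matrix Kronecker-multiplied with a row vector, and the values of `A` and `T` are made up. The grid P matches the 13 moderator values defined in Group 1.

```python
def kron_row(scalar: float, row: list[float]) -> list[float]:
    """Kronecker product of a 1x1 matrix with a 1xN row vector."""
    return [scalar * x for x in row]


P = [-3 + 0.5 * i for i in range(13)]  # -3, -2.5, ..., 3
O = [1.0] * 13                         # unit vector

A, T = 0.5, -0.2  # hypothetical estimates, for illustration only

# A@O + T@P: element j is A + T * P[j]
linear = [x + y for x, y in zip(kron_row(A, O), kron_row(T, P))]

# Dot (element-wise) product with itself squares every element.
F = [x * x for x in linear]

for m, vc in zip(P, F):
    print(f"M = {m:+.1f}: moderated A variance = {vc:.3f}")
```

Each element of F is a function of the free parameters A and T, which is what lets the `Interval` statements request a confidence interval for every element F 1 1 to F 1 13.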
Other considerations
• Simple approach to test for heterogeneity
  • easily adapted, e.g. for ordinal data models
• Extensions / things to watch for…
  • scalar vs. qualitative heterogeneity
  • very low power
  • the environment may show shared genetic influence with the trait
  • nonlinear effects in both mediation and moderation
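[Path diagram: G and E both influence the trait X; labelled elements are rGE between G and E, the main effect of E on X, and the moderating effect of E on G (G × E)]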
Turkheimer et al, 2003
[Figure, two panels: IQ plotted against SES; V(IQ) plotted against SES]