Values & means: summary (Falconer & Mackay: chapter 7)

Outline: 1) Values & means: summary 2) Variance 3) Epigenetics (prof. Slagboom, LUMC) 4) Assignments chapter 7

Values & means: summary (Falconer & Mackay: chapter 7) Sanja Franic, VU University Amsterdam, 2011

Summary table: P = G + E = A + D + I + E

Variance (Falconer & Mackay: chapter 8)

Intro: Questions formulated within the context of genetic study of quantitative traits pertain predominantly to variation, the basic idea being the decomposition of phenotypic variance into components attributable to different causes. The relative magnitude of these components determines the genetic properties of a population, in particular the degree of resemblance between relatives. Here we consider the nature of these components, and how the genetic components depend on the gene frequency.

The basic idea in the study of variation of quantitative traits is the decomposition of phenotypic variance into components attributable to different causes. These components correspond to the components of value described in the last lecture, so that, e.g., the genotypic variance is the variance of the genotypic values. Assuming that the genotypic values and the environmental deviations are not correlated and do not interact, the variance decomposes as*

VP = VG + VE = VA + VD + VI + VE

Partitioning of variance allows one to estimate the relative importance of the various determinants of individual differences in the phenotype (e.g., heredity vs. environment). The relative importance of a source of variation is expressed as the variance due to that source as a proportion of the total phenotypic variance. For instance, the ratio of environmental variance to total phenotypic variance (VE/VP) quantifies the relative importance of the environment in determining individual differences in phenotypic values.

* The more general expression is VP = VG + VE + 2covGE + VGE, where covGE is the covariance between genotypic values and environmental deviations, and VGE is the variance due to interaction between genotypes and the environment.

Heritability is the relative importance of heredity in determining individual differences in the phenotype.
• VG/VP = broad sense heritability (i.e., degree of genetic determination): the extent to which individuals’ phenotypes are determined by their genotypes
• VA/VP = narrow sense heritability: the extent to which individuals’ phenotypes are determined by genes transmitted from the parents; the proportion of phenotypic variance explained by the variance of breeding values

Estimating broad sense heritability
Neither VG nor VE can be estimated from observations on a single population. The exceptions are experimental populations in which either VG or VE is eliminated. VE is difficult to eliminate as it includes all non-genetic variation. VG may be eliminated, e.g., by using highly inbred lines or the F1 cross between two such lines, or by cloning, to produce individuals with identical genotypes. If such individuals are raised under the normal range of environmental circumstances, their VP is an estimate of VE (as here VP = 0 + VE). Subtracting this VP from the VP of a genetically heterogeneous population gives an estimate of VG.

Genetic studies are frequently centered on the estimation of VA, as its magnitude is an important determinant of resemblance between relatives. Estimation of VA is dealt with in the next lecture; here, we show how VA depends on the gene frequencies and the genotypic values a and d. We illustrate using a single diallelic locus.
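The subtraction approach to broad sense heritability described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the phenotype lists are hypothetical, and the sketch assumes that the genetically identical group (e.g., an inbred line or clones) was raised under the same range of environments as the heterogeneous population.

```python
from statistics import variance


def broad_sense_heritability(heterogeneous, isogenic):
    """Estimate VG, VE and VG/VP by subtraction.

    heterogeneous: phenotypes from a genetically mixed population (VP = VG + VE)
    isogenic: phenotypes of genetically identical individuals (VP = 0 + VE)
    """
    v_p = variance(heterogeneous)   # sample variance of the mixed population
    v_e = variance(isogenic)        # all variance in this group is environmental
    v_g = v_p - v_e                 # VG obtained by subtraction
    return v_g, v_e, v_g / v_p


# Hypothetical phenotypic values, for illustration only.
mixed = [8, 10, 12, 14, 16]
clones = [11, 12, 13]
v_g, v_e, h2_broad = broad_sense_heritability(mixed, clones)
print(v_g, v_e, h2_broad)
```

With real data the two variances are themselves estimates, so the difference can be imprecise (or even negative) in small samples.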

VA and VD can be obtained by squaring the breeding values and dominance deviations, respectively, multiplying by the genotype frequency, and summing over the three genotypes.
* Note: As the means of A, G, and D are all 0, no correction for the mean is needed and their variance is obtained simply as the mean of squared values (i.e., in V = Σ(x_i - μ_x)^2 / N, μ_x = 0, thus V = Σ x_i^2 / N). Given that we work with frequencies, V = Σ x_i^2 f_i.

VA = 4p^2q^2α^2 + 2pqα^2(q - p)^2 + 4p^2q^2α^2
   = 2pqα^2(2pq + q^2 - 2pq + p^2 + 2pq)
   = 2pqα^2(q^2 + 2pq + p^2)
   = 2pqα^2(p + q)^2
   = 2pqα^2
   = 2pq[a + d(q - p)]^2

VD = 4q^4d^2p^2 + 8p^3q^3d^2 + 4p^4d^2q^2
   = 4q^2p^2d^2(q^2 + 2pq + p^2)
   = 4q^2p^2d^2(p + q)^2
   = 4q^2p^2d^2
   = (2pqd)^2

If there is no dominance (d = 0), VA = 2pqa^2.
If there is complete dominance (d = a), VA = 8pq^3a^2.
If p = q = .5, VA = ½a^2 and VD = ¼d^2.
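The derivation can be checked numerically. The sketch below computes VA and VD as frequency-weighted sums of squared breeding values and dominance deviations and compares them with the closed forms 2pq[a + d(q - p)]^2 and (2pqd)^2. The breeding values (2qα, (q - p)α, -2pα) and dominance deviations (-2q^2d, 2pqd, -2p^2d) are the ones implied by the terms of the derivation; the function names and the parameter values are illustrative assumptions.

```python
def genotype_table(p, a, d):
    """Frequencies, breeding values and dominance deviations for A1A1, A1A2, A2A2."""
    q = 1.0 - p
    alpha = a + d * (q - p)          # average effect of a gene substitution
    freqs = [p * p, 2 * p * q, q * q]
    breeding = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    dominance = [-2 * q * q * d, 2 * p * q * d, -2 * p * p * d]
    return freqs, breeding, dominance


def weighted_mean_square(values, freqs):
    """V = sum(x_i^2 * f_i); valid here because the mean of the values is 0."""
    return sum(f * x * x for x, f in zip(values, freqs))


def additive_and_dominance_variance(p, a, d):
    freqs, breeding, dominance = genotype_table(p, a, d)
    v_a = weighted_mean_square(breeding, freqs)
    v_d = weighted_mean_square(dominance, freqs)
    return v_a, v_d


p, a, d = 0.3, 1.0, 0.5              # arbitrary illustrative values
q = 1.0 - p
v_a, v_d = additive_and_dominance_variance(p, a, d)
print(v_a, 2 * p * q * (a + d * (q - p)) ** 2)   # two routes to VA
print(v_d, (2 * p * q * d) ** 2)                 # two routes to VD
```

Setting d = 0 or d = a in the same function reproduces the special cases 2pqa^2 and 8pq^3a^2.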

covAD
• Since G = A + D, VG = VA + VD + 2covAD, where covAD is the covariance between A and D.
• covAD = -4p^2q^3αd + 4p^2q^2(q - p)αd + 4p^3q^2αd = 4p^2q^2αd(-q + q - p + p) = 0, thus
• VG = VA + VD = 2pq[a + d(q - p)]^2 + (2pqd)^2.
* Note: As the means of A and D are 0, in cov_xy = Σ(x_i - μ_x)(y_i - μ_y)/N, μ_x = μ_y = 0, thus cov_xy = Σ x_i y_i / N. Given that we work with frequencies, cov_xy = Σ x_i y_i f_i.
[Path diagram: A and D load on G with path coefficients a and d; VG = VA + VD + 2covAD (if a = d = 1).]
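As an independent check of VG = VA + VD, the genotypic variance can also be computed directly from the assigned genotypic values (+a, d, -a) and the genotype frequencies, without going through breeding values. The sketch below does this for one arbitrary parameter set; the function name and the values of p, a and d are illustrative assumptions.

```python
def genotypic_variance_components(p, a, d):
    """Return VG (computed directly), VA, VD and covAD for one diallelic locus."""
    q = 1.0 - p
    alpha = a + d * (q - p)
    freqs = [p * p, 2 * p * q, q * q]            # A1A1, A1A2, A2A2

    # Direct route: variance of the genotypic values around the population mean.
    values = [a, d, -a]
    mean = sum(f * g for f, g in zip(freqs, values))
    v_g = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))

    # Component route: breeding values and dominance deviations (both have mean 0).
    breeding = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    dominance = [-2 * q * q * d, 2 * p * q * d, -2 * p * p * d]
    v_a = sum(f * x * x for f, x in zip(freqs, breeding))
    v_d = sum(f * y * y for f, y in zip(freqs, dominance))
    cov_ad = sum(f * x * y for f, x, y in zip(freqs, breeding, dominance))
    return {"VG": v_g, "VA": v_a, "VD": v_d, "covAD": cov_ad}


result = genotypic_variance_components(p=0.3, a=1.0, d=0.5)
print(result)
print("VG - (VA + VD) =", result["VG"] - (result["VA"] + result["VD"]))
```

Up to floating-point rounding, covAD is 0 and the two routes to VG agree.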

In a), genotypic variance is all additive, and is at its maximum when p = q = .5. In b) and c), dominance variance is the same, and is at its maximum when p = q = .5. In c), however, VA has two maxima (at q = .15 and q = .85), and within the range 0 < q < 1 it is zero only when p = q = .5. Here, VG remains constant over a wide range of gene frequencies, but its composition changes profoundly. Note: Falconer & Mackay p. 128: ‘A possible misunderstanding about the concept of additive genetic variance…’
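The dependence of VA and VD on gene frequency can be explored with a simple grid search. The sketch below assumes that the three panels correspond to no dominance (d = 0), complete dominance (d = a) and pure overdominance (a = 0); that mapping is an assumption, chosen because it is consistent with the description above (VA = 0 at p = q = .5 in panel c). The function names and the grid resolution are illustrative.

```python
def additive_variance(q, a, d):
    """VA = 2pq[a + d(q - p)]^2, with q the frequency of the decreasing allele."""
    p = 1.0 - q
    alpha = a + d * (q - p)
    return 2 * p * q * alpha ** 2


def dominance_variance(q, d):
    """VD = (2pqd)^2."""
    p = 1.0 - q
    return (2 * p * q * d) ** 2


def argmax_on_grid(func, lo, hi, steps=1000):
    """Return the grid point in [lo, hi] at which func is largest."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=func)


# a) no dominance: VA over the whole range of q
print("a) VA peak:", argmax_on_grid(lambda q: additive_variance(q, 1.0, 0.0), 0.0, 1.0))
# b) complete dominance: VA and VD
print("b) VA peak:", argmax_on_grid(lambda q: additive_variance(q, 1.0, 1.0), 0.0, 1.0))
print("b) VD peak:", argmax_on_grid(lambda q: dominance_variance(q, 1.0), 0.0, 1.0))
# c) pure overdominance: search each half separately, since VA has two maxima
left = argmax_on_grid(lambda q: additive_variance(q, 0.0, 1.0), 0.0, 0.5, 500)
right = argmax_on_grid(lambda q: additive_variance(q, 0.0, 1.0), 0.5, 1.0, 500)
print("c) VA peaks:", round(left, 3), round(right, 3))
print("c) VA at q = .5:", additive_variance(0.5, 0.0, 1.0))
```

For panel c) the search is split at q = .5 because a single search over the whole range would return only one of the two equally high peaks.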

Purcell: Sgene (single gene) http://statgen.iop.kcl.ac.uk/bgim/index2.html

• Effects of 2 genes against the background of residual variation:
• Choose gene A or B
• Use sliders for genetic value (a), dominance (d) and gene frequency (the red allele is the increasing allele)
• The graph gives the genetic values for gene A
• Residual variance: variance other than that due to the effects of A (and B)
• The histogram shows the distribution of the trait in 1000 individuals
• We can estimate VA: VA = 2pqα^2 = 2pq[a + d(q - p)]^2
• Thus, if p = q = 1/2 and a = 1/2: VA = 2 × 1/4 × 1/4 = 0.125 ≈ 0.13

• What is the heritability in this example?
• VA = 0.13
• Vresidual = 0.2
• VP = VA + Vresidual = 0.33
• Heritability = VA/VP = 0.13 / 0.33 = 0.39
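The heritability calculation for the Sgene example can be reproduced as follows. This is a sketch, not part of Sgene: the function name is hypothetical, and d = 0 is assumed so that the gene contributes no dominance variance and VP = VA + Vresidual holds as on the slide.

```python
def narrow_sense_heritability(p, a, d, v_residual):
    """Return (VA, VA/VP) for one locus, with VP = VA + residual variance."""
    q = 1.0 - p
    v_a = 2 * p * q * (a + d * (q - p)) ** 2
    v_p = v_a + v_residual           # assumes no dominance variance (d = 0)
    return v_a, v_a / v_p


v_a, h2 = narrow_sense_heritability(p=0.5, a=0.5, d=0.0, v_residual=0.2)
print(round(v_a, 3), round(h2, 3))   # 0.125 0.385
```

Without intermediate rounding the heritability is about 0.385; the value 0.39 results from rounding VA to 0.13 before dividing.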

In practice we are less concerned with gene frequencies and gene effects than with the estimation of variance components (next lecture).

Note: all variance components depend on gene frequency [e.g., VA = 2pq[a + d(q - p)]^2, VD = (2pqd)^2]; therefore any estimates of them are valid only for the population from which they are estimated.

To arrive at variance components in the population, the separate effects of all loci that contribute variation have to be combined. For random-mating populations in equilibrium, the total additive variance and the total dominance variance are the sums of the additive and dominance variances, respectively, attributable to each locus separately. But when more than one locus is under consideration, epistasis, if present, gives rise to interaction variance (i.e., the variance of interaction deviations).

Interaction variance
Two-factor interaction arises from the interaction of two loci, three-factor from three loci, etc. Higher-order interactions contribute very little variance; here we focus on two-factor interactions:
• interaction between two breeding values (VAA)
• interaction between breeding value and dominance deviation (VAD)
• interaction between two dominance deviations (VDD)
VI = VAA + VAD + VDD, etc.
Interaction variance is generally difficult to estimate, and relatively little is known about the importance of interaction as a source of variation (although it undoubtedly occurs frequently).

Variance due to disequilibrium
If genotype frequencies at two or more loci considered jointly are not what would be expected from the gene frequencies, the population is said to be in disequilibrium. Disequilibrium introduces an additional source of genetic variance. If G’ and G’’ are the genotypic values at two separate loci, and G is their joint genotypic value (G = G’ + G’’), then VG = VG’ + VG’’ + 2covG’G’’, where covG’G’’ ≠ 0 in the presence of disequilibrium.
Two forms of non-random mating generate disequilibrium:
• selection on parents (i.e., parents are a non-random sample of individuals in their generation),
• assortative mating (i.e., non-random mating with respect to the phenotype in question).
E.g., Pearl (2000): see how selection on parents generates correlation in the descendants (also Dolan: dissertation).
Two types of correlation of gene effects:
I) correlation between genes at different loci in the same gamete (gametic phase, or linkage, disequilibrium)
II) correlation between the genes in uniting pairs of gametes (i.e., correlation between mother’s and father’s genes)
Selection on parents generates I), and assortative mating generates both I) and II).
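The identity VG = VG’ + VG’’ + 2covG’G’’ can be illustrated with an exact calculation for two loci. The sketch below makes several simplifying assumptions that are not in the slides: both loci are purely additive (d = 0), gametes unite at random (so only a type I correlation is present), and disequilibrium is summarised by a single gametic disequilibrium coefficient, here called ld. All names and parameter values are illustrative.

```python
from itertools import product


def two_locus_variances(p1, p2, ld, a1, a2):
    """Exact variances for two additive loci under random union of gametes.

    p1, p2: frequencies of the increasing allele at locus 1 and locus 2
    ld: gametic (linkage) disequilibrium coefficient
    a1, a2: genotypic value a at each locus (no dominance assumed)
    Returns (VG', VG'', covG'G'', VG).
    """
    # Gamete frequencies; each gamete is (allele at locus 1, allele at locus 2),
    # with 1 = increasing allele and 0 = decreasing allele.
    gametes = {
        (1, 1): p1 * p2 + ld,
        (1, 0): p1 * (1 - p2) - ld,
        (0, 1): (1 - p1) * p2 - ld,
        (0, 0): (1 - p1) * (1 - p2) + ld,
    }
    rows = []  # (frequency, G', G'') for every ordered pair of uniting gametes
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        value1 = a1 * (g1[0] + g2[0] - 1)   # +a1, 0 or -a1
        value2 = a2 * (g1[1] + g2[1] - 1)   # +a2, 0 or -a2
        rows.append((f1 * f2, value1, value2))

    m1 = sum(f * x for f, x, _ in rows)
    m2 = sum(f * y for f, _, y in rows)
    v1 = sum(f * (x - m1) ** 2 for f, x, _ in rows)
    v2 = sum(f * (y - m2) ** 2 for f, _, y in rows)
    cov = sum(f * (x - m1) * (y - m2) for f, x, y in rows)
    v_total = sum(f * (x + y - m1 - m2) ** 2 for f, x, y in rows)
    return v1, v2, cov, v_total


v1, v2, cov, v_total = two_locus_variances(p1=0.5, p2=0.5, ld=0.1, a1=1.0, a2=1.0)
print(v1, v2, cov, v_total, v1 + v2 + 2 * cov)
```

With ld = 0 the covariance term vanishes and VG is simply the sum of the two single-locus variances.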

Correlation between genotype and environment (rGE)
rGE refers to a non-random distribution of genotypes over environments. It may arise, for instance, from genetic control of exposure to environmental events (Kendler & Eaves, 1986). Three types of rGE are usually distinguished:
• Passive rGE: children inherit genes and an environment that both predispose them to a given phenotypic outcome. E.g., a parent who suffers from depression may pass on genes that predispose their child to develop depression, but in addition may inadvertently create a depressogenic environment for the child.
• Evocative rGE: a person's genetically influenced characteristics evoke environmental reactions which exacerbate those characteristics. E.g., an anxious and withdrawn child, simply by behaving in an anxious and withdrawn manner, may elicit certain responses in other children (e.g., shunning) or in parents (more protective parenting) that contribute to the child’s anxiety.
• Active rGE: individuals, as a consequence of certain characteristics, actively seek out or create environments which are conducive to these characteristics. For instance, a withdrawn child may actively avoid social situations, such as birthday parties and sports activities, and thereby create an environment which fosters the child’s general withdrawal.
In the presence of rGE, the expression for the phenotypic variance becomes VP = VG + VE + 2covGE.
[Path diagram: G and E, connected by covGE, load on P with path coefficients g and e; VP = VG + VE + 2covGE (if g = e = 1).]
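The effect of rGE on the phenotypic variance can be seen in a small numerical example. The genotypic values and environmental deviations below are made up so that high genotypic values tend to occur in favourable environments; the function names are hypothetical, and population (divide-by-N) variances are used so that the identity holds exactly.

```python
def pvar(xs):
    """Population variance (divide by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pcov(xs, ys):
    """Population covariance (divide by N)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def decompose_phenotypic_variance(g, e):
    """Return VP, VG, VE and covGE for phenotypes P = G + E."""
    p = [gi + ei for gi, ei in zip(g, e)]
    return {"VP": pvar(p), "VG": pvar(g), "VE": pvar(e), "covGE": pcov(g, e)}


# Hypothetical values: genotypic value and environmental deviation per individual.
g = [-2, -1, 0, 1, 2]
e = [-1, -1, 0, 1, 1]
parts = decompose_phenotypic_variance(g, e)
print(parts)
print(parts["VG"] + parts["VE"] + 2 * parts["covGE"])   # equals VP
```

Ignoring the covariance term here would understate VP by 2covGE.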

Interaction between genotype and environment (GxE): dependency of the genetic effects on the environment and vice versa.
[Figure: Liability to a disorder as a function of genotype (AA, Aa or aa) and environmental exposure (protective or predisposing).]
The predisposing environment is associated with an increase in liability to develop a disease. a) This increase is equal in individuals with the AA, Aa and aa genotypes (genotype and environment act additively). b) The increase differs between genotypes: individuals with the AA genotype have a disproportionately low chance of developing a disease in the protective environment, but suffer a disproportionate increase in liability when exposed to the predisposing environment (Kendler & Eaves, 1986).

If GxE is present, P = G + E + IGE, and VP = VG + VE + VGE, where VGE is the variance due to GxE. If genotypes can be replicated (e.g., inbred lines, cloning), and multiple individuals of each of several genotypes are reared in different environments, then a two-way ANOVA (genotypes x environments) can yield estimates of the variance between genotypes, the variance between environments, and the variance due to the interaction of genotypes and environments. If interaction is absent, the ‘best’ genotype will be the best in all environments. If interaction is present, then (e.g., in farming) particular genotypes may be sought for specific environments.
[Path diagram: G, E and the product GE load on P with path coefficients g, e and i; VP = VG + VE + VGE (if g = e = i = 1).]
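The two-way ANOVA mentioned above can be sketched for a balanced design with replication. The data layout (data[g][k] holds the replicate phenotypes of genotype g in environment k), the function name and the example values are assumptions made for illustration. Converting mean squares into variance components depends on whether genotypes and environments are treated as fixed or random, so that step is left out.

```python
def two_way_anova(data):
    """Balanced two-way ANOVA (genotypes x environments) with replication.

    data[g][k] is the list of phenotypes of genotype g reared in environment k.
    Returns sums of squares, degrees of freedom and mean squares per source.
    """
    n_g, n_env, n_rep = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(obs) / n_rep for obs in row] for row in data]
    grand = sum(sum(row) for row in cell) / (n_g * n_env)
    g_mean = [sum(row) / n_env for row in cell]
    e_mean = [sum(cell[g][k] for g in range(n_g)) / n_g for k in range(n_env)]

    ss = {
        "genotype": n_env * n_rep * sum((m - grand) ** 2 for m in g_mean),
        "environment": n_g * n_rep * sum((m - grand) ** 2 for m in e_mean),
        "interaction": n_rep * sum(
            (cell[g][k] - g_mean[g] - e_mean[k] + grand) ** 2
            for g in range(n_g) for k in range(n_env)
        ),
        "within": sum(
            (y - cell[g][k]) ** 2
            for g in range(n_g) for k in range(n_env) for y in data[g][k]
        ),
    }
    df = {
        "genotype": n_g - 1,
        "environment": n_env - 1,
        "interaction": (n_g - 1) * (n_env - 1),
        "within": n_g * n_env * (n_rep - 1),
    }
    ms = {source: ss[source] / df[source] for source in ss}
    return ss, df, ms


# Hypothetical data: 2 genotypes x 2 environments x 2 replicates.
# Genotype 1 responds strongly to the environment, genotype 2 not at all.
data = [
    [[9, 11], [19, 21]],
    [[14, 16], [14, 16]],
]
ss, df, ms = two_way_anova(data)
print(ss)
print(ms)
```

In this made-up example the two genotypes have the same overall mean, so all of the genetic signal appears in the interaction term.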

Environmental sensitivity
Some of the GxE may be ascribed to differences between genotypes in sensitivity to the environment. To measure environmental sensitivity, different genotypes may be reared in a range of different environments. Here, the environmental value is defined as the mean performance of all genotypes in that environment. The genotype’s sensitivity is then the regression of its genotypic value on the environmental value (see the figure below). The variance due to GxE is estimated from an ANOVA, which allows one to quantify the effect of genotype, the effect of environment, and the effect of their interaction. The amount of variance attributable to differences in sensitivity is obtainable from the heterogeneity of regression slopes.
[Figure: The regression of genotypic value on environmental value, g = β0 + β1e. x-axis: environmental value (i.e., mean performance of genotypes in a given environment). y-axis, left: genotypic value. y-axis, right: regression coefficients representing the sensitivity of genotypes to the environment (β1 = 1, 1.4, 0.9 and 1.1). Note the crossing interaction of the two genotypes in the middle (approximately equal mean over all environments, but reverse order of merit).]
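The regression of genotypic value on environmental value can be sketched as follows. The data layout (values[g][k] is the mean performance of genotype g in environment k), the function names and the example numbers are illustrative assumptions; the slope is an ordinary least-squares estimate of β1 in g = β0 + β1e.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def environmental_sensitivities(values):
    """Return (environmental values, per-genotype regression slopes).

    values[g][k]: mean performance of genotype g in environment k.
    The environmental value of environment k is the mean over all genotypes.
    """
    n_g = len(values)
    n_env = len(values[0])
    env_value = [sum(values[g][k] for g in range(n_g)) / n_g for k in range(n_env)]
    slopes = [ols_slope(env_value, row) for row in values]
    return env_value, slopes


# Hypothetical genotype means in three environments.
values = [
    [100, 110, 120],   # less sensitive genotype
    [90, 110, 130],    # more sensitive genotype
]
env_value, slopes = environmental_sensitivities(values)
print(env_value)
print(slopes)
```

Because the environmental value is itself the mean over genotypes, the slopes average to 1 across genotypes; differences between the slopes are what the heterogeneity-of-slopes analysis picks up.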

Unmodeled GxE and rGE [Purcell, S. (2002). Variance components models for gene-environment interaction in twin analysis. Twin Research, 5(6), 554-571.]
How do unmodeled GxE and rGE bias parameter estimates in standard twin models?
E.g.: If the additive genotype (A) interacts with the common environment (C; environmental influences that increase phenotypic similarity between family members), then the phenotypic value may be decomposed as
P = aA + cC + iAC + eE,
and its expected variance is
VP = a^2 + c^2 + i^2 + e^2
(assuming the variances of the latent variables are scaled to 1). The expected twin covariances are
Cov(P1, P2) = a^2 Cov(A1, A2) + c^2 Cov(C1, C2) + e^2 Cov(E1, E2) + i^2 Cov(A1C1, A2C2)
            = a^2 + c^2 + i^2 for MZ twins
            = a^2/2 + c^2 + i^2/2 for DZ twins,
as Cov(A1, A2) is 1 for MZ twins and 0.5 for DZ twins; Cov(C1, C2) = 1 and Cov(E1, E2) = 0 for all twins; and Cov(A1C1, A2C2) = Cov(A1, A2)Cov(C1, C2) = Cov(A1, A2). Similar covariance algebra shows that AxE interaction contributes to the E component.
[Path diagram: twin 1 and twin 2, each with latent variables A, C, E and the product AC loading on the phenotype (P1, P2) via paths a, c, e and i; rA = 1 (MZ) or 0.5 (DZ), rC = 1.]
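The consequence of ignoring AxC interaction can be made concrete by feeding the expected statistics above into a model without interaction. The sketch below solves the standard ACE expectations (Cov_MZ = A + C, Cov_DZ = A/2 + C, VP = A + C + E, i.e. the expressions above with i = 0) for A, C and E. The function names, the parameter names axc2 and axe2 (for i^2 and an analogous AxE term) and the numerical values are illustrative assumptions.

```python
def expected_twin_stats(a2, c2, e2, axc2=0.0, axe2=0.0):
    """Expected VP, Cov_MZ and Cov_DZ with AxC (axc2) and AxE (axe2) variance.

    AxE contributes to VP only, because Cov(E1, E2) = 0 for all twins.
    """
    v_p = a2 + c2 + e2 + axc2 + axe2
    cov_mz = a2 + c2 + axc2
    cov_dz = a2 / 2 + c2 + axc2 / 2
    return v_p, cov_mz, cov_dz


def naive_ace(v_p, cov_mz, cov_dz):
    """Solve the ACE expectations that ignore interaction and correlation."""
    a_hat = 2 * (cov_mz - cov_dz)
    c_hat = 2 * cov_dz - cov_mz
    e_hat = v_p - cov_mz
    return a_hat, c_hat, e_hat


# True components: a^2 = .36, c^2 = .25, e^2 = .30, AxC variance i^2 = .09.
stats = expected_twin_stats(0.36, 0.25, 0.30, axc2=0.09)
print(naive_ace(*stats))   # A absorbs the AxC variance

# Same main effects, but with AxE variance instead of AxC.
stats = expected_twin_stats(0.36, 0.25, 0.30, axe2=0.09)
print(naive_ace(*stats))   # E absorbs the AxE variance
```

In the first case the implied A component equals a^2 + i^2 while C and E are unaffected; in the second, the extra variance ends up in E.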

Unmodeled GxE and rGE [Purcell, 2002]
If A is correlated with an environmental variable, say C, then the expected phenotypic variance is
VP = a^2 + c^2 + 2ac·rAC + e^2
and the expected twin covariances are
Cov(P1, P2) = a^2 Cov(A1, A2) + c^2 Cov(C1, C2) + e^2 Cov(E1, E2) + ac Cov(A1, C2) + ac Cov(A2, C1)
            = a^2 + c^2 + 2ac·rAC for MZ twins
            = a^2/2 + c^2 + 2ac·rAC for DZ twins,
as Cov(A1, C2) = Cov(A2, C1) = rAC. Similarly, if A and E are non-independent, then
Cov(P1, P2) = a^2 + c^2 + 2ae·rAE for MZ twins
            = a^2/2 + c^2 + ae·rAE for DZ twins.
Thus: Interaction between A and C acts in the same way as A; interaction between A and E acts like E. Correlation between A and C acts like C; correlation between A and E acts like A.
[Path diagram: twin 1 and twin 2, each with latent variables A, C and E loading on the phenotype (P1, P2) via paths a, c and e; A and C are correlated (rAC) within and across twins; rA = 1 (MZ) or 0.5 (DZ), rC = 1.]
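The same check can be run for unmodeled rGE. The sketch below plugs the expected covariances above into a model without correlation, again by solving Cov_MZ = A + C, Cov_DZ = A/2 + C and VP = A + C + E. The function names and parameter values are illustrative assumptions; the phenotypic variance under rAE, a^2 + c^2 + e^2 + 2ae·rAE, is not given explicitly above and is written here by analogy with the rAC case.

```python
def twin_stats_with_rac(a, c, e, r_ac):
    """Expected VP, Cov_MZ, Cov_DZ when A and C are correlated (r_ac)."""
    v_p = a * a + c * c + e * e + 2 * a * c * r_ac
    cov_mz = a * a + c * c + 2 * a * c * r_ac
    cov_dz = a * a / 2 + c * c + 2 * a * c * r_ac
    return v_p, cov_mz, cov_dz


def twin_stats_with_rae(a, c, e, r_ae):
    """Expected VP, Cov_MZ, Cov_DZ when A and E are correlated (r_ae)."""
    v_p = a * a + c * c + e * e + 2 * a * e * r_ae   # by analogy with the rAC case
    cov_mz = a * a + c * c + 2 * a * e * r_ae
    cov_dz = a * a / 2 + c * c + a * e * r_ae
    return v_p, cov_mz, cov_dz


def naive_ace(v_p, cov_mz, cov_dz):
    """Solve the ACE expectations that ignore interaction and correlation."""
    a_hat = 2 * (cov_mz - cov_dz)
    c_hat = 2 * cov_dz - cov_mz
    e_hat = v_p - cov_mz
    return a_hat, c_hat, e_hat


a, c, e = 0.6, 0.5, 0.5              # illustrative path coefficients
print(naive_ace(*twin_stats_with_rac(a, c, e, r_ac=0.2)))   # extra 2ac*rAC lands in C
print(naive_ace(*twin_stats_with_rae(a, c, e, r_ae=0.2)))   # extra 2ae*rAE lands in A
```

With these values the A-C correlation inflates the implied C component from 0.25 to 0.37, and the A-E correlation inflates the implied A component from 0.36 to 0.48.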

Homework:
I) Assignment 8.3 from Falconer & Mackay.
II) Using Sgene (http://statgen.iop.kcl.ac.uk/bgim/index2.html), visualize how allele frequencies, genotypic values, etc., influence trait mean and variance:
a) for a = 0, d = 0, p = 0.4, residual variance = 0.04, scale = 2; vary a from 0 to 1,
b) for a = 1, d = 0, p = 0.4, residual variance = 0.04, scale = 2; vary d from -1 to 1,
c) for a = 1, d = 0, p = 0.4, residual variance = 0.04, scale = 2; vary p from 0 to 1.
Look at scatterplots, histograms and variance components.
