1. Introduction to QTL analysis Peter Visscher
University of Edinburgh
peter.visscher@ed.ac.uk
2. Overview Principles of QTL mapping
QTL mapping using sibpairs
IBD estimation from marker data
Improving power
ML variance components
Selective genotyping
Large(r) pedigrees
4. Mapping QTL Determining the position of a locus causing variation in the genome.
Estimating the effect of the alleles and mode of action.
5. Why map QTL? To provide knowledge towards a fundamental understanding of individual gene actions and interactions
To enable positional cloning of the gene
To improve breeding value estimation and selection response through marker assisted selection (plants, animals)
Science; Medicine; Agriculture
7. Linkage = Co-segregation
8. Recombination
9. Map distance
Map distance between two loci (Morgans)
= Expected number of crossovers per meiosis
Note: Map distances are additive. Recombination frequencies are not.
1 Morgan = 100 cM; 1 cM ~ 1 Mb
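The note on additivity can be checked numerically. The following is a minimal sketch, assuming Haldane's mapping function (no crossover interference), r = ½(1 − e^(−2d)) with d in Morgans; the function names and the three-locus distances are illustrative, not taken from the slides.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction for a map distance in Morgans (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def haldane_d(r):
    """Inverse: map distance in Morgans for a recombination fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)

# Three loci in order A - B - C, with A-B = 20 cM and B-C = 30 cM (illustrative).
d_ab, d_bc = 0.20, 0.30
r_ab, r_bc = haldane_r(d_ab), haldane_r(d_bc)
r_ac = haldane_r(d_ab + d_bc)          # map distances add: A-C = 50 cM

print(f"r_AB + r_BC = {r_ab + r_bc:.4f}")   # sum of the two recombination fractions
print(f"r_AC        = {r_ac:.4f}")          # smaller than that sum
```

Under this mapping function r_AC = r_AB + r_BC − 2 r_AB r_BC, so recombination fractions only add approximately, and only for tightly linked loci.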
10. Recombination & map distance
11. Principles of QTL mapping Co-segregation of phenotypes and genotypes in pedigrees
Genetic markers give information on IBD sharing between relatives [genotypes]
Association between phenotypes and genotypes gives information on QTL location and effect [linkage]
Need informative mapping population
15. Line cross Only two QTL alleles segregating
QTL effect can be estimated as the mean difference between genotype groups
Power depends on sample size & effect of QTL
Ascertain divergent lines
Resolution of QTL map is low: ~10-40 Mb
17. Outbred populations: Complications Markers not fully informative (segregating in the parental generation)
QTL not segregating in all families
(All F1 segregate in inbred line cross)
Association between marker and QTL at the family rather than population level
(i.e. linkage phase differs between families)
Additional variance between families due to other loci
18. Line cross vs. outbred population
                        Cross     Outbred
# QTL alleles           2         ≥ 2
# Generations           3         ≥ 2
Required sample size    100s      1000s
QTL estimation          Mean      Variance
19. QTL as a random effect y_i = μ + Q_i + A_i + E_i
Q_i = QTL genotype contribution for chrom. segment
A_i = Contribution from rest of genome
var(y) = σ_q² + σ_a² + σ_e²
20. Logical extension of linear models used during the course This week: partitioning (co)variances into (causal) components
QTL mapping: partitioning genetic variance into underlying components
Linkage analysis: dissecting within-family genetic variation
21. Genetic covariance between relatives cov(y_i, y_j) = π_ij σ_q² + a_ij σ_a²
a_ij = average prop. of alleles shared in the genome (kinship matrix)
π_ij = proportion of alleles IBD at QTL
(0, ½ or 1)
E(π_ij) = a_ij
22. π
π_ij = Pr(2 alleles IBD) + ½ Pr(1 allele IBD)
= proportion of alleles IBD in non-inbred pedigree
Estimate π_ij with genetic markers
23. Fully informative marker Determine IBD sharing between sibpairs unambiguously
Example: Dad = 1/2 Mum= 3/4
Transmitted allele from Dad is either 1 or 2
Transmitted allele from Mum is either 3 or 4
24. Sibpairs & fully informative marker
# Alleles IBD    π     Pr.
0                0     ¼
1                ½     ½
2                1     ¼
E(π) = Σ π Pr(π) = ½
E(π²) = Σ π² Pr(π) = 3/8
var(π) = E(π²) − [E(π)]² = 1/8
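The table and moments above can be reproduced by brute-force enumeration. This is a small sketch, assuming the fully informative mating from the previous slide (Dad = 1/2, Mum = 3/4) and equally likely transmissions; variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Dad = 1/2, Mum = 3/4: each sib receives one paternal and one maternal allele.
sib_genotypes = list(product((1, 2), (3, 4)))      # 4 equally likely genotypes

counts = {0: 0, 1: 0, 2: 0}
for (pat1, mat1), (pat2, mat2) in product(sib_genotypes, repeat=2):
    n_ibd = int(pat1 == pat2) + int(mat1 == mat2)  # number of alleles shared IBD
    counts[n_ibd] += 1

total = sum(counts.values())                       # 16 sib-pair combinations
prob = {k: Fraction(v, total) for k, v in counts.items()}

pi_values = {k: Fraction(k, 2) for k in prob}      # proportion IBD: 0, 1/2, 1
mean_pi = sum(pi_values[k] * prob[k] for k in prob)
mean_pi2 = sum(pi_values[k] ** 2 * prob[k] for k in prob)
var_pi = mean_pi2 - mean_pi ** 2

print(prob, mean_pi, mean_pi2, var_pi)
```

The enumeration gives probabilities ¼, ½, ¼ for 0, 1 and 2 alleles IBD, with mean ½ and variance 1/8, matching the slide.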
25. Haseman-Elston (1972) “The more alleles pairs of relatives share at a QTL, the greater their phenotypic similarity”
or
“The more alleles they share IBD, the smaller the difference in their phenotype”
26. Population sib-pair trait distribution
27. No linkage
28. Under linkage
29. Sib pair (or DZ twins) design to map QTL Multiple ‘families’ of two (or more) sibs
Phenotypes on sibs
Marker genotypes on sibs (& parents)
Correlate phenotypes and genotypes of sibs
30. Data structure is simple
Pair    Phenotypes      Prop. alleles IBD
1       y_11  y_12      π_1
2       y_21  y_22      π_2
...
n       y_n1  y_n2      π_n
π = 0, ½ or 1 for fully informative markers
31. Notation Y
D = (y_1 − y_2)
D² = (y_1 − y_2)²
S = [(y_1 − μ) + (y_2 − μ)]
S² = [(y_1 − μ) + (y_2 − μ)]²
CP = (y_1 − μ)(y_2 − μ)
32. Proposed analysis…
Data         Method        Reference
y_1 & y_2    ML ‘LOD’      Parametric linkage analysis
D²           Regression    Haseman & Elston (1972)
D² & S²      Regression    Drigalenko (1998); Xu et al. (2000); Sham & Purcell (2001); Forrest (2001)
CP           Regression    Elston et al. (2000)
y_1 & y_2    ML VC         Goldgar (1990); Schork (1993)
D            ML            Kruglyak & Lander (1995)
D & S        ML VC         Fulker & Cherny (1996); Wright (1997)
33. Properties of squared differences E[(Y_1 − Y_2)²] = var(Y_1 − Y_2) + [E(Y_1 − Y_2)]²
var(Y_1 − Y_2) = var(Y_1) + var(Y_2) − 2cov(Y_1, Y_2)
If E(Y_i) = 0 and var(Y_1) = var(Y_2) = var(Y), then
E[(Y_1 − Y_2)²] = 2(1 − r)var(Y), where r here is the correlation between Y_1 and Y_2
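34. Haseman-Elston method Phenotype on relative pair j:
Y_j = (y_1j − y_2j)²
E(Y_j) = E[(Q_1j − Q_2j + A_1j − A_2j + e_1j − e_2j)²]
= E[(Q_1j − Q_2j)²] + {2(1 − a_j)σ_a² + 2σ_e²}
= 2[σ_q² − cov(Q_1j, Q_2j)] + {σ_e²}
= (2σ_q² + σ_e²) − 2π_jt σ_q²
(a_j = genome-wide sharing for pair j; from the third line on, σ_e² stands for the whole bracketed non-QTL term 2(1 − a_j)σ_a² + 2σ_e²)
π_jt = proportion of alleles IBD at QTL (trait, t) for relative pair j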
35. Conditional expectation E(Y_j | π_jt) = (2σ_q² + σ_e²) − 2σ_q² π_jt
negative slope of Y on π if σ_q² > 0
estimate π_jt from marker data (π_jm)
use simple linear regression to detect QTL:
E(Y_j | π_jm) = a + b π_jm
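The regression can be tried on simulated data. This is a minimal sketch under stated assumptions, not the original authors' implementation: full sibs with trait mean 0, an additive QTL, IBD sharing at the QTL known exactly (a fully informative marker at the QTL itself), and illustrative variance components that sum to 1. Function names and parameter values are hypothetical.

```python
import random

def simulate_pairs(n, var_q, var_a, var_e, seed=1):
    """Simulate (y1, y2, pi) for n sib pairs with mean 0, given QTL IBD sharing pi."""
    rng = random.Random(seed)
    total_var = var_q + var_a + var_e
    pairs = []
    for _ in range(n):
        pi = rng.choices((0.0, 0.5, 1.0), weights=(1, 2, 1))[0]
        cov = pi * var_q + 0.5 * var_a            # sib covariance given pi
        shared = rng.gauss(0.0, cov ** 0.5)       # component common to both sibs
        unique_sd = (total_var - cov) ** 0.5
        y1 = shared + rng.gauss(0.0, unique_sd)
        y2 = shared + rng.gauss(0.0, unique_sd)
        pairs.append((y1, y2, pi))
    return pairs

def haseman_elston(pairs):
    """OLS regression of squared sib differences on pi; returns (intercept, slope)."""
    x = [pi for _, _, pi in pairs]
    y = [(y1 - y2) ** 2 for y1, y2, _ in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

pairs = simulate_pairs(20000, var_q=0.5, var_a=0.25, var_e=0.25)
intercept, slope = haseman_elston(pairs)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")  # slope expected near -2 * var_q
```

With these settings the expected slope is −2 × 0.5 = −1, so a clearly negative estimate is the signal of linkage. The large number of pairs used here also hints at why the method has low power in realistic samples.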
37. Single fully informative marker b = −2(1 − 2r)² σ_q²
(1 − 2r)² σ_q² term is analogous to variance explained by a single marker in a backcross/F2 design
a = 2[1 − 2(1 − r)r] σ_q² + σ_e²
r = recombination fraction between marker & QTL
Statistical test: b = 0 versus b < 0
Disadvantage of method
not powerful
confounding between QTL location and effect
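The confounding between location and effect can be made concrete with the two expressions above. A small sketch; the function name and the numerical values are illustrative assumptions.

```python
def he_coefficients(r, var_q, var_e):
    """Expected intercept a and slope b of the Haseman-Elston regression
    for one fully informative marker at recombination fraction r from the QTL."""
    b = -2.0 * (1.0 - 2.0 * r) ** 2 * var_q
    a = 2.0 * (1.0 - 2.0 * (1.0 - r) * r) * var_q + var_e
    return a, b

# A large QTL far from the marker and a small QTL at the marker give the same slope.
_, b_far = he_coefficients(r=0.25, var_q=0.40, var_e=1.0)
_, b_near = he_coefficients(r=0.0, var_q=0.10, var_e=1.0)
print(b_far, b_near)
```

Both settings give a slope of about −0.2, so a single marker cannot tell a distant large QTL from a nearby small one.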
38. Interval mapping for sibpair analysis (Fulker & Cardon, 1994) Estimate π_jt from IBD status at flanking markers
Allows genome screen, separating effect & location
regression with largest R² indicates map position of QTL
39. Example from Cardon et al. (1994)
40. Calculating π_jt | π_jm For π_jt midway between two flanking markers:
π_jt ≈ r²/c + ½[(1 − 2r)/c]π_jm1 + ½[(1 − 2r)/c]π_jm2
c = 1 − 2r + 2r²
r = recombination fraction between markers
π_jmk = π_jm at flanking marker k
Assumption: flanking markers are fully informative
41. Examples
r      c        π_jt
0.5    0.5      0.5
0.2    17/25    (2/34) + (15/34)π_jm1 + (15/34)π_jm2
[if π_jm1 and π_jm2 are 1, π_jt = 32/34 < 1]
42. Exercise Calculate π_jt for a location midway between two markers that are 30 cM apart, when the proportions of alleles shared at the flanking markers are 1.0 and 0.5. Use the Haldane mapping function to calculate the recombination rate between the markers.
π_jm1 = 1, π_jm2 = 0.5
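One way to work through the midpoint formula, the two tabulated examples and the exercise is sketched below. It assumes fully informative flanking markers and, for the exercise, Haldane's mapping function r = ½(1 − e^(−2d)); the function name is illustrative.

```python
import math
from fractions import Fraction

def pi_midpoint(r, pi_m1, pi_m2):
    """Estimated IBD proportion at a QTL midway between two fully informative
    markers, where r is the recombination fraction between the markers."""
    c = 1 - 2 * r + 2 * r * r
    weight = (1 - 2 * r) / (2 * c)
    return r * r / c + weight * pi_m1 + weight * pi_m2

# Slide 41 examples, in exact arithmetic.
print(pi_midpoint(Fraction(1, 2), 1, 1))   # 1/2: unlinked markers carry no information
print(pi_midpoint(Fraction(1, 5), 1, 1))   # 16/17, i.e. 32/34

# Slide 42 exercise: markers 30 cM apart, Haldane mapping function.
r = 0.5 * (1.0 - math.exp(-2.0 * 0.30))
print(round(r, 4), round(pi_midpoint(r, 1.0, 0.5), 3))
```

For the exercise this gives r of roughly 0.226 and an estimated sharing at the midpoint of roughly 0.71, which lies between the two flanking values and is shrunk slightly towards ½.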
43. Extensions to Haseman-Elston method Interval mapping
Alternative models
QTL with dominance
Other methods to estimate π_jt
Using all markers on a chromosome (Merlin)
Monte Carlo sampling methods
Using both marker info & phenotypic info
Add linkage information from:
Z_j = [(y_1j − μ) + (y_2j − μ)]²
45. Estimating π when marker is not fully informative Using:
Mendelian segregation rules
Marker allele frequencies in the population
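These two ingredients combine through Bayes' theorem: the prior IBD probabilities for sibs (¼, ½, ¼) are reweighted by the probability of the observed marker genotypes under each IBD state. The sketch below is an illustration only and is not the worked example from the slides. It assumes untyped parents, Hardy-Weinberg proportions, and a hypothetical pair in which both sibs are A/A with allele frequency p = ½.

```python
from fractions import Fraction

SIB_PRIOR = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))  # Pr(IBD = 0, 1, 2)

def ibd_posterior(likelihoods, prior=SIB_PRIOR):
    """Posterior Pr(IBD = k | marker genotypes) from Pr(genotypes | IBD = k)."""
    joint = [lk * pr for lk, pr in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]

def pi_hat(posterior):
    """Estimated proportion of alleles IBD: Pr(2 IBD) + 1/2 Pr(1 IBD)."""
    return posterior[2] + posterior[1] / 2

# Illustrative case: both sibs are A/A, parents untyped, allele A has frequency p.
# The pair carries 4, 3 or 2 independent copies of A when IBD = 0, 1 or 2.
p = Fraction(1, 2)
post = ibd_posterior([p ** 4, p ** 3, p ** 2])
print(post, pi_hat(post))
```

Here the posterior is (1/9, 4/9, 4/9) and the estimated sharing is 2/3. Sharing a common allele raises the estimate above the prior mean of ½, but by much less than a fully informative marker would.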
46. IBD can be trivial…
47. Two Other Simple Cases…
48. A little more complicated…
49. And even more complicated…
50. Bayes Theorem for IBD Probabilities
51. P(Marker Genotype|IBD State)
52. Worked Example
53. Exercise
54. Using multiple markers Mendelian segregation rules
Marker allele frequencies in the population
Linkage between markers
Efficient multi-marker (multi-point) algorithms available (e.g., Merlin, Genehunter)
55. Software for QTL analysis of sibpairs Mx
Merlin
Genehunter
S.A.G.E. ($)
QTL Express (regression)
Solar (complex pedigrees)
Lots of others…
http://www.nslij-genetics.org/soft/
56. George Seaton, Sara Knott, Chris Haley, Peter Visscher
57. Conclusions (sibpairs) Power of sib pair design is low
more relative pairs needed
more contrasts e.g. extended pedigrees
selective genotyping
extreme phenotypes are most informative for linkage
more powerful analysis methods
ML variance component analysis
58. Maximum likelihood for sibpairs (assuming bivariate normality | π & fully informative marker) Full model:
−2ln(L) = Σ_π n_π ln|V_π| + Σ (y − μ)′ V_π⁻¹ (y − μ)
V_π = | f² + q² + r²    f² + πq²      |
      | f² + πq²        f² + q² + r²  |
59. Maximum likelihood Reduced model:
−2ln(L) = n ln|V| + Σ (y − μ)′ V⁻¹ (y − μ)
V = | f² + r²    f²       |
    | f²         f² + r²  |
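Both likelihoods can be evaluated with one function, since the reduced model is the full model with the QTL variance set to zero. This is a minimal sketch assuming pairs are supplied as (y1, y2, proportion IBD) and that the additive constant is dropped, as on the slides; it evaluates −2ln(L) at given parameter values and does not perform the maximisation. Names are illustrative.

```python
import math

def neg2_log_lik(pairs, mu, f2, q2, r2):
    """-2 ln L (constant dropped) for sib pairs (y1, y2, pi).

    f2 = familial variance, q2 = QTL variance, r2 = residual variance.
    """
    total = 0.0
    for y1, y2, pi in pairs:
        v = f2 + q2 + r2          # diagonal element of the 2 x 2 covariance matrix
        c = f2 + pi * q2          # off-diagonal element, depends on IBD sharing
        det = v * v - c * c
        d1, d2 = y1 - mu, y2 - mu
        # Quadratic form (y - mu)' V^-1 (y - mu) for a symmetric 2 x 2 matrix.
        quad = (v * d1 * d1 - 2.0 * c * d1 * d2 + v * d2 * d2) / det
        total += math.log(det) + quad
    return total

# Hypothetical toy data: (y1, y2, proportion of alleles IBD).
toy = [(1.0, 0.5, 1.0), (-0.3, 0.8, 0.0), (0.2, -0.1, 0.5)]
full = neg2_log_lik(toy, mu=0.0, f2=0.2, q2=0.3, r2=0.5)
reduced = neg2_log_lik(toy, mu=0.0, f2=0.2, q2=0.0, r2=0.5)
print(full, reduced)
```

In practice each model would be maximised over its parameters with a numerical optimiser before the two values are compared.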
60. Test statistic LRT = 2ln(ML_full) − 2ln(ML_reduced)
H0 (q² = 0): LRT ~ ½χ²(1) + ½(0)
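Under this mixture, the p-value is half the usual chi-square(1) tail probability. The sketch below uses the standard identity P(χ²(1) > x) = erfc(√(x/2)) and the conversion LOD = LRT / (2 ln 10); function names are illustrative.

```python
import math

def lrt_pvalue(lrt):
    """P-value under the 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt <= 0.0:
        return 1.0
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

def lrt_to_lod(lrt):
    """Convert a likelihood-ratio statistic (natural log scale) to a LOD score."""
    return lrt / (2.0 * math.log(10.0))

print(lrt_pvalue(3.84), lrt_to_lod(13.8))
```

An LRT of 3.84 therefore corresponds to a p-value of about 0.025 rather than 0.05, and an LRT of about 13.8 corresponds to a LOD score of about 3.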
61. Multipoint sib-pair trait-difference analysis for the phenotype ‘Irregular word test’. The graph shows LOD-score curves obtained by use of the MLvar method (if no dominance variance is assumed) in the computer program MAPMAKER/SIBS, with use of strict weighting (S) or of no weighting (N). Broken lines indicate LOD scores corresponding to significance levels (P = .05, P = .005, and P = .0005). The orientation of markers relative to chromosome 6 is given.
63. Selective genotyping & sibpairs Concordant pairs
both sibs in upper or lower tail of the phenotypic distribution
Discordant pairs
one sib in upper tail, other in lower tail
Powerful design
requires many (cheap) phenotypes
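Selecting pairs for genotyping can be sketched as a simple classification on the phenotypes. The cut-offs, function name and example values below are hypothetical; the slides do not specify how the tails are defined.

```python
def classify_pair(y1, y2, lower, upper):
    """Label a sib pair by where both phenotypes fall relative to two cut-offs."""
    def tail(y):
        if y <= lower:
            return "low"
        if y >= upper:
            return "high"
        return None

    t1, t2 = tail(y1), tail(y2)
    if t1 is None or t2 is None:
        return "not selected"
    return "concordant" if t1 == t2 else "discordant"

# Hypothetical standardised phenotypes with cut-offs at -1 and +1 SD.
pairs = [(1.4, 1.9), (-1.3, 1.2), (0.2, 1.8), (-1.1, -2.0)]
selected = [(p, classify_pair(*p, lower=-1.0, upper=1.0)) for p in pairs]
print(selected)
```

Only pairs labelled concordant or discordant would be genotyped, which is why the design needs many phenotyped pairs to begin with.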
64. Anxiety QTLs
65. Results
66. Variance component analysis in complex pedigrees Partition observed variation in quantitative traits into causal components, e.g.,
Polygenic
Common environment (‘household’)
QTL
Residual, including measurement error
IBD proportions (π) estimated from multiple markers
69. Example: QTL analysis for BMI using a complex pedigree Multipoint linkage analysis results for chromosome 2. Results are shown for BMI (black), PFM (green), fat mass (red), and lean mass (blue). The leptin gene is also located on chromosome 2, and could be a candidate gene for variation in BMI.