
Introduction to QTL analysis



    1. Introduction to QTL analysis. Peter Visscher, University of Edinburgh, peter.visscher@ed.ac.uk

    2. Overview: Principles of QTL mapping; QTL mapping using sibpairs; IBD estimation from marker data; Improving power (ML variance components; selective genotyping; large(r) pedigrees)

    4. Mapping QTL: Determining the position in the genome of a locus that causes variation in a trait. Estimating the effect of the alleles and their mode of action.

    5. Why map QTL? To provide knowledge towards a fundamental understanding of individual gene actions and interactions. To enable positional cloning of the gene. To improve breeding value estimation and selection response through marker-assisted selection (plants, animals). Science; Medicine; Agriculture

    7. Linkage = Co-segregation

    8. Recombination

    9. Map distance: Map distance between two loci (Morgans) = expected number of crossovers per meiosis. Note: map distances are additive; recombination frequencies are not. 1 Morgan = 100 cM; 1 cM ≈ 1 Mb
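The note on additivity can be checked numerically. The sketch below assumes the Haldane mapping function (no crossover interference), which is only one of several mapping functions; the function name `haldane_r` and the distances are illustrative.

```python
import math

def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

# Three loci in the order A - B - C; distances in Morgans (made-up values).
d_ab, d_bc = 0.10, 0.20
d_ac = d_ab + d_bc  # map distances add

r_ab, r_bc, r_ac = haldane_r(d_ab), haldane_r(d_bc), haldane_r(d_ac)

# The naive sum of recombination fractions overshoots the true r_AC.
print(f"r_AB + r_BC               = {r_ab + r_bc:.4f}")
print(f"r_AC                      = {r_ac:.4f}")
# Under Haldane, double recombinants are subtracted:
print(f"r_AB + r_BC - 2 r_AB r_BC = {r_ab + r_bc - 2 * r_ab * r_bc:.4f}")
```

The last two printed values agree, which is the sense in which distances add while recombination fractions do not.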

    10. Recombination & map distance

    11. Principles of QTL mapping: Co-segregation of phenotypes and genotypes in pedigrees. Genetic markers give information on IBD sharing between relatives [genotypes]. Association between phenotypes and genotypes gives information on QTL location and effect [linkage]. Need informative mapping population.

    15. Line cross: Only two QTL alleles segregating. QTL effect can be estimated as the mean difference between genotype groups. Power depends on sample size & effect of QTL. Ascertain divergent lines. Resolution of QTL map is low: ~10-40 Mb.

    17. Outbred populations: Complications. Markers not fully informative (segregating in the parental generation). QTL not segregating in all families (all F1 segregate in inbred line cross). Association between marker and QTL at the family rather than population level (i.e. linkage phase differs between families). Additional variance between families due to other loci.

    18. Line cross vs. outbred population
                                Cross    Outbred
        # QTL alleles           2        ≥ 2
        # Generations           3        ≥ 2
        Required sample size    100s     1000s
        QTL estimation          Mean     Variance

    19. QTL as a random effect: $y_i = \mu + Q_i + A_i + E_i$; $Q_i$ = QTL genotype contribution for chromosome segment; $A_i$ = contribution from rest of genome; $\mathrm{var}(y) = \sigma_q^2 + \sigma_a^2 + \sigma_e^2$

    20. Logical extension of linear models used during the course. This week: partitioning (co)variances into (causal) components. QTL mapping: partitioning genetic variance into underlying components. Linkage analysis: dissecting within-family genetic variation.

    21. Genetic covariance between relatives: $\mathrm{cov}(y_i, y_j) = \pi_{ij}\sigma_q^2 + a_{ij}\sigma_a^2$; $a_{ij}$ = average proportion of alleles shared in the genome (kinship matrix); $\pi_{ij}$ = proportion of alleles IBD at QTL (0, ½ or 1); $E(\pi_{ij}) = a_{ij}$

    22. $\pi$: $\pi_{ij} = \Pr(\text{2 alleles IBD}) + \tfrac{1}{2}\Pr(\text{1 allele IBD})$ = proportion of alleles IBD in a non-inbred pedigree. Estimate $\pi_{ij}$ with genetic markers.

    23. Fully informative marker: Determine IBD sharing between sibpairs unambiguously. Example: Dad = 1/2, Mum = 3/4. Transmitted allele from Dad is either 1 or 2; transmitted allele from Mum is either 3 or 4.

    24. Sibpairs & fully informative marker
        # Alleles IBD    $\pi$    Pr.
        0                0        ¼
        1                ½        ½
        2                1        ¼
        $E(\pi) = \sum \pi \Pr(\pi) = \tfrac{1}{2}$; $E(\pi^2) = \sum \pi^2 \Pr(\pi) = \tfrac{3}{8}$; $\mathrm{var}(\pi) = E(\pi^2) - E(\pi)^2 = \tfrac{1}{8}$
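The sharing probabilities and moments for full sibs can be reproduced by enumerating every transmission from the fully informative parents of the example (Dad = 1/2, Mum = 3/4). This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

dad, mum = (1, 2), (3, 4)  # four distinct parental alleles

# Each sib receives one allele from Dad and one from Mum; all 16
# combinations for the two sibs are equally likely.
counts = Counter()
for d1, m1, d2, m2 in product(dad, mum, dad, mum):
    n_ibd = (d1 == d2) + (m1 == m2)  # number of alleles shared IBD: 0, 1 or 2
    counts[n_ibd] += 1

total = sum(counts.values())
prob = {k: Fraction(v, total) for k, v in sorted(counts.items())}

# pi = proportion of alleles IBD = n_ibd / 2
mean_pi = sum(Fraction(k, 2) * p for k, p in prob.items())
mean_pi_sq = sum(Fraction(k, 2) ** 2 * p for k, p in prob.items())
var_pi = mean_pi_sq - mean_pi ** 2

print(prob[0], prob[1], prob[2])    # 1/4 1/2 1/4
print(mean_pi, mean_pi_sq, var_pi)  # 1/2 3/8 1/8
```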

    25. Haseman-Elston (1972) “The more alleles pairs of relatives share at a QTL, the greater their phenotypic similarity” or “The more alleles they share IBD, the smaller the difference in their phenotype”

    26. Population sib-pair trait distribution

    27. No linkage

    28. Under linkage

    29. Sib pair (or DZ twins) design to map QTL: Multiple ‘families’ of two (or more) sibs. Phenotypes on sibs. Marker genotypes on sibs (& parents). Correlate phenotypes and genotypes of sibs.

    30. Data structure is simple
        Pair    Phenotypes               Prop. alleles IBD
        1       $y_{11}$  $y_{12}$       $\pi_1$
        2       $y_{21}$  $y_{22}$       $\pi_2$
        ...
        n       $y_{n1}$  $y_{n2}$       $\pi_n$
        $\pi$ = 0, ½ or 1 for fully informative markers

    31. Notation for the dependent variable Y: $D = y_1 - y_2$; $D^2 = (y_1 - y_2)^2$; $S = (y_1 - \mu) + (y_2 - \mu)$; $S^2 = [(y_1 - \mu) + (y_2 - \mu)]^2$; $CP = (y_1 - \mu)(y_2 - \mu)$

    32. Proposed analysis…
        Data             Method        Reference
        y1 & y2          ML ‘LOD’      Parametric linkage analysis
        D^2              Regression    Haseman & Elston (1972)
        D^2 & S^2        Regression    Drigalenko (1998); Xu et al. (2000); Sham & Purcell (2001); Forrest (2001)
        CP               Regression    Elston et al. (2000)
        y1 & y2          ML VC         Goldgar (1990); Schork (1993)
        D                ML            Kruglyak & Lander (1995)
        D & S            ML VC         Fulker & Cherny (1996); Wright (1997)

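    33. Properties of squared differences: $E[(Y_1 - Y_2)^2] = \mathrm{var}(Y_1 - Y_2) + [E(Y_1 - Y_2)]^2$; $\mathrm{var}(Y_1 - Y_2) = \mathrm{var}(Y_1) + \mathrm{var}(Y_2) - 2\,\mathrm{cov}(Y_1, Y_2)$. If $E(Y_i) = 0$ and $\mathrm{var}(Y_1) = \mathrm{var}(Y_2) = \mathrm{var}(Y)$, then $E[(Y_1 - Y_2)^2] = 2(1 - r)\,\mathrm{var}(Y)$, where r here is the correlation between $Y_1$ and $Y_2$.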

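    34. Haseman-Elston method. Phenotype on relative pair j: $Y_j = (y_{1j} - y_{2j})^2$
        $E(Y_j) = E[(Q_{1j} - Q_{2j} + A_{1j} - A_{2j} + e_{1j} - e_{2j})^2]$
        $= E[(Q_{1j} - Q_{2j})^2] + \{2(1 - a_j)\sigma_a^2 + 2\sigma_e^2\}$
        $= 2[\sigma_q^2 - \mathrm{cov}(Q_{1j}, Q_{2j})] + \sigma_\varepsilon^2$
        $= (2\sigma_q^2 + \sigma_\varepsilon^2) - 2\pi_{jt}\sigma_q^2$
        Here $\sigma_\varepsilon^2 = 2(1 - a_j)\sigma_a^2 + 2\sigma_e^2$ collects the terms that do not depend on the QTL, $a_j$ is the average proportion of alleles shared genome-wide by pair j, $\mathrm{cov}(Q_{1j}, Q_{2j}) = \pi_{jt}\sigma_q^2$, and $\pi_{jt}$ = proportion of alleles IBD at the QTL (trait, t) for relative pair j.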

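    35. Conditional expectation: $E(Y_j \mid \pi_{jt}) = (2\sigma_q^2 + \sigma_\varepsilon^2) - 2\sigma_q^2\,\pi_{jt}$, with $\sigma_\varepsilon^2$ the variance term not due to the QTL. Negative slope of Y on $\pi$ if $\sigma_q^2 > 0$. Estimate $\pi_{jt}$ from marker data ($\pi_{jm}$). Use simple linear regression to detect QTL: $E(Y_j \mid \pi_{jm}) = a + b\,\pi_{jm}$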
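The regression of squared sib differences on IBD sharing can be illustrated with a small simulation. This is a sketch under strong assumptions that are not in the slides: the QTL contribution is drawn as a Gaussian variable with covariance equal to the IBD proportion times the QTL variance, there is no polygenic or shared-environment component, IBD at the QTL is observed without error, and the variances, sample size and seed are arbitrary.

```python
import random

def simulate_pair(rng, pi, var_q, var_e):
    """Phenotypes (y1, y2) for a sib pair sharing a proportion pi of QTL alleles IBD."""
    sd_q, sd_e = var_q ** 0.5, var_e ** 0.5
    shared = rng.gauss(0.0, sd_q)  # QTL component common to both sibs
    # var(q) = var_q for each sib and cov(q1, q2) = pi * var_q
    q1 = pi ** 0.5 * shared + (1 - pi) ** 0.5 * rng.gauss(0.0, sd_q)
    q2 = pi ** 0.5 * shared + (1 - pi) ** 0.5 * rng.gauss(0.0, sd_q)
    return q1 + rng.gauss(0.0, sd_e), q2 + rng.gauss(0.0, sd_e)

def ols(x, y):
    """Intercept and slope of the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(1)     # fixed seed so the run is repeatable
var_q, var_e = 1.0, 1.0    # assumed QTL and residual variances

pis, sq_diffs = [], []
for _ in range(20000):
    # IBD proportion 0, 1/2, 1 with probabilities 1/4, 1/2, 1/4
    pi = rng.choices([0.0, 0.5, 1.0], weights=[1, 2, 1])[0]
    y1, y2 = simulate_pair(rng, pi, var_q, var_e)
    pis.append(pi)
    sq_diffs.append((y1 - y2) ** 2)

intercept, slope = ols(pis, sq_diffs)
print(f"intercept = {intercept:.2f} (expected about {2 * var_q + 2 * var_e:.1f})")
print(f"slope     = {slope:.2f} (expected about {-2 * var_q:.1f})")
```

With independent residuals, the non-QTL term in the intercept is simply twice the residual variance, so the estimates should scatter around 4 and -2 for these settings.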

    37. Single fully informative marker: $b = -2(1 - 2r)^2\sigma_q^2$; the $(1 - 2r)^2\sigma_q^2$ term is analogous to the variance explained by a single marker in a backcross/F2 design. $a = 2[1 - 2(1 - r)r]\sigma_q^2 + \sigma_\varepsilon^2$, with $\sigma_\varepsilon^2$ the non-QTL variance term. r = recombination fraction between marker & QTL. Statistical test: b = 0 versus b < 0. Disadvantages of the method: not powerful; confounding between QTL location and effect.
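The confounding between location and effect can be made concrete by evaluating the expected intercept and slope directly. In this sketch the function name `he_single_marker` and the numeric values are illustrative, and `var_eps` stands for the non-QTL term in the intercept.

```python
def he_single_marker(r, var_q, var_eps):
    """Expected (intercept a, slope b) of squared sib differences on marker IBD."""
    attenuation = (1 - 2 * r) ** 2
    slope = -2 * attenuation * var_q
    intercept = 2 * (1 - 2 * (1 - r) * r) * var_q + var_eps
    return intercept, slope

# A small QTL at the marker and a four-times-larger QTL at r = 0.25
# give exactly the same expected slope.
a_close, b_close = he_single_marker(r=0.0, var_q=0.25, var_eps=2.0)
a_far, b_far = he_single_marker(r=0.25, var_q=1.0, var_eps=2.0)
print(b_close, b_far)  # -0.5 -0.5

# An unlinked marker (r = 0.5) carries no information: the slope is zero.
print(he_single_marker(r=0.5, var_q=1.0, var_eps=2.0)[1] == 0)  # True
```

Because only the product of the attenuation factor and the QTL variance enters the slope, a single marker cannot separate the two.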

    38. Interval mapping for sibpair analysis (Fulker & Cardon, 1994): Estimate $\pi_{jt}$ from IBD status at flanking markers. Allows genome screen, separating effect & location. Regression with largest $R^2$ indicates map position of QTL.

    39. Example from Cardon et al. (1994)

    40. Calculating $\pi_{jt} \mid \pi_{jm}$. For $\pi_{jt}$ midway between two flanking markers: $\pi_{jt} \approx r^2/c + \tfrac{1}{2}[(1 - 2r)/c]\,\pi_{jm1} + \tfrac{1}{2}[(1 - 2r)/c]\,\pi_{jm2}$, with $c = 1 - 2r + 2r^2$; r = recombination fraction between markers; $\pi_{jmk}$ = $\pi_{jm}$ at flanking marker k. Assumption: flanking markers are fully informative.

    41. Examples
        r      c        $\pi_{jt}$
        0.5    0.5      0.5
        0.2    17/25    (2/34) + (15/34)$\pi_{jm1}$ + (15/34)$\pi_{jm2}$
        [if $\pi_{jm1}$ and $\pi_{jm2}$ are 1, $\pi_{jt}$ = 32/34 < 1]
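The midpoint formula and the two tabulated examples can be checked with a few lines of code. This is a sketch: the function names are illustrative, and `haldane_r` is included only as a convenience for converting a map distance in Morgans into a recombination fraction under the Haldane mapping function.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction for a map distance in Morgans (Haldane function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def pi_midpoint(r, pi_m1, pi_m2):
    """Approximate IBD proportion midway between two fully informative markers.

    r is the recombination fraction between the two flanking markers.
    """
    c = 1 - 2 * r + 2 * r * r
    weight = 0.5 * (1 - 2 * r) / c  # same weight on each flanking marker
    return r * r / c + weight * pi_m1 + weight * pi_m2

# Unlinked markers (r = 0.5) give the prior expectation of 1/2.
print(pi_midpoint(0.5, 1.0, 0.0))
# r = 0.2 with both markers fully shared gives 32/34.
print(pi_midpoint(0.2, 1.0, 1.0), 32 / 34)
```

For markers given as a map distance, `haldane_r` supplies the value of r to pass to `pi_midpoint`.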

    42. Exercise: Calculate $\pi_{jt}$ for a location midway between two markers that are 30 cM apart, when the proportions of alleles shared at the flanking markers are 1.0 and 0.5. Use the Haldane mapping function to calculate the recombination rate between the markers. $\pi_{jm1} = 1$, $\pi_{jm2} = 0.5$

    43. Extensions to Haseman-Elston method: Interval mapping. Alternative models: QTL with dominance. Other methods to estimate $\pi_{jt}$: using all markers on a chromosome (Merlin); Monte Carlo sampling methods. Using both marker info & phenotypic info: add linkage information from $Z_j = [(y_{1j} - \mu) + (y_{2j} - \mu)]^2$

    45. Estimating $\pi$ when marker is not fully informative. Using: Mendelian segregation rules; marker allele frequencies in the population.

    46. IBD can be trivial…

    47. Two Other Simple Cases…

    48. A little more complicated…

    49. And even more complicated…

    50. Bayes Theorem for IBD Probabilities

    51. P(Marker Genotype|IBD State)
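As an illustration of how Bayes' theorem combines the prior IBD probabilities for sibs (¼, ½, ¼) with P(marker genotype | IBD state), the sketch below handles the simplest case of a sib pair with untyped parents. It is not taken from the slides: it assumes Hardy-Weinberg proportions, a single marker, and made-up allele frequencies, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

PRIOR = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # full sibs

def geno(a, b):
    """Unordered genotype."""
    return tuple(sorted((a, b)))

def likelihood(g1, g2, freqs, ibd):
    """P(sib genotypes g1, g2 | IBD state), parents untyped, HWE assumed."""
    g1, g2 = geno(*g1), geno(*g2)
    alleles = list(freqs)
    total = Fraction(0)
    if ibd == 2:    # both alleles shared: the sibs carry the same two alleles
        for a, b in product(alleles, repeat=2):
            if geno(a, b) == g1 == g2:
                total += freqs[a] * freqs[b]
    elif ibd == 1:  # one shared allele s plus one independent allele each
        for s, a, b in product(alleles, repeat=3):
            if geno(s, a) == g1 and geno(s, b) == g2:
                total += freqs[s] * freqs[a] * freqs[b]
    else:           # nothing shared: four independent alleles
        for a, b, c, d in product(alleles, repeat=4):
            if geno(a, b) == g1 and geno(c, d) == g2:
                total += freqs[a] * freqs[b] * freqs[c] * freqs[d]
    return total

def ibd_posterior(g1, g2, freqs):
    """Posterior P(IBD = 0, 1, 2 | genotypes) by Bayes' theorem."""
    joint = {k: PRIOR[k] * likelihood(g1, g2, freqs, k) for k in PRIOR}
    norm = sum(joint.values())
    return {k: v / norm for k, v in joint.items()}

# Two alleles with equal (made-up) frequencies; both sibs are 1/1.
freqs = {1: Fraction(1, 2), 2: Fraction(1, 2)}
post = ibd_posterior((1, 1), (1, 1), freqs)
pi_hat = post[2] + post[1] / 2  # estimated proportion of alleles IBD
print(post[0], post[1], post[2])  # 1/9 4/9 4/9
print(pi_hat)                     # 2/3
```

Identical homozygous genotypes raise the estimate above the prior mean of ½ but leave it well short of 1, because sharing by state is not the same as sharing by descent.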

    52. Worked Example

    53. Exercise

    54. Using multiple markers: Mendelian segregation rules; marker allele frequencies in the population; linkage between markers. Efficient multi-marker (multi-point) algorithms available (e.g., Merlin, Genehunter).

    55. Software for QTL analysis of sibpairs: Mx; Merlin; Genehunter; S.A.G.E. ($); QTL Express (regression); Solar (complex pedigrees); lots of others… http://www.nslij-genetics.org/soft/

    56. George Seaton, Sara Knott, Chris Haley, Peter Visscher

    57. Conclusions (sibpairs): Power of sib pair design is low. More relative pairs needed. More contrasts, e.g. extended pedigrees. Selective genotyping: extreme phenotypes are most informative for linkage. More powerful analysis methods: ML variance component analysis.

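    58. Maximum likelihood for sibpairs (assuming bivariate normality given $\pi$ and a fully informative marker). Full model:
        $-2\ln L = \sum_\pi n_\pi \ln|V_\pi| + \sum (y - \mu)' V_\pi^{-1} (y - \mu)$
        $V_\pi = \begin{pmatrix} f^2 + q^2 + r^2 & f^2 + \pi q^2 \\ f^2 + \pi q^2 & f^2 + q^2 + r^2 \end{pmatrix}$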

    59. Maximum likelihood. Reduced model:
        $-2\ln L = n \ln|V| + \sum (y - \mu)' V^{-1} (y - \mu)$
        $V = \begin{pmatrix} f^2 + r^2 & f^2 \\ f^2 & f^2 + r^2 \end{pmatrix}$
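For a 2 × 2 covariance matrix the determinant and inverse have closed forms, so the sib-pair likelihood can be evaluated without a linear-algebra library. The sketch below only evaluates -2 ln L (up to an additive constant) at given parameter values; it does not maximise it. The reading of f², q² and r² as familial, QTL and residual variances is an assumption based on how they enter the covariance matrix, and the function name and data are made up.

```python
import math

def minus2_loglik(pairs, mu, f2, q2, r2):
    """-2 ln L (constant dropped) for sib pairs given as (y1, y2, pi) tuples.

    Setting q2 = 0 gives the reduced (no-QTL) model.
    """
    total = 0.0
    for y1, y2, pi in pairs:
        var = f2 + q2 + r2               # diagonal element of V
        cov = f2 + pi * q2               # off-diagonal element of V
        det = var * var - cov * cov      # |V| for a symmetric 2x2 matrix
        d1, d2 = y1 - mu, y2 - mu
        # (y - mu)' V^{-1} (y - mu) written out for the 2x2 case
        quad = (var * (d1 * d1 + d2 * d2) - 2.0 * cov * d1 * d2) / det
        total += math.log(det) + quad
    return total

# Made-up data: (phenotype sib 1, phenotype sib 2, IBD proportion)
pairs = [(1.2, 0.9, 1.0), (-0.4, 0.8, 0.0), (0.3, 0.1, 0.5), (-1.1, -0.7, 1.0)]

full = minus2_loglik(pairs, mu=0.0, f2=0.2, q2=0.3, r2=0.5)
reduced = minus2_loglik(pairs, mu=0.0, f2=0.5, q2=0.0, r2=0.5)
print(full, reduced)
```

In practice both models would be maximised over their parameters with a numerical optimiser before the two values are compared.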

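    60. Test statistic: $\mathrm{LRT} = 2\ln(\mathrm{ML}_{\text{full}}) - 2\ln(\mathrm{ML}_{\text{reduced}})$. Under $H_0$ ($q^2 = 0$): $\mathrm{LRT} \sim \tfrac{1}{2}\chi^2_{(1)} + \tfrac{1}{2}(0)$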
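Because the null distribution is a 50:50 mixture of a chi-square with one degree of freedom and a point mass at zero, the p-value is half the usual chi-square tail probability. A minimal sketch using only the standard library follows; the function names are illustrative, and the conversion to a LOD score uses the standard relation LOD = LRT / (2 ln 10).

```python
import math

def lrt_pvalue(lrt):
    """P-value under the 1/2 chi-square(1) + 1/2 point-mass-at-zero mixture."""
    if lrt <= 0.0:
        return 1.0
    # P(chi-square(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

def lod(lrt):
    """LOD score (log10 likelihood ratio) corresponding to an LRT value."""
    return lrt / (2.0 * math.log(10.0))

for stat in (0.0, 2.71, 6.63, 13.8):
    print(f"LRT = {stat:5.2f}  LOD = {lod(stat):.2f}  p = {lrt_pvalue(stat):.5f}")
```

Halving the tail probability reflects that a variance cannot be negative, so only half of the null replicates produce a non-zero statistic.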

    61. Multipoint sib-pair trait-difference analysis for the phenotype ‘Irregular word test’. The graph shows LOD-score curves obtained by use of the MLvar method (if no dominance variance is assumed) in the computer program MAPMAKER/SIBS, with use of strict weighting (S) or of no weighting (N). Broken lines indicate LOD scores corresponding to significance levels (P = .05, P = .005, and P = .0005). The orientation of markers relative to chromosome 6 is given.

    63. Selective genotyping & sibpairs: Concordant pairs: both sibs in upper or lower tail of the phenotypic distribution. Discordant pairs: one sib in upper tail, other in lower tail. Powerful design; requires many (cheap) phenotypes.

    64. Anxiety QTLs

    65. Results

    66. Variance component analysis in complex pedigrees: Partition observed variation in quantitative traits into causal components, e.g., polygenic; common environment (‘household’); QTL; residual, including measurement error. IBD proportions ($\pi$) estimated from multiple markers.

    69. Example: QTL analysis for BMI using a complex pedigree. Multipoint linkage analysis results for chromosome 2. Results are shown for BMI (black), PFM (green), fat mass (red), and lean mass (blue). The leptin gene is also located on chromosome 2, and could be a candidate gene for variation in BMI.
