Zoology 2005 Part 2 - PowerPoint PPT Presentation

jaden
zoology 2005 part 2 l.
Skip this Video
Loading SlideShow in 5 Seconds..
Zoology 2005 Part 2 PowerPoint Presentation
Download Presentation
Zoology 2005 Part 2

play fullscreen
1 / 57
Download Presentation
Zoology 2005 Part 2
391 Views
Download Presentation

Zoology 2005 Part 2

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Zoology 2005 Part 2 Richard Mott

  2. Inbred Mouse Strain Haplotype Structure • When the genomes of a pair of inbred strains are compared, • we find a mosaic of segments of identity and difference (Wade et al, Nature 2002). • A QTL segregating between the strains must lie in a region of sequence difference. • What happens when we compare more than two strains simultaneously?

  3. No Simple Haplotype Block Mosaic Yalcin et al 2004 PNAS

  4. …But a Tree Mosaic

  5. In-silico Mapping • Simple idea- • Collect phenotypes across a set of inbred strains • Genotype the strains (ONCE) • Look for phenotype-genotype correlation • Works well for simple Mendelian traits (eg coat colour) • Suggested as a panacea for QTL mapping

  6. In-silico Mapping Problems • Less well-suited for complex traits • Number of strains required grows quickly with the complexity of the trait. Suggested at least 100 strains required, possibly more if epistasis is present • Require high-density genotype/sequence data to ensure identity-by-state = identity by-descent • May be very useful for the dissection of a QTL previously identified in a F2 cross (look for patterns of sequence difference)

  7. Recombinant Inbred Lines • Panels of inbred lines descended form pairs of inbred strains • Genomes are inbred mosaics of the founders • Lines only need be genotyped once • Similar to in-silico mapping except • identity-by-descent=identity-by-state • Coarser recombination structure • ?lower resolution mapping?

  8. BXD chromosome 4

  9. Testing if a variant is functional without genotyping it(Yalcin et al, Genetics 2005) • Requirements: • A Heterogeneous Stock, genotyped at a skeleton of markers • The genome sequences of the progenitor strains • A statistical test

  10. Merge Analysis • Each polymorphism groups together the founders according to their alleles • If the polymorphism is functional, then a model in which the phenotypic strain effects are estimated after merging the strains together should be as good as a model where each strain can have an independent effect. • Compare the fit of “merged” and “unmerged” genetic models to test if the variant is functional. • If the fit of the merged model is poor then that variant can be eliminated.

  11. Merge Analysis

  12. Merge Analysis

  13. How can we show a gene under a QTL peak affects the trait? • Genetic Mapping identifies Functional Variants, not Genes • Could be a control element affecting some other gene

  14. Quantitative Complementation KO 0

  15. Quantitative Complementation KO wt Low High 30 0 50 100

  16. Quantitative Complementation KO wt Low High d 30 0 50 100

  17. Quantitative Complementation KO wt Low High d d 30 0 50 100 D= d -d

  18. Quantitative Complementation KO wt Low High d d 30 0 50 100 D= d -d

  19. Using Functional Information to Confirm Genes • Further experiments • further bioinformatics, eg networks, functional annotation (GO, KEGG) • candidate gene sequencing • gene expression analyses (eQTL) of • founder strains • HS

  20. Mouse/human sequence comparison

  21. Enhancer reporter assays enhancer promoter luciferase reporter enhancer promoter luciferase reporter

  22. Enhancer elements affect promoter expression

  23. Large-Scale Genetic Mapping • Using a Heterogeneous Stock • Multiple Phenotypes collected in parallel

  24. Predictions (from simulation of an HS population) • In a population of 1,000 HS animals: • Genome-wide power to detect 5% QTL ~ 0.92 • Resolution < 2 Mb

  25. Study design • 2,000 mice • 15,000 diallelic markers • More than 100 phenotypes • each mouse subject to a battery of tests spread over weeks 5-9 of the animal’s life • more (post-mortem) phenotypes being added

  26. Phenotypes

  27. Covariates • For each phenotype, we recorded covariates, eg, • experimenter • time of day • apparatus (eg, Shock Chamber 3)

  28. Data collection • All animals microchipped • Automated data checking, processing and uploading • All data uploaded into the Integrated Genotyping System (IGS) database

  29. Genotypes from Illumina • Genotyped and phenotyped 2,000 offspring • Genotyped 300 parents • Pedigree analysis shows genotyping was 99.99% accurate • 11, 558 markers polymorphic in HS

  30. QTL mapping • Models • HAPPY and single marker association • Fitting framework • Linear regression of (transformed) phenotypes • Survival analysis for latency data • Logit-based models for categorical data • Significant covariates incorporated into the null model, eg Null = Startle ~ TestChamber + BodyWeight + Year + Age + Hour + Gender Additive Null + additive genetic info for locus Full Null + full genetic info for locus

  31. QTL mapping • Significance tests • partial F-test (linear models), Chi-square / LRT (others) • Significance thresholds • different for each phenotype • have to take into account LD • fit distribution to scores of permuted data

  32. E-values • We set score thresholds using ideas from sequence databank search programs such as BLAST

  33. E-values • We set score thresholds using ideas from sequence databank search programs such as BLAST • The E-value of a threshold is the number of times you would expect to see a false positive exceed the threshold in a genome scan

  34. E-values • We set score thresholds using ideas from sequence databank search programs such as BLAST • The E-value of a threshold is the number of times you would expect to see a false positive exceed the threshold in a genome scan • Applying the Bonferroni correction to the number of marker intervals is too severe because LD makes neighbouring scores correlated.

  35. E-values • We set score thresholds using ideas from sequence databank search programs such as BLAST • The E-value of a threshold is the number of times you would expect to see a false positive exceed the threshold in a genome scan • Applying the Bonferroni correction to the number of marker intervals is too severe because LD makes neighbouring scores correlated. • Permutation analyses indicate the score of the most significant expected random score amongst all ~12000 marker intervals behaves as if it was drawn from M~4000 independent tests.

  36. E-values • We set score thresholds using ideas from sequence databank search programs such as BLAST • The E-value of a threshold is the number of times you would expect to see a false positive exceed the threshold in a genome scan • Applying the Bonferroni correction to the number of marker intervals is too severe because LD makes neighbouring scores correlated. • Permutation analyses indicate the score of the most significant expected random score amongst all ~12000 marker intervals behaves as if it was drawn from M~4000 independent tests. • Hence a nominal P-value of p corresponds to an E-value of pM

  37. Problems Our population includes both siblings and unrelateds • We have ignored this distinction And therefore: • Confounding environmental family effects with genetic family effects • Allowing ghost peaks due to linkage disequilibrium between markers within a sibship Our solution so far: (1) Investigating the effect of environmental factors and building covariates into the model (2) Identify peaks by a multiple conditional fit

  38. Multiple Peak FittingForward Selection • For each phenotype’s genome scan: • Make list of all peaks > genome-wide threshold T • Fit most significant peak, P1 • Go through list of peaks, refitting each on conditional upon the most significant peak. • Add the most significant remaining peak, P2 • Continue refitting remaining peaks P3 , P4 … and adding them into model until the most significant remaining peak < T

  39. Peaks found by multiple conditional fit Multiple conditional fit (using additive model only) number of phenotypes

  40. Database for scans

  41. Database for scans Additive model Full model • E-value thresholds • additive only • E<0.01 is about the same as genome-wide corrected p<0.01.

  42. Database for scans zoom in

  43. Covariates

  44. QTL Mapping: Validation • Coat colour • Detection of known QTLs

  45. Coat colour genes

  46. A known QTL: HDL HS mapping Wang et al, 2003

  47. High Resolution QTLs

  48. New QTLs: two examples • Freeze.During.Tone (from Cue Conditioning behavioural experiment) …………1 peak • % of CD4 in CD3 cells (immunology assay) …………10 peaks

  49. Freezing TONE TONE Cue Conditioning • Freezing in response to a conditioned stimulus

  50. Cue Conditioning • Freeze.During.Tone: huge effect, small number of genes chr15 cntn1: Contactin precursor (Neural cell surface protein)