
Fine mapping of recombination in S. cerevisiae

Fine mapping of recombination in S. cerevisiae. Wolfgang Huber, EMBL-EBI. The maths of marker genotyping: sensitivity, specificity, data QA/QC. Event classification: cross-overs, conversions… and weirdness. Event rates: biological significance. Single-reporter methods.


Presentation Transcript


  1. Fine mapping of recombination in S. cerevisiae • Wolfgang Huber, EMBL-EBI

  2. The maths of marker genotyping: sensitivity, specificity, data QA/QC • Event classification: cross-overs, conversions… and weirdness • Event rates: biological significance

  3. Single-reporter methods • De novo polymorphism detection • Winzeler et al., Science 281, 1998 (and others): ANOVA test of the null hypothesis that the two strain means are equal. • Borevitz et al., Genome Research 13, 2003: moderated t-test (SAM). • Brem et al., Science 296, 2002: moderated t-test, then cluster all data (parental and segregant) and discard SFPs (single-feature polymorphisms) for which the clusters do not separate the parental data. • Segregant genotyping (using polymorphisms) • Use the estimated posterior probability of class membership (uniform prior on the classes). • Brem et al. augment this: the class parameters are estimated from clustered data.
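With a uniform prior on the classes, the posterior probability of class membership reduces to a ratio of class-conditional densities. A sketch of that step, where $f_k$ (the estimated density of the reporter intensity $x$ under genotype class $k$) and $K$ (the number of classes) are notation assumed here rather than taken from the slide:

```latex
P(k \mid x)
  = \frac{\tfrac{1}{K}\, f_k(x)}{\sum_{j=1}^{K} \tfrac{1}{K}\, f_j(x)}
  = \frac{f_k(x)}{\sum_{j=1}^{K} f_j(x)}
```

For two parental genotypes, $K = 2$ and the call for a segregant follows from whichever of $P(1 \mid x)$ and $P(2 \mid x)$ is larger.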

  4. But we have multiple reporters per SNP: probe sets • Probe set: a set of reporters that exactly and uniquely map to a location and interrogate one polymorphism • Genomic sequence, S96: CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA • Genomic sequence, YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA • Reporter 1: GGAGGACTGGCCCTAACTTCACTAT • Reporter 2: GACTGGCCCTAACTTCACTATTTGT • Reporter 3: GACTGGCCCTAACTTGACTATTTGT • Reporter 4: GGCCCTAACTTCACTATTTGTACAG • Reporter 5: CTAACTTCACTATTTGTACAGATCG • Reporter 6: CTTCACTATTTGTACAGATCGCAAT
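The sequences on this slide can be checked mechanically. The sketch below assumes the reporters are written as the base-wise complement of the genomic strand in the same left-to-right order (this is how they line up on the slide); the helper name `matching_strains` is hypothetical.

```python
# Base-wise complement table (no reversal: the slide aligns probes under the strand).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

STRAINS = {
    "S96":    "CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA",
    "YJM789": "CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA",
}

PROBES = {
    1: "GGAGGACTGGCCCTAACTTCACTAT",
    2: "GACTGGCCCTAACTTCACTATTTGT",
    3: "GACTGGCCCTAACTTGACTATTTGT",
    4: "GGCCCTAACTTCACTATTTGTACAG",
    5: "CTAACTTCACTATTTGTACAGATCG",
    6: "CTTCACTATTTGTACAGATCGCAAT",
}

def matching_strains(probe):
    """Names of the strains whose complementary strand contains the probe exactly."""
    return [name for name, seq in STRAINS.items()
            if probe in seq.translate(COMPLEMENT)]

# Map each reporter number to the strain(s) it matches perfectly.
matches = {number: matching_strains(seq) for number, seq in PROBES.items()}
for number, strains in sorted(matches.items()):
    print(number, strains)
```

Under that assumption, every reporter overlaps the polymorphic base, so each one matches exactly one of the two strains.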

  5. Multivariate analysis of probe set data: parallel coordinate plots (axis labels: log2 intensity; reporters in probe set)

  6. Multivariate analysis of probe set data: parallel coordinate plots

  7. Multivariate methods • SNPScanner (Gresham et al., Science 311, 2006): • Model probe intensity x_i with and without presence of a SNP as a function of: probe GC content; position of the SNP within the probe; nucleotides surrounding the SNP. • Fit model parameters using two sequenced strains with known SNPs. • To genotype a segregant or new strain at a given base, compute a likelihood ratio. • Assumption: the covariance matrix is diagonal and the same for both genotypes.
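To make the stated assumption concrete: with a diagonal covariance matrix shared by both hypotheses, the Gaussian log-likelihood ratio reduces to a per-probe sum of squared-distance differences. The following is a minimal sketch of that arithmetic only, not the SNPScanner implementation; the function name and the inputs `mu_ref`, `mu_snp` and `var` (model-predicted intensities without and with the SNP, and per-probe variances) are assumptions for illustration.

```python
def log_likelihood_ratio(x, mu_ref, mu_snp, var):
    """Log of p(x | SNP) / p(x | no SNP) for Gaussians with a shared diagonal covariance.

    x       observed log-intensities of the probes covering the base
    mu_ref  predicted intensities if the base matches the reference (hypothetical input)
    mu_snp  predicted intensities if a SNP is present (hypothetical input)
    var     per-probe variances, identical under both hypotheses
    """
    # The normalising constants cancel because the covariance is shared.
    return sum(((xi - r) ** 2 - (xi - s) ** 2) / (2.0 * v)
               for xi, r, s, v in zip(x, mu_ref, mu_snp, var))

# Positive values favour the SNP hypothesis, negative values the reference.
llr_at_snp = log_likelihood_ratio([8.0, 8.0], [10.0, 10.0], [8.0, 8.0], [1.0, 1.0])
llr_midway = log_likelihood_ratio([9.0, 9.0], [10.0, 10.0], [8.0, 8.0], [1.0, 1.0])
```

Because each probe contributes an independent term, correlated neighbouring probes are counted as separate evidence under this assumption.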

  8. But • neighbouring probes' data are not independent • variances for the two genotypes are often quite different • training data are often not representative • the likelihood ratio test generates too many false positives • → a generalized multi-probe method

  9. GTS (genotyping by semi-supervised clustering) • An instance of the EM algorithm applied to multivariate Gaussian mixture modeling: simultaneously estimate class shapes and object class membership

  10. GTS (genotyping by semi-supervised clustering) • An instance of the EM algorithm applied to multivariate Gaussian mixture modeling: simultaneously estimate class shapes and object class membership

  11. GTS (genotyping by semi-supervised clustering) • An instance of the EM algorithm applied to multivariate Gaussian mixture modeling: simultaneously estimate class shapes and object class membership • R package ss.genotyping
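The idea of semi-supervised EM can be sketched in a few lines. This is not the ss.genotyping code: it is a simplified illustration that assumes two classes, per-class diagonal covariances (the slides describe a general multivariate Gaussian), and at least one labelled parental array per class. All names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def semi_supervised_em(labelled, unlabelled, n_iter=50, min_var=1e-3):
    """labelled: list of (vector, class) with class in {0, 1}; unlabelled: list of vectors.

    Returns ([(weight, mean, var) per class], [posterior pair per unlabelled vector]).
    """
    xs = [x for x, _ in labelled] + list(unlabelled)
    dim = len(xs[0])
    # Labelled (parental) memberships stay fixed; unlabelled ones start undecided.
    fixed = [[1.0 if k == c else 0.0 for k in (0, 1)] for _, c in labelled]
    resp_u = [[0.5, 0.5] for _ in unlabelled]
    for _ in range(n_iter):
        resp = fixed + resp_u
        # M-step: weighted class weight, mean and variance from all arrays.
        params = []
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mean = [sum(r[k] * x[d] for r, x in zip(resp, xs)) / nk
                    for d in range(dim)]
            var = [max(min_var,
                       sum(r[k] * (x[d] - mean[d]) ** 2 for r, x in zip(resp, xs)) / nk)
                   for d in range(dim)]
            params.append((nk / len(xs), mean, var))
        # E-step: update memberships of the unlabelled (segregant) arrays only.
        resp_u = []
        for x in unlabelled:
            logp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in params]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(lp - top) for lp in logp]
            resp_u.append([pk / sum(p) for pk in p])
    return params, resp_u

# Toy probe set with three reporters: two parental arrays per class, four segregants.
parents = [([10.0, 10.2, 9.9], 0), ([10.1, 9.8, 10.0], 0),
           ([7.0, 7.1, 6.9], 1), ([7.2, 6.9, 7.0], 1)]
segregants = [[9.9, 10.1, 10.0], [7.1, 7.0, 7.1], [10.2, 9.9, 9.8], [6.8, 7.2, 7.0]]
params, posteriors = semi_supervised_em(parents, segregants)
```

Because the segregants contribute to the M-step, the class shapes are learned from the data being genotyped rather than from the parental arrays alone, which is the point made against purely supervised classification in the summary.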

  12. Examples of probe set results

  13. Aberrant probe sets (cross-hybridization?)

  14. Aberrant probe sets (cross-hybridization?)

  15. Filtering • Ambiguous individual genotype calls (z) • Aberrant probe sets • Weakly separating probe sets • Imbalanced probe sets (diagram labels: Probe Sets, Genotype Calls)
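Two of these filters are easy to sketch. The code below is illustrative only: it assumes z is the posterior probability of the S allele, and it reads "imbalanced" as an allele fraction among segregants far from the 1:1 expected in a cross. The thresholds `min_confidence` and `max_deviation` and both function names are hypothetical, not values from the talk.

```python
def call_genotype(posterior_s, min_confidence=0.95):
    """Return 'S', 'Y', or None when the call is too ambiguous to keep."""
    if posterior_s >= min_confidence:
        return "S"
    if posterior_s <= 1.0 - min_confidence:
        return "Y"
    return None

def is_imbalanced(calls, max_deviation=0.2):
    """Flag a probe set whose confident calls deviate strongly from a 1:1 allele ratio."""
    made = [c for c in calls if c is not None]
    if not made:
        return True  # nothing confident to go on: treat the probe set as unusable
    fraction_s = sum(c == "S" for c in made) / len(made)
    return abs(fraction_s - 0.5) > max_deviation

calls = [call_genotype(z) for z in (0.99, 0.50, 0.01, 0.97)]
```

Here the second posterior is filtered out as ambiguous, and the remaining calls are then used for the probe-set-level balance check.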

  16. Benchmark: SNPScanner vs GTS • 233 Affymetrix yeast tiling arrays from the Steinmetz group: 13 S288c and 12 YJM789 arrays (training data); 52 tetrads of crosses (to be genotyped) • Same post-processing/filter

  17. GTS vs SNPScanner (axis labels: arrays; genomic position (markers))

  18. GTS vs SNPScanner

  19. GTS vs SNPScanner

  20. High resolution in crossover regions

  21. Three adjacent cross-overs involving three chromosomes (chr 1, wt_47)

  22. A cross-over plus two long conversions, involving all four chromosomes (chr 3, wt_19)

  23. Three adjacent conversions involving three chromosomes (chr 3, wt_38)

  24. Cross-over accompanied by multiple conversions (chr 4, wt_36)

  25. Event classification • An automatic algorithm takes tetrad-level genotype traces and assigns them to events: cross-over, conversion, complex cross-over, complex conversion, ... • R package recombination.genotyping • Still need manual curation: we are just beginning to understand the spectrum of possible event types!
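As a rough illustration of what assigning tetrad-level traces to events can look like, the sketch below uses a deliberately simplified rule set and is not the recombination.genotyping algorithm. It assumes each spore is a string of "S"/"Y" calls over the same ordered markers, treats markers with 2:2 segregation as anchors, and labels what happens between consecutive anchors. The labels and the function names are hypothetical.

```python
def n_ref(tetrad, m):
    """Number of spores carrying the S allele at marker m."""
    return sum(spore[m] == "S" for spore in tetrad)

def classify_events(tetrad):
    """Return a list of (label, left_marker, right_marker) tuples.

    Non-2:2 markers at either chromosome end are ignored in this sketch.
    """
    events = []
    prev = None  # index of the previous marker with 2:2 segregation
    for m in range(len(tetrad[0])):
        if n_ref(tetrad, m) != 2:
            continue  # 3:1 or 4:0 marker: part of a possible conversion tract
        if prev is not None:
            switched = sum(spore[prev] != spore[m] for spore in tetrad)
            gap = m - prev > 1  # non-2:2 markers lie between the two anchors
            if switched == 2:
                label = "cross-over with conversion" if gap else "cross-over"
                events.append((label, prev, m))
            elif switched == 4:
                events.append(("complex", prev, m))
            elif gap:
                events.append(("conversion", prev, m))
        prev = m
    return events

crossover = classify_events(["SSSSYYYY", "SSSSSSSS", "YYYYSSSS", "YYYYYYYY"])
conversion = classify_events(["SSSSSS", "SSSSSS", "YYSSYY", "YYYYYY"])
```

In the first tetrad two spores exchange genotype between markers 3 and 4; in the second, markers 2 and 3 segregate 3:1 with no exchange of flanking genotypes.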

  26. Genetic interactions • Genotypes at pairs of loci on different chromosomes are unlinked, but the population shows evidence of selection (figure labels: over-representation, under-representation)
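One common way to look for such over- or under-representation is a test of independence on the 2x2 table of genotype counts at two unlinked loci. The slides do not say which test was used; the sketch below uses a plain Pearson chi-square test with one degree of freedom as a stand-in, and the function name and the counts are made up for illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for the table [[a, b], [c, d]].

    Rows: genotype at locus 1 (S, Y); columns: genotype at locus 2 (S, Y).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the survival function is erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Independent loci: all four genotype combinations equally frequent.
stat_null, p_null = chi_square_2x2(25, 25, 25, 25)
# Parental combinations (S/S and Y/Y) over-represented.
stat_sel, p_sel = chi_square_2x2(40, 10, 10, 40)
```

With many locus pairs tested genome-wide, the resulting p-values would need a multiple-testing correction before any pair is reported as interacting.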

  27. Genetic interaction network of S288c-YJM789 crosses

  28. Acknowledgements • EMBL HD: Lars Steinmetz, Julien Gagneur, Zhenyu Xu, Sandra Clauder-Münster, Fabiana Perocchi, Wu Wei • EBI: Elin Axelsson, Ligia Bras, Alessandro Brozzi, Tony Chiang, Audrey Kauffmann, Paul McGettigan, Greg Pau, Oleg Sklyar, Mike Smith, Jörn Tödling, Jitao Zhang • Richard Bourgon, Eugenio Mancera Ramos • The contributors to the R and Bioconductor projects

  29. Tetrad-level results

  30. Crossovers accompanied by events on other strands

  31. Double crossovers

  32. Summary • Semi-supervised clustering is natural given the experimental structure • Parental data are often not a faithful indicator of offspring behavior! Supervised classification may experience problems for some polymorphisms. • Multivariate Gaussian model is adequate • EM works well when data behave as expected — but this is not always the case. Importance of fit diagnostics, QA/QC, post-processing filters. • Outlook • Hotspots, conversion/crossover ratio, sizes, spacing and interference. • Msh4 mutant data (deficient in the putative interference-generating pathway): how do interference patterns change? • Unanticipated polymorphism detection (de-novo in segregants; in unsequenced strains)
