High-resolution mapping of meiotic crossovers and noncrossovers. Wolfgang Huber EMBL-EBI. Meiotic recombination. Proper chromosome segregation. Increase of genetic diversity. Gene A. Gene B. Gene C. Gene A. Gene b. Gene c. Gene a. Gene b. Gene c. Gene a. Gene B. Gene C.
High-resolution mapping of meiotic crossovers and noncrossovers Wolfgang Huber EMBL-EBI
Meiotic recombination • Proper chromosome segregation • Increase of genetic diversity Gene A Gene B Gene C Gene A Gene b Gene c Gene a Gene b Gene c Gene a Gene B Gene C
Double-strand break repair [Figure: repair pathways leading to crossover (CO) and non-crossover (NCO) outcomes] Recombination initiates with a double-strand break in one DNA molecule. Only two DNA molecules (homologs) are shown here.
Non-uniform distribution of recombination across the genome [Figure panels: human chr. 22q (female, average, male) and yeast chr. 3. Sources: Petes T.D., 2001; Baudat F. & Nicolas A., 1997] Recombination hotspots are small genomic regions where recombination events cluster, surrounded by stretches with little or no recombination activity.
Map all recombination events that occurred in 50 yeast meioses using high-density tiling microarrays
Clinical isolates of S. cerevisiae
Laboratory strain (S288c) • The common lab yeast • Isolated from rotten fig in California in 1930s • Domesticated: related to baker's yeast, wine-making and beer-brewing yeasts • Genome sequence of S288c: A Goffeau et al. Science (1996)
Clinical strain (YJM789) • Isolated from immuno-compromised patients • Pathogenic in mouse model of systemic infection • Various fungal pathogenic characteristics: pseudohyphae, colony morphology switching • Ability to grow at >37˚C, a virulence trait • Genome sequence of YJM789: W Wei et al., PNAS (2007): 60k SNPs, 6k indels wrt S288c
Experimental approach Mancera, Bourgon et al. Nature 2008 (in press)
1 tiling array for 2 yeast genomes • 25mer probes on the Watson and Crick strands (8 bp spacing along each strand, 4 bp offset between strands) • Probes (Wei et al., PNAS (2007)): common to both genomes 2,368k (86%), S288c-specific 291k (10%), YJM789-specific 108k (4%) • Custom design manufactured by Affymetrix
Identification of previously unknown ncRNA and antisense transcripts and mapping of transcripts Antisense CBF1 David et al., PNAS (2006)
The computational & statistical challenges
Genotyping
• Microarray probes and polymorphisms are in a many-to-many relationship.
• Tiling arrays provide thorough coverage, but probes can be badly behaved with respect to sensitivity and specificity (e.g. cross-hybridisation).
• How can we best take advantage of our experimental design, with its mixture of parental and segregant hybridisations?
Event rate inference
• Our data invert the traditional relationship between events and markers: instead of inferring small events between widely spaced markers, we typically have multiple markers over single events.
• Marker spacing matters, but in complicated ways.
• NCOs falling between markers are not observed.
Genotyping “single feature polymorphisms” Hybridization efficiency depends on number and position of mismatches. Differential hybridization provides a means of detecting polymorphisms, even when only the reference genome sequence is known. Winzeler et al., Science 281(5380), 1998. Brem et al., Science 296(5568), 2002. Steinmetz et al., Nature 416(6878), 2002. Borevitz et al., Genome Research 13(3), 2003. Given parental behavior, segregants can be genotyped via supervised classification.
Tiling arrays, probe sets, markers
Probe set: group of probes which each map exactly to a unique locus and which interrogate a common polymorphism.
Marker: one or more polymorphisms interrogated by the same probe set.
Probe 6: CTTCACTATTTGTACAGATCGCAAT
Probe 5: CTAACTTCACTATTTGTACAGATCG
Probe 4: GGCCCTAACTTCACTATTTGTACAG
Probe 2: GACTGGCCCTAACTTCACTATTTGT
Probe 1: GGAGGACTGGCCCTAACTTCACTAT
S96: CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA
YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA
Probe 3: GACTGGCCCTAACTTGACTATTTGT
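The probe set on this slide can be checked by hand or with a few lines of code. The sketch below assumes the probes are written as the base-by-base complement of the target strand, read in the same left-to-right direction as the S96 and YJM789 sequences (this orientation is inferred from the sequences shown, not stated on the slide). The helper names `perfect_match` and `classify` are hypothetical.

```python
# Sequences copied from the slide.
S96 = "CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA"
YJM789 = "CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA"
PROBES = {
    1: "GGAGGACTGGCCCTAACTTCACTAT",
    2: "GACTGGCCCTAACTTCACTATTTGT",
    3: "GACTGGCCCTAACTTGACTATTTGT",
    4: "GGCCCTAACTTCACTATTTGTACAG",
    5: "CTAACTTCACTATTTGTACAGATCG",
    6: "CTTCACTATTTGTACAGATCGCAAT",
}

# Assumption: probes are shown as the complement (not reverse complement)
# of the target, so we complement the target without reversing it.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def perfect_match(probe: str, target: str) -> bool:
    """True if the probe pairs with the target without any mismatch."""
    return probe in target.translate(COMPLEMENT)


def classify(probe: str) -> str:
    """Label a probe by which of the two strains it matches perfectly."""
    in_s96 = perfect_match(probe, S96)
    in_yjm = perfect_match(probe, YJM789)
    if in_s96 and in_yjm:
        return "common"
    if in_s96:
        return "S96-specific"
    if in_yjm:
        return "YJM789-specific"
    return "no perfect match"


calls = {pid: classify(seq) for pid, seq in PROBES.items()}
```

Under this assumption, probes 1, 2, 4, 5 and 6 all overlap the single polymorphic base and match S96 perfectly, while probe 3 matches YJM789; together they form one probe set interrogating one marker.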
Multivariate probe set data: parallel coordinate plots
A multivariate method SNPscanner: Gresham et al., Science 311, 2006 Parametric model of probe intensity x_i with and without presence of a SNP, as a function of • Probe GC content • Position of SNP within the probe • Nucleotides surrounding the SNP Fit these model parameters using two sequenced strains with known SNPs. To genotype a segregant or new strain at a given base, compute a Bayes factor. Assumption: the covariance matrix is diagonal and the same for both genotypes.
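To make the role of the diagonal, shared covariance assumption concrete, here is a minimal sketch of a log Bayes factor for one candidate position. It is not the SNPscanner implementation: the predicted mean intensities `mu_snp` and `mu_ref` and the per-probe standard deviations `sigma` are assumed to come from an already fitted model, and with fixed parameters the Bayes factor reduces to a likelihood ratio.

```python
def log_bayes_factor(x, mu_snp, mu_ref, sigma):
    """Log Bayes factor for 'SNP present' versus 'SNP absent'.

    x      : observed intensities of the probes covering the position
    mu_snp : predicted intensities if the SNP is present (assumed given)
    mu_ref : predicted intensities if the SNP is absent (assumed given)
    sigma  : per-probe standard deviations, shared by both hypotheses

    With a diagonal covariance the probes contribute independently, and
    with equal variances the Gaussian normalising constants cancel.
    """
    return sum(
        ((xi - r) ** 2 - (xi - s) ** 2) / (2.0 * sd ** 2)
        for xi, s, r, sd in zip(x, mu_snp, mu_ref, sigma)
    )


# Toy numbers, purely illustrative.
lbf_snp = log_bayes_factor([1.0, 2.0], [1.0, 2.0], [2.0, 3.0], [1.0, 1.0])
lbf_mid = log_bayes_factor([1.5, 2.5], [1.0, 2.0], [2.0, 3.0], [1.0, 1.0])
```

A positive value favours the SNP; data lying exactly midway between the two predictions give zero. Correlated neighbouring probes or genotype-specific covariances break both simplifications used here.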
• Residual correlation: neighbouring probes' data are not independent. • Covariances for the two genotypes are often quite different. • Parental arrays are informative, but alone do not always provide the best model. • Supervised classification of offspring (i) wastes information and (ii) may be misleading. • Definition of a classification boundary is necessary but not sufficient. Shapes of class distributions are useful for QA/QC. • ~97% correct calls by the SNPscanner method: not enough for the reliable detection of conversion events.
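ssG: a semi-supervised, model-based genotyping algorithm Two-component Normal mixture model $p(x) = \pi_1\, p_N(x \mid \mu_1, \Sigma_1) + \pi_2\, p_N(x \mid \mu_2, \Sigma_2)$, with mixing proportions $\pi_k$, class means $\mu_k$ and class covariance matrices $\Sigma_k$. For each array i and probe set we have (X_i, Y_i), with array data X_i and class variable Y_i. Y_i is known for parental arrays and unknown for segregants. Fit with the EM algorithm.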
ssG An instance of the EM algorithm applied to multivariate Gaussian mixture modeling: iteratively estimate class shapes and object class membership probabilities
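The iteration described above can be sketched in a few lines. This is a deliberately simplified, one-dimensional version (one summary value per array instead of the multivariate probe set data), written only with the standard library; it is not the ssG code. Function and argument names are hypothetical. Parental arrays keep their known class throughout, segregants get class membership probabilities that are re-estimated at every step, and each class has its own variance.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def semi_supervised_em(x, labels, n_iter=100, min_var=1e-6):
    """Two-component normal mixture fitted by EM.

    x      : list of floats, one value per array
    labels : 0 or 1 for parental arrays, None for segregants
             (both classes must have at least one labelled array)
    Returns mixing proportions, means, variances and P(class 1) per array.
    """
    parent_mean = [
        sum(xi for xi, y in zip(x, labels) if y == k)
        / sum(1 for y in labels if y == k)
        for k in (0, 1)
    ]
    # Initial membership: labels are fixed, segregants go to the nearest parent.
    resp = [
        float(y) if y is not None
        else float(abs(xi - parent_mean[1]) < abs(xi - parent_mean[0]))
        for xi, y in zip(x, labels)
    ]
    for _ in range(n_iter):
        # M-step: class proportions and class shapes from weighted data.
        n1 = sum(resp)
        n0 = len(x) - n1
        pi1 = n1 / len(x)
        m0 = sum((1 - r) * xi for r, xi in zip(resp, x)) / n0
        m1 = sum(r * xi for r, xi in zip(resp, x)) / n1
        v0 = max(sum((1 - r) * (xi - m0) ** 2 for r, xi in zip(resp, x)) / n0, min_var)
        v1 = max(sum(r * (xi - m1) ** 2 for r, xi in zip(resp, x)) / n1, min_var)
        # E-step: update membership probabilities of segregants only.
        for i, (xi, y) in enumerate(zip(x, labels)):
            if y is None:
                p0 = (1 - pi1) * normal_pdf(xi, m0, v0)
                p1 = pi1 * normal_pdf(xi, m1, v1)
                if p0 + p1 > 0:
                    resp[i] = p1 / (p0 + p1)
    return {"pi": (1 - pi1, pi1), "mean": (m0, m1), "var": (v0, v1), "resp": resp}


# Toy data: two parental arrays (labelled) and six segregants (unlabelled).
x = [0.0, 0.2, -0.1, 0.1, 3.0, 3.3, 2.8, 3.1]
labels = [0, None, None, None, 1, None, None, None]
fit = semi_supervised_em(x, labels)
```

Because the segregants contribute to the estimated means and variances, the class shapes are not determined by the parental arrays alone, which is the point made on the preceding slides.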
Filtering • Ambiguous individual genotype calls • Aberrant probe sets • Weakly separating probe sets • Imbalanced probe sets [Figure: probe sets and genotype calls]
Recombination event inference for one tetrad. Median inter-marker distance: 78 bp
Event size and marker resolution 4163 crossovers, 2126 non-crossovers across 46 meioses.
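For orientation, the totals on this slide translate into average counts per meiosis as follows. This is plain arithmetic on the numbers shown; the variable names are illustrative.

```python
crossovers = 4163
non_crossovers = 2126
meioses = 46

co_per_meiosis = crossovers / meioses        # about 90.5
nco_per_meiosis = non_crossovers / meioses   # about 46.2
co_to_nco_ratio = crossovers / non_crossovers
```

These are observed counts. Since NCOs falling between markers are not observed, the non-crossover figure is a lower bound on the true average.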
Inferring event rates
Recombination event rates Traditional corrections (e.g., Haldane) use recombination fraction, and adjust for unseen crossovers which occur between widely-spaced markers. High-density marker data invert the traditional relationship, placing multiple markers within most recombination events — both crossover (CO) and non-crossover (NCO).
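For reference, the traditional Haldane correction mentioned above converts an observed recombination fraction r between two markers into a map distance d (in Morgans) via d = -ln(1 - 2r) / 2, assuming crossovers occur without interference. A minimal sketch, with hypothetical function names:

```python
import math


def haldane_distance(r: float) -> float:
    """Map distance in Morgans from recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def haldane_recombination_fraction(d: float) -> float:
    """Inverse map function: recombination fraction from distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


d = haldane_distance(0.1)                     # slightly more than 0.1 Morgans
r_back = haldane_recombination_fraction(d)    # recovers 0.1
```

The correction matters when markers are far apart, because double crossovers between them go unseen. With a median inter-marker distance of 78 bp this is no longer the main problem, which is why a different detection model is needed.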
Statistical model for event detection probabilities [Figure: position axis marked at -M, -w, 0, +w, +M]
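The model on this slide is not spelled out in the text, so the following is only a toy illustration of the underlying idea stated earlier: an event is observable only if it covers at least one marker. The sketch assumes a tract of fixed length whose start is uniformly distributed over a region, and computes the fraction of placements that hit a marker. All names and the uniform-placement assumption are hypothetical, not the authors' model.

```python
from bisect import bisect_left


def tract_is_detectable(sorted_markers, start, tract_len):
    """True if the half-open interval [start, start + tract_len) has a marker."""
    i = bisect_left(sorted_markers, start)
    return i < len(sorted_markers) and sorted_markers[i] < start + tract_len


def detection_probability(markers, tract_len, region_start, region_end):
    """Fraction of tract placements inside the region that cover a marker.

    The tract must fit entirely in [region_start, region_end); every integer
    start position is taken to be equally likely (an assumption).
    """
    sorted_markers = sorted(markers)
    starts = range(region_start, region_end - tract_len + 1)
    hits = sum(tract_is_detectable(sorted_markers, s, tract_len) for s in starts)
    return hits / len(starts)


# Two markers at positions 10 and 20; a 5 bp tract placed within [0, 30).
p_short = detection_probability([10, 20], 5, 0, 30)
# A tract longer than the largest marker-free stretch is always detected.
p_long = detection_probability([10, 20], 15, 0, 30)
```

In this toy setting short tracts are frequently missed and long tracts never are, which illustrates why observed NCO counts have to be corrected for marker spacing.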
Hot spot identification
Hotspots
• Identified 179 recombination hot spots
• Incl. all previously known except for HIS2: HIS4, ARG4, CYS3, DED81, ARE1/IMG1, CDC19, THR4, LEU2-CEN3
• None overlapped a centromere
• Hottest: 28% of spores (59% of meioses)
• 84% overlap a promoter
• 25% of bases in hot spot intervals overlap promoters, while 68% overlap coding sequences
Correlation between localization of DSBs (initiation) and recombination events (outcome) [Figure: event counts and DSB log-ratio along chr. 3]
Genome-wide distributions of CO and NCO hotspots are different (p<0.0005)
Groups of genes with distinct expression profiles are associated with hotspots (b,c) and subtypes of hotspots (d) Expression data from Primig et al., Nature Genetics (2000)
Mutants [Pathway diagram labels: Spo11, Ski8, Rad50, Mre11, Xrs2, Dmc1/Rad51, Zip1, Zip2, Zip3, Zip4, Mer3, Msh4/Msh5, Mlh1, Mlh3, Mus81, Mms4; pathways: SDSA, DSBR; outcomes: crossover, noncrossover]
The genomic effect of gene conversions • Per meiosis, ~2.1% of polymorphic positions are converted to the opposite genotype • Up to 1% of a meiotic product's genome is subject to conversion in a single meiosis • Conversion favours GC (1.4% increase of GC content at SNP positions, event-weighted) • Also, hotspots tend to be GC-rich • However, hotspots are also more diverse! ("Allelic homogenization" appears to be counteracted by other forces, e.g. mutagenicity)
Recombination hotspots show more sequence diversity SNP frequency data from Sanger Institute's Saccharomyces Genome Resequencing Project (E. Louis, R. Durbin, D. Carter)
Conclusions
• As much as 1% of a meiotic product's genome may be subject to conversion in a single meiosis, and this conversion has a GC bias.
• Distinct genomic distributions of crossovers and non-crossovers suggest that genomic position (local chromatin or sequence context) is a contributing factor in DSB resolution.
• Interference between crossovers and non-crossovers.
• Crossover interference is reduced in the msh4 mutant but non-crossover rates are unaffected, supporting the view that distinct pathways lead to these events.
• Conversion hotspots unlink genomic regions from the linkage map.
Acknowledgement Eugenio Mancera Ramos Richard Bourgon • Lars Steinmetz EMBL HD Julien Gagneur Marina Granovskaia Sandra Clauder-Münster Fabiana Perocchi Wu Wei Zhenyu Xu EBI • Simon Anders • Elin Axelsson • Ligia Bras • Alessandro Brozzi • Tony Chiang • Audrey Kauffmann • Greg Pau • Oleg Sklyar • Jörn Tödling • David Jitao Zhang • The contributors to R and Bioconductor projects