
Identifying conserved segments in rearranged and divergent genomes

Bob Mau, Aaron Darling, Nicole T. Perna. Presented by Aaron Darling.


Presentation Transcript


  1. Identifying conserved segments in rearranged and divergent genomes. Bob Mau, Aaron Darling, Nicole T. Perna. Presented by Aaron Darling.

  2. Comparing genomic architectures. Genome sequence and architecture comparison can lead to insight about organismal evolutionary forces, gene functions, and phenotypes. Rearrangement, gene gain, loss, and duplication obfuscate homology.

  3. Structure of the bacterial chromosome (diagram labels: origin of replication, terminus). Replication proceeds simultaneously on each “replichore”. Replichore size difference > 20% is selected against (Guijo et al. 2001). Breakpoints of inversions occur at an equal distance from the origin to maintain replichore balance (Tillier and Collins 2000, Ajana et al. 2002). We call such rearrangements “symmetric inversions”.
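The replichore arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names, the circular coordinate convention, and the choice to express the size difference as a fraction of total genome length are all assumptions.

```python
def replichore_lengths(genome_len, origin, terminus):
    """Lengths of the two replichores on a circular chromosome.

    Assumed convention: coordinates run 0..genome_len-1 clockwise.
    """
    clockwise = (terminus - origin) % genome_len
    return genome_len - clockwise, clockwise


def imbalance(genome_len, origin, terminus):
    """Replichore size difference as a fraction of genome length (assumed definition)."""
    left, right = replichore_lengths(genome_len, origin, terminus)
    return abs(left - right) / genome_len


def is_symmetric_inversion(genome_len, origin, bp1, bp2, tol=0):
    """True if the two breakpoints lie equally far from the origin on opposite sides."""
    clockwise_to_bp1 = (bp1 - origin) % genome_len
    counter_to_bp2 = (origin - bp2) % genome_len
    return abs(clockwise_to_bp1 - counter_to_bp2) <= tol


# Hypothetical 1,000-unit chromosome with the origin at coordinate 0.
print(imbalance(1000, 0, 700))
print(is_symmetric_inversion(1000, 0, 200, 800))
print(is_symmetric_inversion(1000, 0, 200, 600))
```

Under these assumptions an inversion with breakpoints at 200 and 800 leaves both replichores the same length, while one with breakpoints at 200 and 600 shifts the origin relative to the terminus.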

  4. A dot plot: Each dot is a pairwise (or n-way) local alignment

  5. Blue: same strand. Red: opposite strand. Goal: identify local homologous (orthologous) segments.

  6. Tools for segmental homology detection. GRIMM-Synteny (Pevzner et al. 2003, Bourque et al. 2004): cluster markers within a fixed distance. FISH (Vision et al. 2003): find statistically over-represented clusters of markers within a fixed distance. LineUp (Hampson et al. 2003): find collinear runs of markers among pairs of genomes, allowing degeneracy. Some alignment tools: Shuffle-LAGAN (Brudno et al. 2003), Mauve (Darling et al. 2004).

  7. Small segments separated by lineage-specific regions may not be detected by methods based strictly on distance. Key idea: use a combination of conserved marker order (collinearity) and alignment score

  8. Finding conserved regions: A pseudo-Gibbs sampler method. Given: a set ℳ of M monotypic markers. Do: assign a posterior probability that any marker m ∈ ℳ is part of a conserved region. Use MCMC methodology to sample the frequency of each marker’s inclusion in high-scoring configurations. Use frequency as an estimate of “posterior probability”.

  9. Finding conserved regions: A pseudo-Gibbs sampler method. Define a configuration X as a vector of length M of binary random variables, e.g. X = (X_1, X_2, …, X_M). A configuration value x_j maps marker m_j to either signal (1) or noise (0), e.g. x = (0,1,0,0,1,1,…,1,0). There are 2^M possible configurations. Run a Markov chain of length N over configuration space: (X^(1), X^(2), …, X^(N)).

  10. Sample possible marker configurations. Start with a random initial configuration, then: select a marker j and sample whether it should be a 0 or 1 based on the current configuration. The score used for marker j has three annotated terms: the sum of scores for all collinear markers to the left, the score of marker j itself, and the sum of scores for all collinear markers to the right. Here w_v is the score of marker v and x_v is the configuration value (0 or 1).
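One possible reading of this three-term score is sketched below in Python. The slide does not define "collinear" precisely, so the sketch makes several assumptions that should be treated as hypothetical: markers are indexed in genome A order, `b[v]` gives the position of marker v in genome B, only markers with x_v = 1 (plus marker j itself) are considered, and a run counts as collinear when neighbouring markers are also neighbours in genome B, in either orientation. The function name `lcb_score` is likewise made up for this example.

```python
def lcb_score(j, x, b, w):
    """Left sum + w[j] + right sum over the collinear run containing marker j.

    Assumption: a run is a stretch of signal markers that are consecutive
    in both genomes, on either the same or the opposite strand.
    """
    # Signal markers (x[v] == 1) plus j itself, in genome A order.
    active = [v for v in range(len(x)) if x[v] == 1 or v == j]
    # Rank of each active marker in genome B.
    rank = {v: r for r, v in enumerate(sorted(active, key=lambda v: b[v]))}
    i = active.index(j)
    best = 0
    for d in (1, -1):  # d = 1: same orientation, d = -1: inverted
        total = w[j]
        k = i
        while k > 0 and rank[active[k - 1]] == rank[active[k]] - d:
            k -= 1
            total += w[active[k]]  # collinear markers to the left
        k = i
        while k < len(active) - 1 and rank[active[k + 1]] == rank[active[k]] + d:
            k += 1
            total += w[active[k]]  # collinear markers to the right
        best = max(best, total)
    return best


# Toy data: marker 3 is out of place in genome B.
b = [0, 1, 2, 5, 3, 4]
w = [1, 1, 1, 1, 1, 1]
print(lcb_score(1, [1, 1, 1, 1, 1, 1], b, w))  # marker 3 interrupts the run
print(lcb_score(1, [1, 1, 1, 0, 1, 1], b, w))  # marker 3 set to noise
```

In the toy data, switching the misplaced marker to noise lets the two flanking runs merge, which raises the score of every marker in them. That is the behaviour the sampler relies on.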

  11. Transform LCB score to probability. The scale parameter c is used in tandem with the sigmoid to map a marker’s score to a probability [formula not preserved in this transcript].
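Because the slide's formula is not in the transcript, the following Python sketch only illustrates the general idea of a sigmoid controlled by c. The specific form, a logistic function centred at c and scaled by c, is an assumption made for this example and may differ from the authors' transformation. It was chosen so that a larger c demands more collinear support before a marker is likely to be kept, which matches the use of c = 75 for the "low resolution" setting later in the talk.

```python
import math


def inclusion_probability(score, c):
    """Map a marker's score to a probability with a logistic curve.

    Assumed form: 1 / (1 + exp(-(score - c) / c)), so score == c gives 0.5.
    """
    return 1.0 / (1.0 + math.exp(-(score - c) / c))


# The same score is less convincing at a larger (lower-resolution) c.
for c in (20, 30, 75):
    print(c, inclusion_probability(30, c))
```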

  12. Sample a new value for x_j. Set x_j to 1 with probability given by the marker’s score transformation. First allow the chain a “burn-in” period, then continue for many iterations. The frequency, or “posterior probability”, of m_j is the fraction of post-burn-in iterations in which x_j = 1 [formula not preserved in this transcript].
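Putting the steps together, a sampler of this kind might look like the Python sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the collinearity rule in `lcb_score`, the logistic form using c, the uniform random choice of marker at each step, and all function and parameter names are hypothetical.

```python
import math
import random


def lcb_score(j, x, b, w):
    """Score of the collinear run containing j (assumed definition of collinear)."""
    active = [v for v in range(len(x)) if x[v] == 1 or v == j]
    rank = {v: r for r, v in enumerate(sorted(active, key=lambda v: b[v]))}
    i = active.index(j)
    best = 0.0
    for d in (1, -1):  # same or opposite orientation
        total = w[j]
        k = i
        while k > 0 and rank[active[k - 1]] == rank[active[k]] - d:
            k -= 1
            total += w[active[k]]
        k = i
        while k < len(active) - 1 and rank[active[k + 1]] == rank[active[k]] + d:
            k += 1
            total += w[active[k]]
        best = max(best, total)
    return best


def pseudo_gibbs(b, w, c, n_iter, burn_in, seed=0):
    """Return, per marker, the fraction of post-burn-in steps with x_j == 1."""
    rng = random.Random(seed)
    m = len(b)
    x = [rng.randint(0, 1) for _ in range(m)]  # random initial configuration
    counts = [0] * m
    for t in range(n_iter):
        j = rng.randrange(m)  # select a marker
        # Assumed sigmoid transformation of the score, controlled by c.
        p = 1.0 / (1.0 + math.exp(-(lcb_score(j, x, b, w) - c) / c))
        x[j] = 1 if rng.random() < p else 0  # sample a new value for x_j
        if t >= burn_in:
            for v in range(m):
                counts[v] += x[v]
    return [count / (n_iter - burn_in) for count in counts]


# Toy data: 12 markers, with marker 5 displaced to the end of genome B.
b = [0, 1, 2, 3, 4, 11, 5, 6, 7, 8, 9, 10]
pp = pseudo_gibbs(b, [1.0] * 12, c=3.0, n_iter=20000, burn_in=2000, seed=1)
print([round(p, 2) for p in pp])
```

On this toy input the displaced marker should end up with a clearly lower inclusion frequency than the markers that sit in long collinear runs.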

  13. Our method assigns each marker a posterior probability (p.p.). Threshold γ separates signal from noise.

  14. Our method assigns each marker a p.p. Using γ = .5, the X pattern appears

  15. Our method assigns each marker a p.p. Using γ = .5, the X pattern appears

  16. Application to 4 divergent Streptococcus genomes. Markers are reciprocal best blastp hits of ORFs among S. agalactiae, S. pyogenes, S. pneumoniae, and S. mutans.

  17. What is the distribution of segment sizes in Streptococci? As resolution increases, large segments are broken up by smaller segments. Total segments by setting: “Low resolution” (c = 75, γ = .45): 26; “Medium resolution” (c = 30, γ = .45): 32; “High-1 resolution” (c = 20, γ = .50): 57; “High-2 resolution” (c = 20, γ = .30): 72. Plot axis: segment sizes (markers per segment).
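The effect of the threshold on segment counts can be illustrated with a small Python sketch. It assumes per-marker posterior probabilities are already available, keeps markers with p.p. at or above γ, and splits them into segments wherever neighbouring signal markers stop being consecutive in the second genome. The function name, the toy data, and this breakpoint rule are assumptions made for the example.

```python
def segment_sizes(pp, b, gamma):
    """Sizes (markers per segment) of collinear runs among signal markers.

    pp[j]: posterior probability of marker j (markers in genome A order).
    b[j]:  position of marker j in genome B.
    """
    signal = [j for j in range(len(pp)) if pp[j] >= gamma]
    # Rank of each signal marker in genome B.
    rank = {v: r for r, v in enumerate(sorted(signal, key=lambda v: b[v]))}
    sizes = []
    run = 0
    for k, v in enumerate(signal):
        if k > 0 and abs(rank[v] - rank[signal[k - 1]]) == 1:
            run += 1  # still consecutive in both genomes
        else:
            if run:
                sizes.append(run)  # close the previous segment
            run = 1
    if run:
        sizes.append(run)
    return sizes


# Toy data: marker 3 is displaced and has a low posterior probability.
pp = [0.9, 0.9, 0.9, 0.2, 0.9, 0.9]
b = [0, 1, 2, 5, 3, 4]
print(segment_sizes(pp, b, gamma=0.5))
print(segment_sizes(pp, b, gamma=0.1))
```

Lowering γ admits the displaced marker as signal, which breaks the single large segment into several smaller ones. This is the same trend as the move from the γ = .50 to the γ = .30 setting on the slide.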

  18. What was the ancestral genome organization? Try building an inversion phylogeny by applying GRIMM and MGR to the 57 high resolution segments.

  19. What was the ancestral genome organization? Try building an inversion phylogeny by applying GRIMM and MGR to the 57 high resolution segments. Failed: the suggested rearrangements do not maintain replichore balance.

  20. What was the ancestral genome organization? Try building an inversion phylogeny by applying GRIMM and MGR to the 57 high resolution segments. Failed: the suggested rearrangements do not maintain replichore balance. Try using the 26 larger, low resolution segments. Surprise! A success:

  21. Transforming S. agalactiae into S. pyogenes

  22. Conclusions - The pseudo-Gibbs sampler method detects collinear segments at a variety of scales - It would be nice to have an inversion phylogeny inference tool that accounts for replichore balance! - Large segments in Streptococci appear to rearrange by symmetric inversions - Small segments? An open problem.

  23. Future directions. Can a biologically relevant full joint probability distribution be expressed over configurations? - If so, then a true Gibbs sampler could be employed. Problems: - Some rearrangements occur with different frequency (e.g. symmetric inversions about the terminus vs. IS-mediated translocation) - Distinguish rearrangement by horizontal transfer (H.T.), gene duplication and subsequent loss, symmetric inversion, etc.

  24. Acknowledgements. Bob Mau, who did most of this work. My Ph.D. advisers: Nicole T. Perna and Mark Craven. Others who have contributed insight: Jeremy Glasner, Fred Blattner, Eric Cabot. GEL@UW-Madison. Funding: NIH Grant GM62994-02; NLM Training Grant 5T15M007359-03 to A.E.D.
