
Genome Rearrangements in Evolution and Cancer

Genome Rearrangements in Evolution and Cancer. Guillaume Bourque Genome Institute of Singapore HKU-Pasteur Research Centre - Hong Kong August 28 th , 2009. Outline. Genome Rearrangements in Evolution [ ??? ] Cancer genomics. Genome rearrangements in evolution. 1999. High hopes.


Presentation Transcript


  1. Genome Rearrangements in Evolution and Cancer Guillaume Bourque Genome Institute of Singapore HKU-Pasteur Research Centre - Hong Kong August 28th, 2009

  2. Outline • Genome Rearrangements in Evolution • [ ??? ] • Cancer genomics

  3. Genome rearrangements in evolution 1999

  4. High hopes • Explain the physical clustering of gene families (regulation, editing or retention). • Understand whether even longer linkage associations were preserved by chance or by selection (developmental or functional). • Resolve the mammalian phylogeny using genomic segment exchanges as characters. • Discover molecular fossils of precipitous genomic events. • Identify genetic determinants of reproductive isolation, adaptation, survival and species formation. O’Brien et al, Science 1999

  5. Comparing 2 sequences • GGCACAAATCCAAATCCAAATCCGGGTTGGGGTTGGGGTTGGGGTTGCGACACATTTGGCCTGTCGTCGTCCGTCGTC • GGCACAAATCCAAATCCAAATCCAATGTGTCGCAACCCCAACCCCAACCCCAACCCTGGCCTGTCGTCGTCCGTCGTC • Need to reverse complement
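A minimal Python sketch of the check slide 5 calls for, assuming the two sequences differ only by an inverted middle segment between shared flanks. The helper names (`reverse_complement`, `shared_flank_lengths`) are illustrative and do not come from the talk.

```python
# Illustrative helpers; the function names are assumptions, not from the presentation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_flank_lengths(a: str, b: str) -> tuple:
    """Return the lengths of the common prefix and common suffix of a and b."""
    n = min(len(a), len(b))
    prefix = 0
    while prefix < n and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < n - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, suffix


# The two sequences shown on slide 5.
seq_a = "GGCACAAATCCAAATCCAAATCCGGGTTGGGGTTGGGGTTGGGGTTGCGACACATTTGGCCTGTCGTCGTCCGTCGTC"
seq_b = "GGCACAAATCCAAATCCAAATCCAATGTGTCGCAACCCCAACCCCAACCCCAACCCTGGCCTGTCGTCGTCCGTCGTC"

prefix, suffix = shared_flank_lengths(seq_a, seq_b)
middle_a = seq_a[prefix:len(seq_a) - suffix]
middle_b = seq_b[prefix:len(seq_b) - suffix]

# True when the differing segment of seq_b is the reverse complement of seq_a's.
inverted = reverse_complement(middle_a) == middle_b
print(inverted)
```

A plain left-to-right comparison would report the middle segment as unrelated; only after reverse complementing does it line up, which is why orientation has to be tracked as a sign on each block in the slides that follow.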

  6. If you have 3 sequences… • Seq_1 : 1 -2 3 4 5 • Seq_2 : 1 2 3 -4 5 • Seq_3 : 1 2 3 4 5 • Pairwise comparisons: Seq_1 vs Seq_2, Seq_1 vs Seq_3, Seq_2 vs Seq_3

  7. Rearrangement Phylogeny • A: 1 2 3 4 5 • Seq_1: 1 -2 3 4 5 (inversion of block 2) • Seq_2: 1 2 3 -4 5 (inversion of block 4) • Seq_3: 1 2 3 4 5

  8. Synteny blocks

  9. Genome rearrangements • Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6 • Translocation: 1 2 3 4 | 5 6 → 1 2 6 | 5 3 4 • Fusion / Fission: 1 2 3 4 5 6 ↔ 1 2 3 4 5 6

  10. Algorithms for sorting genomes • Polynomial algorithm for computing the rearrangement distance and the most parsimonious scenario between 2 unichromosomal genomes (Hannenhalli and Pevzner 1995). • For example: 1 -6 -3 -7 2 -4 -5 8 → 1 -6 -3 -2 7 -4 -5 8 → 1 2 3 6 7 -4 -5 8 → 1 2 3 4 -7 -6 -5 8 → 1 2 3 4 5 6 7 8 • Further developed for multi-chromosomal genomes (Tesler 2002) and multiple genomes (Bourque and Pevzner 2002).
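The example scenario on slide 10 can be replayed with a short Python sketch. This only applies the four reversals shown on the slide to a signed permutation; it is not an implementation of the Hannenhalli-Pevzner algorithm, and the function name `reversal` and the index pairs in `steps` are illustrative choices read off the slide.

```python
from typing import List


def reversal(perm: List[int], i: int, j: int) -> List[int]:
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each block."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


# Starting genome from the slide, followed by the segment reversed at each step.
start = [1, -6, -3, -7, 2, -4, -5, 8]
steps = [(3, 4), (1, 3), (3, 5), (4, 6)]

scenario = [start]
for i, j in steps:
    scenario.append(reversal(scenario[-1], i, j))

for genome in scenario:
    print(" ".join(str(block) for block in genome))
```

Each printed line corresponds to one genome in the slide's example, ending at the identity permutation after four reversals.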

  11. Chromosome X • two-way similarities (PatternHunter) • synteny blocks (GRIMM-Synteny) • rearrangement scenario (MGR)

  12. History of Chromosome X

  13. Mammalian phylogeny • Species: pig, cat, rat, mouse, human, cow, dog • Murphy et al, Science, 2005

  14. X chromosome evolution

  15. Overview of the Results • Nearly 20% of chromosome breakpoint regions were reused. • Gene density is higher in evolutionary breakpoint regions. • Segmental duplications populate the majority of primate-specific breakpoints.

  16. Human Chromosome 11

  17. Debate on ancestral reconstructions

  18. Debate on ancestral reconstructions

  19. Recovering true ancestral events • Analyses of genome rearrangements are typically evaluated on: • Quality of the ancestral reconstructions • Ability to recover the correct topology • Total number of rearrangements in the scenario recovered (parsimony) • We decided to focus on the accuracy of the rearrangements recovered • Start by measuring accuracy using simulations and then apply the approach to real data sets • Why? • Look for events that could have been involved in speciation • Look at sequence features associated with these events (e.g. repeats, genes, etc.) • Gain mechanistic insights into genome rearrangements

  20. EMRAE :: Efficient Method to Recover Ancestral Events • Relies on adjacencies conserved in a significant fraction of the genomes. • Combines conserved adjacencies (and nearly conserved adjacencies) to predict rearrangement events. • Applicable to uni- and multi-chromosomal genomes. • Currently models inversions, translocations, fusions, fissions and transpositions, but is also amenable to insertions and deletions. • Achieves high specificity with comparable sensitivity.

  21. Conserved adjacencies • Define an adjacency a(c_i, c_{i+1}) as an ordered pair of integers c_i c_{i+1}, or its inverse -c_{i+1} -c_i, found in a given genome. • For a given edge e, which splits the genomes into two sets S_A and S_B, if the adjacency a is found in every genome of S_A but not in any genome of S_B, we say that a is a conserved adjacency of S_A.
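A small Python sketch of the definition on slide 21, applied to the three toy genomes from slide 6. It assumes linear, unichromosomal genomes written as signed integer lists and ignores chromosome ends; the function names are illustrative, and this is not EMRAE itself, only the conserved-adjacency test.

```python
from typing import List, Set, Tuple

Adjacency = Tuple[int, int]


def canonical(a: int, b: int) -> Adjacency:
    """Pick one representative for the adjacency (a, b) and its inverse (-b, -a)."""
    return min((a, b), (-b, -a))


def adjacencies(genome: List[int]) -> Set[Adjacency]:
    """All adjacencies between consecutive blocks of a linear genome."""
    return {canonical(a, b) for a, b in zip(genome, genome[1:])}


def conserved_adjacencies(side_a: List[List[int]],
                          side_b: List[List[int]]) -> Set[Adjacency]:
    """Adjacencies present in every genome of side_a and in no genome of side_b."""
    in_all_a = set.intersection(*(adjacencies(g) for g in side_a))
    in_any_b = set().union(*(adjacencies(g) for g in side_b))
    return in_all_a - in_any_b


seq_1 = [1, -2, 3, 4, 5]
seq_2 = [1, 2, 3, -4, 5]
seq_3 = [1, 2, 3, 4, 5]

# Edge separating Seq_1 from the other two genomes.
result = conserved_adjacencies([seq_1], [seq_2, seq_3])
print(sorted(result))
```

For the edge leading to Seq_1, the only adjacencies that qualify are the two created by the inversion of block 2, which is the kind of signal the method combines to predict a rearrangement event on that edge.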

  22. Conserved adjacencies :: example

  23. Simulation results Higher specificity

  24. Mammalian rearrangement events • Predicted 1109 events at a 10 Kb resolution: • 831 reversals • 237 transpositions • 15 translocations • 26 fusions/fissions

  25. Mammalian rearrangement events • Predicted 1109 events at a 10 Kb resolution: • 831 reversals • 237 transpositions • 15 translocations • 26 fusions/fissions

  26. Human-chimp-specific reversal

  27. Human-specific breakpoints are enriched in SDs • Human-specific breakpoint regions are significantly enriched in SDs as compared to size-matched random regions (p-value < 0.001). • Indeed, 93.2% of the human-specific breakpoint regions (69 out of 74) contain SDs. • This is true for only approximately 60% of size-matched random regions.

  28. Homologous matching pairs of SDs are enriched in human-specific breakpoints • Taking the 74 human-specific breakpoints identified in this study, we observed 100 pairs of regions with matching pairs of SDs instead of an average of 25 pairs observed in the random simulated data sets.

  29. Primate reversals are associated with SDs • The average percent identity of the SDs that are associated with reversals correlates with the relative age of these events. • This helps confirm the direct link between SDs and many rearrangement events.

  30. If not SDs, what? • Extension from primate-specific reversals to all the predicted mammalian reversals • We used BLAST to detect homology between breakpoints of the predicted reversals • Many reversals are flanked by regions of high sequence identity (BLAST score > 1000)

  31. Homology flanking mammalian reversals • We found that 58%, 29%, 24%, 42%, 47% and 20% of the human, chimp, rhesus, rat, mouse and dog reversals, respectively, are supported by regions with BLAST scores greater than 1000. • What is the source of this homology? Is it expected? • We restricted our analysis to the reversals with breakpoints defined within 100 Kb and assessed the overlap between these regions of homology and repeats. • We assigned each reversal to a particular repeat family when the overlap between the identified homologous segment and a repeat instance was greater than 50%, and compared the results to matched simulated data sets.
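The 50% overlap rule on slide 31 can be sketched in Python as follows. The slide does not say which length the overlap is measured against, so this sketch assumes it is a fraction of the homologous segment; the coordinates, the half-open interval convention and the function names are all hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def overlap_fraction(segment: Interval, repeat: Interval) -> float:
    """Fraction of the segment covered by the repeat (assumed denominator)."""
    seg_start, seg_end = segment
    rep_start, rep_end = repeat
    overlap = max(0, min(seg_end, rep_end) - max(seg_start, rep_start))
    return overlap / (seg_end - seg_start)


def annotate(segment: Interval,
             repeats: List[Tuple[int, int, str]],
             threshold: float = 0.5) -> Optional[str]:
    """Return the family of the best-overlapping repeat above the threshold."""
    best_fraction, best_family = threshold, None
    for start, end, family in repeats:
        fraction = overlap_fraction(segment, (start, end))
        if fraction > best_fraction:
            best_fraction, best_family = fraction, family
    return best_family


# Hypothetical coordinates for illustration only.
repeats = [(900, 1700, "L1"), (1700, 2100, "Alu")]
print(annotate((1000, 2000), repeats))
print(annotate((0, 1000), repeats))
```

With these made-up intervals, the first segment is 70% covered by the L1 instance and is assigned to that family, while the second is covered for only 10% of its length and stays unannotated.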

  32. Overrepresentation of paired L1 repeats

  33. Outline • Genome Rearrangements in Evolution • [ ??? ] • Cancer genomics

  34. Sequencing Revolution • Sanger sequencing (1970s) • Next-Generation sequencing (2007-now): 454, Illumina, SOLiD

  35. Data Explosion • Sequencing is no longer the rate-limiting step • This year, we expect: • 2X increase in CPU • 2X increase in memory • 10X increase in sequencing (estimate from Illumina and SOLiD) or even 100X increase (Helicos, Complete Genomics, etc.) • Informatics challenges that we face now will only grow…

  36. Paradigm Shift • Things that are out: • Storing all primary data (images) • “All versus all” types of analysis • Single large repository (NCBI) • Careless data management (duplicated files, extra transferring steps, etc.) • Things that are in: • Clusters and high performance storage • Cloud computing • Careful data management & planning • Bioinformaticians & IT engineers (even for relatively small labs)

  37. Sequencing Human Genomes • 2001: The Human Genome ($$$$$$) • 2009: 1000 Genomes Project ($$$) • 2011 (?): Your Genome ($)

  38. New opportunities… In the study of … • Evolution • Populations • Cancer

  39. Outline • Genome Rearrangements in Evolution • [ ??? ] • Cancer genomics

  40. Gene Identification Signature Ng, et al., Nature Methods, 2005

  41. PET technology • Cancer Cell • cDNA • PET • Human Genome

  42. Highly rearranged cancer genome Provided by Nalla Palanisamy, GIS

  43. Impact of rearrangements on PETs (Normal vs Cancer) • Translocation • Inversion • Deletion

  44. GIS-PET MCF-7 Transcriptome • 584,624 cDNA equivalents • 135,757 Unique PETs • 92,928 PETs (69%), 9,732 PETs (7%), 33,097 PETs (24%) • Categories: One location (tag1), Unmappable (tag0), Multi-location

  45. Sequence-based clustering • All unmappable PETs (tag0) • Cluster based on sequence similarity • Align: ---GGAGCCGCGGCCGCC-------ACGATCCCAC-AGCCTC ----GAGCCGCGGCCGCC---AAGAACGATACCAC-AGCCTC ATTGGAGCTGCGGCCGC--------ACGATCCCAC-AGCCTC --TGGAGCCGCGGCCGCCGA-----ACGATCCCAC-AGCCTC ------GCGGCGGCCGCC---AAGAACGATCCCAC-AGCCCC ----GAGCCGCGGCCGCCG---AGCACGATCCCACTAGCCTC • Extract consensus: 5’ ATTGGAGCCGCGGCCGCCGA AGAACGATCCCACAGCCTC 3’ • Map to human genome

  46. Largest unmappable cluster • 77 unique PETs • 339 total PETs • 5’ and 3’ tags map to BCAS4 (20q13) and BCAS3 (17q23) …

  47. BCAS4-3 fusion transcript

  48. Fusion transcript discovery pipeline Ruan et al. Genome Res, 2007

  49. Genomic PET (gPET) • Genomic DNA fragmentation (1Kb, 10Kb) • PET library construction & sequencing • PET sequences mapping to reference genome • PET mapping span: 1Kb peak, 10Kb peak
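A rough Python sketch of how mapped genomic PETs could be screened for the rearrangement signatures listed on slide 43 (translocation, inversion, deletion). The rules below are a common paired-end heuristic assumed here for illustration, not the pipeline used in the talk; the `Tag` fields, the `max_span` cutoff and the category labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One mapped end of a PET (hypothetical record layout)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def classify_pet(tag5: Tag, tag3: Tag, max_span: int = 10_000) -> str:
    """Label a PET as concordant or as a candidate rearrangement signature.

    max_span is an assumed cutoff, e.g. somewhat above the library's span peak.
    """
    if tag5.chrom != tag3.chrom:
        return "translocation candidate"
    if tag5.strand != tag3.strand:
        return "inversion candidate"
    if abs(tag3.pos - tag5.pos) > max_span:
        return "deletion candidate"
    return "concordant"


# Hypothetical mappings for illustration only.
examples = [
    (Tag("chr1", 1_000, "+"), Tag("chr1", 9_500, "+")),
    (Tag("chr20", 5_000, "+"), Tag("chr17", 7_000, "+")),
    (Tag("chr3", 2_000, "+"), Tag("chr3", 8_000, "-")),
    (Tag("chr5", 1_000, "+"), Tag("chr5", 250_000, "+")),
]
for tag5, tag3 in examples:
    print(classify_pet(tag5, tag3))
```

In practice a single discordant PET is weak evidence, so calls like these would normally be clustered and required to recur across several independent PETs before being reported.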
