
Eukaryotic Comparative Genomics

Eukaryotic Comparative Genomics. June 2018 GEP Alumni Workshop. Barak Cohen.


Presentation Transcript


  1. Eukaryotic Comparative Genomics June 2018 GEP Alumni Workshop Barak Cohen

  2. Detecting Conserved Sequences (pictured: Charles Darwin, Motoo Kimura)

  3. Evolution of Neutral DNA A A T C T A A T T G C T G T G A T T C A G A G T A G C A G T G A T A A G T C T T T G A T G T T G T T G C A G G A G T A G T C G T A * * * * * * * * * * * * * * * * * * * * * * * * *

  4. Evolution of Non-Neutral DNA A T C T A G T C C G A T G T G C G T A C C G A C C A T A A G G A T G C A C A C G T A T A C C A T G T G G T A T C C G A T C C A T A A G C A T A T C * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

  5. Multi-Species Alignment
      ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA
      ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA
      GTACATCGATAGCTTAGAATGCTGGACGATCTC
      GTACGTCGATAGCATAGAATGCTGGACGATCTC
       *    *     *   *   ***********
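The asterisks in the alignment on slide 5 mark columns where all four sequences carry the same base. A minimal sketch of how such a conservation line could be computed is shown here; the function name `conservation_line` is an illustrative assumption, not something taken from the talk, and gap columns are treated as not conserved.

```python
def conservation_line(aligned):
    """Return a string with '*' under every column where all sequences agree.

    Assumes the sequences are already aligned (equal length, '-' for gaps).
    Columns consisting of gaps are not marked as conserved.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


# The four aligned sequences from slide 5.
alignment = [
    "ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA",
    "ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA",
    "GTACATCGATAGCTTAGAATGCTGGACGATCTC",
    "GTACGTCGATAGCATAGAATGCTGGACGATCTC",
]

for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```

Running this prints the four sequences followed by a line of asterisks under the fully conserved columns, which is the layout alignment viewers typically use.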

  6. How to do Comparative Genomics
      • Choose species to analyze
      • Align sequences
      • Identify stretches of highly conserved nucleotides
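The last step on slide 6, identifying stretches of conserved nucleotides, can be sketched as a scan for runs of identical columns in an existing alignment. This is only an illustration under simple assumptions: the helper name `conserved_stretches` and the `min_length` threshold are hypothetical, and real pipelines score conservation statistically rather than requiring perfect identity.

```python
def conserved_stretches(aligned, min_length=5):
    """Return (start, end) pairs for runs of identical, ungapped columns.

    Coordinates are 0-based and end-exclusive. Runs shorter than
    min_length (a hypothetical threshold) are ignored.
    """
    stretches = []
    run_start = None
    for i, column in enumerate(zip(*aligned)):
        identical = len(set(column)) == 1 and column[0] != "-"
        if identical and run_start is None:
            run_start = i  # a new run begins here
        elif not identical and run_start is not None:
            if i - run_start >= min_length:
                stretches.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None:
        end = len(aligned[0])
        if end - run_start >= min_length:
            stretches.append((run_start, end))
    return stretches


# The four-sequence alignment from slide 5, reused as input.
alignment = [
    "ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA",
    "ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA",
    "GTACATCGATAGCTTAGAATGCTGGACGATCTC",
    "GTACGTCGATAGCATAGAATGCTGGACGATCTC",
]

print(conserved_stretches(alignment))
```

For the slide 5 alignment the only run of five or more identical columns is the 11-column block near the 3' end.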

  7. Choose species
      • Closely Related Species
        - align well
        - not many changes
      • Distantly Related Species
        - hard to align
        - lots of changes

  8. Yeast phylogeny (figure). Species shown: S. cerevisiae, S. cariocanus, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. pastorianus, S. servazzii, S. unisporus, S. exiguus, S. dairenensis, S. castellii, S. kluyveri, Kluyveromyces lactis, Schizosaccharomyces pombe. Divergence-time labels on the tree: ~10 Mya, ~20 Mya, ~150 Mya, >350 Mya.

  9. Case Study: Coding vs. Non-Coding (figure: ORF, ATG… to …TAA)
      • Coding DNA
        - codes for protein
        - triplet code
        - open reading frame (ORF)
        - tend to be long (50-500 bp)
        - highly constrained
      • Non-Coding DNA
        - regulatory functions
        - short (5-15 bp)
        - degenerate
        - variable spacing

  10. CASE 1: Non-Coding (figure: GAL4, ATG… …TAA)

  11. (Yeast phylogeny figure, repeated from slide 8.)

  12. Closely-related sequences are uninformative (figure: ATG… GAL4)

      paradoxus  TCTTCTGAGACAGCATCACTTCTTCTTNTTTTTTACATAACTTATTCTTCTATAATTTTC
      cerevisiae TCCTTTGAGACAGCATTCGCCCAGTATTTTTTTTATTCTACA-AACCTTCTATAATTT-C
                 ** * ***********     *    * *******    **  *  ************ *

      paradoxus  AACGTATTTACATAGTTCTGTATCAGTTTAATCACCATAATATTGTTTTCCCTCAACTAA
      cerevisiae AAAGTATTTACATAATTCTGTATCAGTTTAATCACCATAATATCGTTTTCT-----TTGT
                 ** *********** **************************** ******       *

      paradoxus  TGAATGCAATTAGATTTTCTTATTGTTCCCTCGCGGCTTTTTTTTGTTTTATAATCTATT
      cerevisiae TTAGTGCAATTAATTTTTCCTATTGTTACTTCG-GGCCTTTTTCTGTTTTATGAGCTATT
                 * * ********  ***** ******* * *** *** ***** ******** * *****

      paradoxus  TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC
      cerevisiae TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC
                 ***********  ************* ** ********************** *******

      paradoxus  ATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGCTACTGTCT
      cerevisiae ACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGCTACTGTCT
                 * ** ***** ** *** * ** ****** *** ********** ***************

  13. (Yeast phylogeny figure, repeated from slide 8.)

  14. Distantly-related sequences do not align ATG… GAL4 Noncoding (Promoter) cerevisiae ACTTACCAT-CAAC-CATAGATGGGTAAAC---GGTTAGTAACTAGGAACACGAT castelli AGA-GTCAAACTTTTCGT—ATA--TATATATAATATGTCTGATTGCTGGTT---T * ** * * * * * * * * *

  15. (Yeast phylogeny figure, repeated from slide 8.)

  16. Multiple sequence alignments reveal conserved elements (figure annotations: UAS1, UAS2, UES, MIG1, MIG1; ATG… GAL4)

      cerevisiae   TGAGACAGCAT-CACTTCTT-CTTNTTTTTTACATAACTTATTCTTCTATAATTTTCAAC
      mikatae      TGAGACAGCATTCACTTCTTTCTTTTTTTTTACATATCTTATTCTTCTATAATTTTCAAC
      bayanus      TGAGACAGCATTCGCCCAGT--ATTTTTTTTAT-TCTACAAACCTTCTATAATTT-CAAA
      kudriavzevii TGAGACTGCACTCCC--------TCTTCCTTTC------------TCCATAACTT---AC
                   ****** ***  * *        * **  **              ** **** **   *

      paradoxus    GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
      kluyveri     GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
      cerevisiae   GTATTTACATAATTCTGTATCAGTTTAATCACCATAAT------ATCGTTTTCTTTGT--
      bayanus      TTATTTACATAGTTTTGTATCAGTTTAATCACCATAATCGTAACACCGTTTTACCTCACC
                    ********** ** ***********************      *  *****   *

      paradoxus    TAATGAATGCAATTAGATTTTC-TTATTGTTCCC-TCGCGGCTTTTTTTTGTTTTATAAT
      kluyveri     TAATGAATGCAATTAGATTTTCCTTATTGTTCCCCTCGCGGCTTTTTTTTGTTTTATAAT
      cerevisiae   ---TTAGTGCAATTAATTTTTC-CTATTGTTACT-TCG-GGCCTTTTTCTGTTTTATGAG
      bayanus      TGATGCGGG--A---ATCCTTC-AGACCGTTCTC-TCGCGC-------------------
                      *    *  *       ***   *  ***    *** *

      paradoxus    -CTATTTTTTCCGTCATTTCTTCCCC-AGATTTCCAACTTCAT-CTCCAGATTGTGTCTA
      kluyveri     ACTATTTTTTCCGTCATTTCTTCCCCCAGATTTCCAACTTCATACTCCAGATTGTGTCTA
      cerevisiae   -CTATTTTTTCCGTCATC-CTTCCCC-AGATTTTCAGCTTCAT-CTCCAGATTGTGTCTA
      bayanus      -CTTTTTTTTTCGTCATTTCTTCCCC-AGATCTACAACTTTAA-CTCCAGACGGTGTATA
                    ** ****** ******  ******* **** * ** *** *  *******  **** **

      paradoxus    TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
      kluyveri     TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
      cerevisiae   CGTAATGCACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGC
      bayanus      GGCAGTACAAGCAGTGCTTTTGGGAAGAGGCAAAGCTGCAGACCTCGAGAACAATGAAGC
                    * * * ** **  *  * **  **   *  * **   **  ****  ***  *******

  17. CASE 2: Coding (figure: CLN3, ATG… …TAA)

  18. (Yeast phylogeny figure, repeated from slide 8.)

  19. Closely-related sequences are uninformative

  20. (Yeast phylogeny figure, repeated from slide 8.)

  21. Less distantly related species are not informative either

  22. (Yeast phylogeny figure, repeated from slide 8.)

  23. Distantly related species reveal functional protein domains

  24. Identification of Multi-Species Conserved Regions (MCS)
      Human cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
      Chimp cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
      Mouse ttcagtcgtttcccagtgtctctga-cattcagagactactttagtaagcattt-tctct
      Rat   tcagtccttccctggcatctccag-cactcaa-gactactttagtaagcattt-tctctg
      Dog   tcaatgactttcccagtctcttctactgggaagagattaggttgcaaatcatttttctct
      * * * * * * **
      How can we decide if this region is “conserved?”
      Margulies et al. (2003) Genome Res. 13:2507-18

  25. It's like flipping coins (really)

  26. Binomial-Based Method for Detecting Conserved Sequences
      Human:  AATGG
      Mouse:  AATCG
      Status: CCCDC
      p = probability that a site is the same between human and mouse by chance alone (Kimura); q = 1 - p.
      For an alignment N base pairs long with n identities, calculate the cumulative binomial probability as:
      P(X \ge n) = \sum_{k=n}^{N} \binom{N}{k} p^{k} q^{N-k}
      Margulies et al. (2003) Genome Res. 13:2507-18
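The coin-flipping idea on slide 26 can be made concrete with a few lines of code. The sketch below computes the upper-tail cumulative binomial probability of seeing n or more identities in an alignment of length N; treating the upper tail as the quantity of interest is an assumption here, as are the function name `binomial_tail` and the value 0.25 for p, which is only a placeholder and not an estimate of the neutral identity rate.

```python
from math import comb


def binomial_tail(n_identities, length, p):
    """Probability of n_identities or more matches in `length` sites by chance.

    p is the per-site probability of identity under neutrality; q = 1 - p.
    """
    q = 1.0 - p
    return sum(
        comb(length, k) * p**k * q ** (length - k)
        for k in range(n_identities, length + 1)
    )


# The toy human/mouse example from slide 26.
human = "AATGG"
mouse = "AATCG"
N = len(human)
n = sum(h == m for h, m in zip(human, mouse))  # 4 identities (status CCCDC)

p_chance = 0.25  # placeholder value, NOT a measured neutral rate
print(n, N, binomial_tail(n, N, p_chance))
```

A small tail probability means the observed level of identity is unlikely under the neutral model, which is the sense in which a window is called conserved. In practice p would be estimated from sites believed to evolve neutrally.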

  27. Large sequencing projects are underway

  28. Tree Topology Influences Power (figure: Star Phylogeny vs. Actual Phylogeny, each with species A through F)

  29. Challenges in larger genomes
      • Deciding on the neutral rate of substitution
      • Local differences in the neutral rate of substitution
      • Multiple hypothesis testing
      • Repeat sequences and uneven base composition

  30. PhastCons and the UCSC Browser (figure: Olig2; 100 kb upstream of Olig2)

  31. Motif Searching Across Several Multiple Alignments (figure: alignments for Gene 1, Gene 2, Gene 3, …, Gene N, each covering Species 1, Species 2, Species 3)

  32. Information Content
      EcoRI    Random   Rap1
      GAATTC   GCCTAC   TGTATGGGTG
      GAATTC   ACATTC   TGTTCGGATT
      GAATTC   TCATTC   TGCATGGGTG
      GAATTC   CGACTC   TGTACAGGTG
      GAATTC   GAATTC   TGTATGGATG
      GAATTC   ATATCG   TGTTCGGGTT
      GAATTC   GAAATG   TGTATGGGTG
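The three site lists on slide 32 differ in how much information each position carries. The sketch below uses one common definition, 2 bits minus the Shannon entropy of each column, which assumes equal background frequencies for the four bases and applies no small-sample correction; the slide does not state which formula the talk used, so this choice and the function name `information_content` are assumptions.

```python
from collections import Counter
from math import log2


def information_content(sites):
    """Per-column information in bits: 2 - entropy, assuming a uniform background."""
    result = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        result.append(2.0 - entropy)
    return result


# Site lists transcribed from slide 32.
ecori = ["GAATTC"] * 7
random_sites = ["GCCTAC", "ACATTC", "TCATTC", "CGACTC", "GAATTC", "ATATCG", "GAAATG"]
rap1 = [
    "TGTATGGGTG", "TGTTCGGATT", "TGCATGGGTG", "TGTACAGGTG",
    "TGTATGGATG", "TGTTCGGGTT", "TGTATGGGTG",
]

for name, sites in [("EcoRI", ecori), ("Random", random_sites), ("Rap1", rap1)]:
    per_column = information_content(sites)
    print(name, [round(x, 2) for x in per_column], round(sum(per_column), 2))
```

The invariant restriction site reaches the maximum of 2 bits at every position, while the degenerate Rap1 positions score lower. With only seven sites per list the estimates are noisy, so the numbers should be read as qualitative.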

  33. Weight Matrix Model of TATA Box G. Stormo

  34. Weight Matrix Model of TATA Box Score = -24 ….A C T A T A A T G T … G. Stormo

  35. Weight Matrix Model of TATA Box Score = 43 ….A C T A T A A T G T … G. Stormo

  36. Weight Matrix Model of TATA Box: counts N(b,i), frequencies F(b,i), and scores S(b,i) = log[F(b,i)/P(b)], for base b at position i with background frequency P(b). G. Stormo
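The relation S(b,i) = log[F(b,i)/P(b)] on slide 36 can be sketched in code as follows. Several details are assumptions because the slide does not specify them: the logarithm is taken base 2, the background P(b) is uniform, and a pseudocount of 1 is added so that unseen bases do not give a log of zero. The TATA matrix itself is not in the transcript, so the Rap1 sites from slide 32 stand in as training data, and the function names are illustrative.

```python
from collections import Counter
from math import log2

BASES = "ACGT"


def weight_matrix(sites, background=None, pseudocount=1.0):
    """Build S(b,i) = log2(F(b,i) / P(b)) from aligned sites (pseudocount assumed)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform P(b), an assumption
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)  # N(b,i)
        total = len(column) + pseudocount * len(BASES)
        matrix.append(
            {b: log2(((counts[b] + pseudocount) / total) / background[b]) for b in BASES}
        )
    return matrix


def score(matrix, sequence):
    """Sum the column scores for a sequence of the same width as the matrix."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must equal matrix width")
    return sum(column[base] for column, base in zip(matrix, sequence))


def best_hit(matrix, sequence):
    """Slide the matrix along a longer sequence; return (best score, 0-based offset)."""
    width = len(matrix)
    return max(
        (score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Rap1 sites from slide 32, used here only as stand-in training data.
rap1 = [
    "TGTATGGGTG", "TGTTCGGATT", "TGCATGGGTG", "TGTACAGGTG",
    "TGTATGGATG", "TGTTCGGGTT", "TGTATGGGTG",
]
matrix = weight_matrix(rap1)
print(round(score(matrix, "TGTATGGGTG"), 2))
print(best_hit(matrix, "AAAATGTATGGGTGAAAA"))
```

Sliding the matrix along a sequence and scoring each window is what produces different scores at different offsets, as on slides 34 and 35, where the same stretch of sequence is scored at two positions.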

  37. Now we can compare motifs to each other (figure: two weight matrices, each with rows A, C, G, T)

  38. MAGMA: unaligned motif finding in multispecies conserved regions (figure: Gene 1, Gene 2, Gene 3, …, Gene N, each covering Species 1, Species 2, Species 3) *Ihuegbu, Stormo, & Buhler, JCB 19:139, 2012
