DNA variation in Ecology and Evolution IV: Clustering methods and Phylogenetic reconstruction

DNA variation in Ecology and Evolution IV: Clustering methods and Phylogenetic reconstruction. Maria Eugenia D’Amato. BCB 705: Biodiversity. Organization of the presentation: phylogenetic reconstruction (distance, ML and MP methods), networks, multivariate analysis.

Presentation Transcript


  1. DNA variation in Ecology and Evolution IV: Clustering methods and Phylogenetic reconstruction. Maria Eugenia D’Amato, BCB 705: Biodiversity

  2. Organization of the presentation
     • Phylogenetic reconstruction: distance, maximum likelihood (ML) and maximum parsimony (MP) methods
     • Networks
     • Multivariate analysis

  3. Characters: independent and homologous
     • Continuous
     • Discrete: binary or multistate

  4. DNA sequence characters
     • Alignment = a hypothesis of homology for each site
     • Sequence comparison: BLAST search against GenBank
     • Coding sequence: blastn or blastx
     • Non-coding DNA: blastn

  5. BLAST search results
     Sequences producing significant alignments:                       Score (bits)  E-value
     gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5...  101           3e-18
     gi|343991|dbj|D10368.1|FRGMTURF2 Rana catesbeiana mitochondri...  97.6          5e-17
     gi|14209845|gb|AF314017.1|AF314017 Rana sylvatica NADH dehydr...  93.7          8e-16
     • Each identifier contains the GenBank accession number of the matching sequence, followed by the species that matches the query.
     • The lower the E-value, the better the alignment.
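As a hedged aside that goes beyond the slide: under the Karlin-Altschul statistics used by BLAST, the E-value of a hit is approximately set by its bit score $S'$ and the size of the search space, where $m$ is the (effective) query length and $n$ the (effective) total length of the database:

```latex
E \approx m \, n \, 2^{-S'}
```

Each extra bit of score therefore halves the number of hits expected by chance alone, which is why a lower E-value marks a more trustworthy alignment. Exact values depend on the effective-length corrections of the BLAST version used.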

  6. BLAST search results: a single alignment
     >gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase subunit 2, complete cds
     Length=10814
     Score = 101 bits (51), Expect = 3e-18
     Identities = 99/115 (86%), Gaps = 0/115 (0%)
     Strand=Plus/Minus

     Query  451    TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA  510
                   |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
     Sbjct  10203  TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA  10144

     Query  511    AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG  565
                   | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
     Sbjct  10143  AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG  10089

     • The description line lists the genes contained in the sequence with this accession number.
     • Strand=Plus/Minus: the query (plus strand) aligns with the minus strand of the subject, so the subject coordinates decrease along the alignment.
     • The number at the left of each row is the position of its 5' end; the number at the right is the position of its 3' end.
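The "Identities" line can be checked directly from the aligned rows. The sketch below copies the two rows of the Mantella baroni hit and recounts identical positions. The helper names `match_line` and `identities` are illustrative, and the code assumes an ungapped alignment, which holds here (Gaps = 0/115).

```python
# Aligned rows copied from the Mantella baroni hit (ungapped, so columns line up).
QUERY_ROWS = [
    "TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA",
    "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG",
]
SBJCT_ROWS = [
    "TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA",
    "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG",
]


def match_line(query: str, sbjct: str) -> str:
    """Rebuild the '|' line printed between two aligned rows."""
    return "".join("|" if q == s else " " for q, s in zip(query, sbjct))


def identities(query_rows, sbjct_rows):
    """Return (identical positions, alignment length) summed over all rows."""
    matches = 0
    length = 0
    for q, s in zip(query_rows, sbjct_rows):
        if len(q) != len(s):
            raise ValueError("rows of an ungapped alignment must have equal length")
        matches += sum(a == b for a, b in zip(q, s))
        length += len(q)
    return matches, length


if __name__ == "__main__":
    for q, s in zip(QUERY_ROWS, SBJCT_ROWS):
        print(q)
        print(match_line(q, s))
        print(s)
    m, n = identities(QUERY_ROWS, SBJCT_ROWS)
    print(f"Identities = {m}/{n} ({round(100 * m / n)}%)")  # Identities = 99/115 (86%)
```

The first row contributes 52 identities and the second 47, which reproduces the 99/115 (86%) that BLAST reports.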

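  7. Phylogenetic reconstruction: distance methods
     (Figure: a data matrix of 5 samples (rows 1 to 5) × 7 characters (C1 to C7) is converted, using a distance or similarity/dissimilarity criterion, into a 5 × 5 distance matrix, from which a dendrogram is built.)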

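  8. Distance criteria for binary data
     • Jaccard's coefficient: $J = \frac{a}{a + b + c}$, where a = bands common to A and B, b = bands exclusive to A, c = bands exclusive to B. J measures similarity; the corresponding Jaccard distance is $1 - J$.
     • For two points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$:
       Manhattan distance: $M = |x_1 - x_2| + |y_1 - y_2|$
       Euclidean distance: $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$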
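A minimal sketch of the three criteria, assuming band profiles are scored as presence/absence (1/0) rows; the profiles in the usage block are hypothetical, and the convention that two blank profiles are identical is an assumption of this sketch. `distance_matrix` shows the step from a data matrix to a square distance matrix described for distance methods.

```python
import math


def bands(row):
    """Positions scored as present (1) in a presence/absence band profile."""
    return {i for i, v in enumerate(row) if v}


def jaccard_distance(bands_a, bands_b):
    """1 - J, with J = a / (a + b + c)."""
    a = len(bands_a & bands_b)   # bands common to A and B
    b = len(bands_a - bands_b)   # bands exclusive to A
    c = len(bands_b - bands_a)   # bands exclusive to B
    if a + b + c == 0:
        return 0.0               # assumption: two blank profiles count as identical
    return 1.0 - a / (a + b + c)


def manhattan(p1, p2):
    return sum(abs(u - v) for u, v in zip(p1, p2))


def euclidean(p1, p2):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))


def distance_matrix(items, metric):
    """Symmetric n x n matrix of metric(items[i], items[j])."""
    return [[metric(x, y) for y in items] for x in items]


if __name__ == "__main__":
    # Hypothetical band profiles for three individuals (1 = band present).
    profiles = [[1, 1, 1, 0, 1], [0, 1, 1, 1, 0], [1, 1, 0, 0, 1]]
    for row in distance_matrix([bands(p) for p in profiles], jaccard_distance):
        print([round(d, 2) for d in row])
    print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))  # 7 5.0
```

Jaccard ignores shared absences (bands missing in both individuals), which is usually why it is preferred for dominant markers over distances that count every 0/0 match.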

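  9. Distance criterion for DNA data: models of DNA substitution
     For two aligned sequences x and y, $f_{XY}$ is the proportion of sites with nucleotide X in x and Y in y:
     $$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$$
     p-distance: p = number of differing nucleotides / total number of nucleotides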
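The matrix $F_{xy}$ and the p-distance can be computed in a few lines. This is a sketch assuming two already-aligned, gap-free sequences made only of A, C, G and T (any other symbol raises `ValueError`); the example pair is hypothetical.

```python
BASES = "ACGT"


def divergence_matrix(x, y):
    """F[i][j] = proportion of sites with BASES[i] in x and BASES[j] in y."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    n = len(x)
    F = [[0.0] * 4 for _ in range(4)]
    for bx, by in zip(x.upper(), y.upper()):
        F[BASES.index(bx)][BASES.index(by)] += 1.0 / n
    return F


def p_distance(F):
    """Proportion of differing sites: 1 - (a + f + k + p), i.e. 1 - trace(F)."""
    return 1.0 - sum(F[i][i] for i in range(4))


if __name__ == "__main__":
    F = divergence_matrix("AACGTT", "AGCGTA")   # hypothetical pair, 2 of 6 sites differ
    for base, row in zip(BASES, F):
        print(base, [round(v, 3) for v in row])
    print("p =", round(p_distance(F), 3))
```

Off-diagonal cells record which kind of difference occurred, which is the information the substitution models need.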

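  10. Models of DNA substitution
     • Proportion of differing sites: $D = 1 - (a + f + k + p)$
     • Jukes and Cantor (equal rates): $d_{xy} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$
     • F81 (unequal base frequencies $\pi_A, \pi_C, \pi_G, \pi_T$): $B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$, $d_{xy} = -B\ln\left(1 - \frac{D}{B}\right)$
     • K2P: $P = c + h + i + n$ (transitions), $Q = b + d + e + g + j + l + m + o$ (transversions), $d_{xy} = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$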
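The three corrected distances, transcribed as a sketch. Estimating the F81 base frequencies from the two sequences pooled is an assumption (any estimate of $\pi$ can be passed in), and the example pair is hypothetical.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def jukes_cantor(D):
    return -0.75 * math.log(1.0 - 4.0 * D / 3.0)


def f81(D, freqs):
    """freqs: base frequencies (pi_A, pi_C, pi_G, pi_T)."""
    B = 1.0 - sum(f * f for f in freqs)
    return -B * math.log(1.0 - D / B)


def kimura_2p(P, Q):
    return (0.5 * math.log(1.0 / (1.0 - 2.0 * P - Q))
            + 0.25 * math.log(1.0 / (1.0 - 2.0 * Q)))


def site_proportions(x, y):
    """P = proportion of transitions, Q = proportion of transversions."""
    n = len(x)
    P = sum((a, b) in TRANSITIONS for a, b in zip(x, y)) / n
    Q = sum(a != b and (a, b) not in TRANSITIONS for a, b in zip(x, y)) / n
    return P, Q


def pooled_base_freqs(x, y):
    """Assumption: estimate pi from both sequences pooled."""
    pooled = x + y
    return [pooled.count(b) / len(pooled) for b in "ACGT"]


if __name__ == "__main__":
    x, y = "ACGTACGTAA", "ACGTATGTGA"        # hypothetical pair: two transitions
    P, Q = site_proportions(x, y)
    D = P + Q
    print("JC69:", jukes_cantor(D))
    print("F81: ", f81(D, pooled_base_freqs(x, y)))
    print("K2P: ", kimura_2p(P, Q))
```

With equal base frequencies ($B = 3/4$) F81 reduces to Jukes and Cantor, and K2P does too when $P = D/3$ and $Q = 2D/3$; the test below checks both identities.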

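  11. Distance criteria for diploid (allele-frequency) data
     • Nei 1972: for allele frequencies $x_i$ and $y_i$ in populations X and Y, $J_X = \sum_i x_i^2$, $J_Y = \sum_i y_i^2$, $J_{XY} = \sum_i x_i y_i$ (averaged over loci); genetic identity $I = \frac{J_{XY}}{\sqrt{J_X J_Y}}$ and distance $D_N = -\ln I$
     • Cavalli-Sforza 1967 (arc distance over L loci): $D_{arc} = \frac{1}{L}\sum_{l=1}^{L}\left(\frac{2\theta_l}{\pi}\right)^2$, with $\theta_l = \cos^{-1}\sum_i \sqrt{x_i y_i}$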
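Nei's standard distance as a sketch: each population is a list of loci, and each locus a list of allele frequencies given in the same allele order for both populations (an input-layout assumption). $J_X$, $J_Y$ and $J_{XY}$ are averaged over loci before the identity is formed. The allele frequencies in the example are hypothetical.

```python
import math


def nei_distance(pop_x, pop_y):
    """Nei's (1972) D = -ln(J_XY / sqrt(J_X * J_Y)), J's averaged over loci."""
    L = len(pop_x)
    Jx = sum(sum(x * x for x in locus) for locus in pop_x) / L
    Jy = sum(sum(y * y for y in locus) for locus in pop_y) / L
    Jxy = sum(sum(x * y for x, y in zip(lx, ly))
              for lx, ly in zip(pop_x, pop_y)) / L
    identity = Jxy / math.sqrt(Jx * Jy)   # Nei's I
    return -math.log(identity)


if __name__ == "__main__":
    # Hypothetical populations scored at two loci (2 and 3 alleles).
    pop_x = [[0.5, 0.5], [0.2, 0.3, 0.5]]
    pop_y = [[0.9, 0.1], [0.1, 0.4, 0.5]]
    print(round(nei_distance(pop_x, pop_y), 4))
```

Identical allele frequencies give $I = 1$ and $D_N = 0$; populations that share no alleles make $J_{XY} = 0$, and the logarithm (hence $D_N$) is undefined, so the code raises an error in that case.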

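  12. Phylogenetic reconstruction criteria for distance data
     Ultrametric tree (UPGMA): rooted; A sits on branch v1, and the internal branch v2 leads to the ancestor of B (branch v3) and C (branch v4).
     Properties: $d_{AB} = v_1 + v_2 + v_3$; $d_{AC} = v_1 + v_2 + v_4$; $d_{BC} = v_3 + v_4$; $v_3 = v_4$; $v_1 = v_2 + v_3 = v_2 + v_4$ (all tips are equidistant from the root)
     Additive tree (NJ): unrooted; A (v1) and B (v2) join on one side of the internal branch v3, C (v4) and D (v5) on the other.
     Properties: $d_{AB} = v_1 + v_2$; $d_{AC} = v_1 + v_3 + v_4$; $d_{AD} = v_1 + v_3 + v_5$; $d_{BC} = v_2 + v_3 + v_4$; $d_{BD} = v_2 + v_3 + v_5$; $d_{CD} = v_4 + v_5$ (each distance is the sum of the branch lengths on the path between two tips)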
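A compact UPGMA sketch that returns a rooted Newick string. The four-taxon distance matrix is hypothetical and chosen to be ultrametric, so the output reproduces it exactly. Neighbour joining, which builds the additive tree, is not implemented here. Ties between equally close pairs are broken by dictionary order, a simplification.

```python
def upgma(names, dist):
    """UPGMA on a symmetric distance matrix; returns a rooted Newick string."""
    n = len(names)
    # cluster id -> (Newick text, number of tips, height of its root node)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    # each pairwise distance stored once, keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)            # closest pair of clusters
        height = d[(i, j)] / 2.0            # new node sits halfway: ultrametric
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        text = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # distance to the merged cluster = tip-count-weighted mean (UPGMA rule)
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            new_d[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in new_d.items():
            d[(k, next_id)] = v             # k < next_id, so keys stay ordered
        clusters[next_id] = (text, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


EXAMPLE_NAMES = ["A", "B", "C", "D"]
EXAMPLE_DIST = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]

if __name__ == "__main__":
    print(upgma(EXAMPLE_NAMES, EXAMPLE_DIST))  # ((A:1,B:1):2,(C:2,D:2):1);
```

Every tip ends up 3 units from the root, the ultrametric property listed above. Fed non-ultrametric (merely additive) distances, UPGMA still returns an ultrametric tree, which can misplace lineages that evolve at unequal rates. This is the motivation for NJ.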

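  13. Maximum likelihood
     Data: aligned sequences 1 to 4, sites j = 1 to N, e.g.
       C….GGACACGTTTA….C
       C….AGACACCTCTA….C
       C….GGATAAGTTAA….C
       C….GGATAGCCTAG….C
     Likelihood of the data D under the hypothesis H (tree plus substitution model): $L_D = \Pr(D \mid H)$
     For a site j, the unrooted tree of sequences 1 to 4 is rooted at an internal node (5), and $L_j$ adds up the probabilities of the observed tip states over all possible nucleotides at the internal nodes 5 and 6: $L_j = \sum_{x_5}\sum_{x_6} \Pr(\text{tip states}, x_5, x_6)$
     Over all sites: $L = L_1 \times L_2 \times \cdots \times L_N = \prod_{j=1}^{N} L_j$ and $\ln L = \ln L_1 + \ln L_2 + \cdots + \ln L_N = \sum_{j=1}^{N} \ln L_j$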
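A brute-force sketch of the site likelihood $L_j$: it sums over the 16 possible states at internal nodes 5 and 6 under the Jukes and Cantor model, for the tree ((1,2),(3,4)) rooted at node 5. The model choice, the branch lengths and the use of only the visible middle parts of the four sequences are all assumptions made for illustration; real programs use Felsenstein's pruning algorithm, which gives the same numbers far faster on large trees.

```python
import itertools
import math

BASES = "ACGT"

# Hypothetical branch lengths (expected substitutions per site). The tree is
# ((1,2),(3,4)) rooted at internal node 5; node 6 joins taxa 3 and 4.
BRANCHES = {"t1": 0.1, "t2": 0.1, "t3": 0.1, "t4": 0.1, "t56": 0.05}

# Visible middle parts of the four sequences (the elided "…." parts are left out).
SEQS = ["GGACACGTTTA", "AGACACCTCTA", "GGATAAGTTAA", "GGATAGCCTAG"]


def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a ends as base b after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def site_likelihood(tips, br):
    """L_j: sum over the unknown states x5, x6 at internal nodes (16 terms)."""
    s1, s2, s3, s4 = tips
    total = 0.0
    for x5, x6 in itertools.product(BASES, repeat=2):
        total += (0.25  # equilibrium frequency of x5 at the root (node 5)
                  * jc_prob(x5, s1, br["t1"]) * jc_prob(x5, s2, br["t2"])
                  * jc_prob(x5, x6, br["t56"])
                  * jc_prob(x6, s3, br["t3"]) * jc_prob(x6, s4, br["t4"]))
    return total


def log_likelihood(seqs, br):
    """ln L = sum of ln L_j, assuming sites evolve independently."""
    return sum(math.log(site_likelihood(site, br)) for site in zip(*seqs))


if __name__ == "__main__":
    print("ln L =", log_likelihood(SEQS, BRANCHES))
```

Summing logarithms rather than multiplying the $L_j$ directly avoids numerical underflow on long alignments. Because Jukes and Cantor is time-reversible, where the tree is rooted does not change $L$.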

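  14. Hypothesis testing: likelihood ratio test
     $\Delta = \ln L_1 - \ln L_0$; the test statistic $2\Delta$ is compared with a $\chi^2$ distribution.
     • Rate variation among lineages (molecular clock): d.f. = number of sequences in the tree − 2
     • Choice of the appropriate substitution model: d.f. = difference in the number of free parameters between H1 and H0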
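A standard-library sketch of the test. The $\chi^2$ upper tail is computed from a series for the regularized incomplete gamma function. This is adequate for moderate statistics (roughly below 1000) but is not a replacement for a statistics library. The log-likelihoods in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x), via the series for the
    regularized lower incomplete gamma function (moderate x only)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def likelihood_ratio_test(lnL0, lnL1, df):
    """Return (2 * Delta, p-value) for nested hypotheses H0 within H1."""
    stat = 2.0 * (lnL1 - lnL0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical clock test on 5 sequences: d.f. = 5 - 2 = 3.
    stat, p = likelihood_ratio_test(lnL0=-2450.3, lnL1=-2441.8, df=3)
    print(f"2*Delta = {stat:.1f}, p = {p:.4f}")
```

A small p-value rejects H0 (here, a clock-like tree) in favour of the richer H1.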

  15. Bootstrapping: how well supported are the groups?
     (Figure: trumpetfish example.)
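A sketch of nonparametric bootstrapping: alignment columns are resampled with replacement, the tree is re-estimated for each pseudo-replicate, and the support for each grouping is the fraction of replicates that recover it. To stay self-contained, this sketch uses parsimony on the four-taxon data from the maximum parsimony example (taxa 1 to 4 are indices 0 to 3). Splitting credit equally between tied best trees is a convention chosen for this sketch.

```python
import random

# Four-taxon example data (taxa 1 to 4 -> indices 0 to 3).
SEQS = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]

# The three unrooted four-taxon topologies, each written as two cherries.
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_quartet(site, topology):
    """Fitch (parsimony) changes for one site on a four-taxon tree."""
    changes = 0
    cherry_sets = []
    for i, j in topology:
        a, b = {site[i]}, {site[j]}
        if a & b:
            cherry_sets.append(a & b)
        else:
            cherry_sets.append(a | b)
            changes += 1
    if not cherry_sets[0] & cherry_sets[1]:
        changes += 1
    return changes


def best_topologies(columns):
    """All topologies with the minimum total length for these sites."""
    lengths = [sum(fitch_quartet(col, t) for col in columns) for t in TOPOLOGIES]
    shortest = min(lengths)
    return [t for t, length in zip(TOPOLOGIES, lengths) if length == shortest]


def bootstrap_support(seqs, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which each topology is best."""
    rng = random.Random(seed)
    columns = list(zip(*seqs))
    support = {t: 0.0 for t in TOPOLOGIES}
    for _ in range(replicates):
        # Resample sites (columns) with replacement, keeping the alignment length.
        sample = [rng.choice(columns) for _ in range(len(columns))]
        winners = best_topologies(sample)
        for t in winners:
            support[t] += 1.0 / len(winners)  # tied trees share the replicate
    return {t: s / replicates for t, s in support.items()}


if __name__ == "__main__":
    for topology, value in bootstrap_support(SEQS).items():
        print(topology, f"{value:.2f}")
```

With only five sites the support values are coarse; with real alignments the same loop is run with any tree-building method in place of `best_topologies`.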

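  16. Maximum parsimony
     • Criterion: minimize tree length (the total number of changes)
     • To obtain rooted trees (and character polarity), use an outgroup; the ingroup must be monophyletic
     Example data (taxa 1 to 4):
       1 ATATT
       2 ATCGT
       3 GCAGT
       4 GCCGT
     (Figure: the first site, A A G G, mapped onto the tree ((1,2),(3,4)), where a single change between A and G explains it.)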

  17. Maximum parsimony: example
     (Figure: sites 2 to 5 mapped onto the alternative trees; site 5 (T T T T) requires no changes.)
     Tree length: $L = \sum_{i=1}^{k} l_i$, where $l_i$ is the number of changes at site i and k is the number of sites.

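  18. Maximum parsimony: example
     Tree              Site 1  Site 2  Site 3  Site 4  Site 5  Total
     ((1,2),(3,4))     1       1       2       1       0       5
     ((1,3),(2,4))     2       2       1       1       0       6
     ((1,4),(2,3))     2       2       2       1       0       7
     Phylogenetically informative sites: 1, 2 and 3 (sites 4 and 5 need the same number of changes on every tree). The most parsimonious tree is ((1,2),(3,4)), with length 5.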
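The table can be reproduced with Fitch's algorithm. This is a sketch in which trees are nested tuples of taxon indices (taxa 1 to 4 become indices 0 to 3, and sites 1 to 5 become columns 0 to 4). The helper names are illustrative.

```python
SEQS = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]          # taxa 1 to 4
TREES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch(tree, site):
    """Return (state set, number of changes) for one site on a binary tree."""
    if isinstance(tree, int):                # a tip: its observed state
        return {site[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, site)
    s2, c2 = fitch(right, site)
    if s1 & s2:                              # shared state: no extra change
        return s1 & s2, c1 + c2
    return s1 | s2, c1 + c2 + 1              # disjoint sets: one change


def tree_length(tree, seqs):
    """L = sum over sites of l_i."""
    return sum(fitch(tree, column)[1] for column in zip(*seqs))


def is_informative(column):
    """At least two states, each present in at least two taxa."""
    counts = {}
    for state in column:
        counts[state] = counts.get(state, 0) + 1
    return sum(1 for c in counts.values() if c >= 2) >= 2


if __name__ == "__main__":
    for tree in TREES:
        per_site = [fitch(tree, col)[1] for col in zip(*SEQS)]
        print(tree, per_site, tree_length(tree, SEQS))
```

The printed rows match the table: lengths 5, 6 and 7. Only columns 0 to 2 (sites 1 to 3) pass `is_informative`.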

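  19. Networks
     • Phylogenetic representation that allows reticulation
     • More appropriate for intraspecific data
     • Ancestors can still be alive, i.e. sampled alongside their descendants
     • Reticulations can reflect hybridization, recombination, horizontal transfer or polyploidization
     (Figure: a network of haplotypes such as agct, acat and acct, with nodes numbered 1 to 7.)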
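The simplest way to see how a network arises is to connect haplotypes that differ by a single mutation. This is a sketch of such a one-step network, using the haplotypes that appear in the figure. Published methods (minimum spanning, median-joining, statistical parsimony) also link haplotypes several steps apart and infer unsampled intermediate haplotypes, which this sketch does not do.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def one_step_network(haplotypes):
    """Edges between haplotypes that differ at exactly one site."""
    return [(a, b) for a, b in combinations(haplotypes, 2) if hamming(a, b) == 1]


if __name__ == "__main__":
    for edge in one_step_network(["agct", "acat", "acct"]):
        print(" - ".join(edge))
```

Here acct sits between agct and acat, so it can be read as a living ancestral (or intermediate) haplotype. If the edges closed a loop, the network would keep it as a reticulation instead of forcing a tree.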

  20. Multivariate clustering
     (Figure: a data matrix of 5 samples × 7 characters (C1 to C7) is summarized with a similarity criterion (correlations) as a 7 × 7 matrix; the samples are then plotted on the new X (1st), Y (2nd) and Z (3rd) axes.)
     • Calculate the eigenvectors with the highest eigenvalues
     • Project the data onto the new axes (eigenvectors)
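A sketch of the steps above as a principal component analysis on the correlation matrix, using NumPy (assumed available). The 5 × 7 data matrix is hypothetical.

```python
import numpy as np

# Hypothetical 5 x 7 data matrix: rows are samples 1 to 5, columns characters C1 to C7.
DATA = [
    [1, 0, 2, 5, 1, 3, 0],
    [2, 1, 2, 4, 0, 3, 1],
    [5, 4, 1, 1, 3, 0, 4],
    [6, 5, 0, 2, 3, 1, 5],
    [3, 2, 1, 3, 2, 2, 2],
]


def pca_correlation(data, n_axes=3):
    """PCA on the correlation matrix: returns (sorted eigenvalues, sample scores)."""
    X = np.asarray(data, dtype=float)
    # Standardize each character, so its covariance matrix equals the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # 7 x 7 character correlations
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_axes]          # coordinates on the X, Y, Z axes
    return eigvals, scores


if __name__ == "__main__":
    eigvals, scores = pca_correlation(DATA)
    print("share of variance:", np.round(eigvals / eigvals.sum(), 3))
    print(np.round(scores, 3))
```

Using the correlation rather than the covariance matrix gives every character equal weight regardless of its units. The eigenvalues sum to 7, the number of characters, so each eigenvalue divided by 7 is that axis's share of the total variance.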
