
Grupo de Genómica Evolutiva

Introduction to phylogenetic reconstruction. Grupo de Genómica Evolutiva. Dr. Luis José Delaye Arredondo, ldelaye@ira.cinvestav.mx. Departamento de Ingeniería Genética.


Presentation Transcript


  1. Introduction to phylogenetic reconstruction • Grupo de Genómica Evolutiva, Dr. Luis José Delaye Arredondo, ldelaye@ira.cinvestav.mx, Departamento de Ingeniería Genética

  2. Part 1: Trees

  3. Hypothesis!

  4. [Figure: a phylogeny of A, B, and C and the three basic kinds of tree used to depict that phylogeny: a cladogram, an additive tree (branch lengths show character change), and an ultrametric tree (drawn against a time axis).] After Page and Holmes (1998)

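  5. [Figure: tree of A and B with the root and the branch lengths labeled.] Topology = phylogenetic relationships.

  6. [Figure: the OTUs A, B, and C drawn as an unrooted tree and as a rooted tree, with the root labeled.]

  7. [Figure: tree of A, B, and C with the ingroup and the outgroup labeled.] Rooting with an outgroup relies on independent evidence of prior divergence.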

  8. Number of unrooted bifurcating trees for $t$ OTUs: $B(t) = \prod_{i=3}^{t} (2i - 5)$
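
The product above grows very quickly with the number of OTUs. A minimal Python sketch, assuming the formula counts unrooted, strictly bifurcating trees with labelled tips (the function name `unrooted_tree_count` is illustrative, not from the slides):

```python
def unrooted_tree_count(t: int) -> int:
    """Evaluate B(t) = product over i = 3..t of (2i - 5)."""
    if t < 3:
        raise ValueError("the product is defined for t >= 3")
    count = 1
    for i in range(3, t + 1):
        count *= 2 * i - 5
    return count


if __name__ == "__main__":
    for t in (3, 4, 5, 10, 20):
        print(f"t = {t:2d}: {unrooted_tree_count(t)} unrooted trees")
```

Python integers have arbitrary precision, so the count stays exact even for large t.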

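  9. [Figure: two drawings of the tree of A, B, and C, with the tips in the orders A, C, B and A, B, C, joined by an equals sign.]

  10. [Figure: two drawings of a tree with the tips in the orders B, A and A, B, joined by an equals sign.]

  11. [Figure: trees of A, B, C, and D illustrating paraphyletic, polyphyletic, and monophyletic groups.]

  12. Part 2: Estimating evolutionary change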

  13. Divergence Between DNA sequences

  14. Divergence between DNA sequences. [Figure: two copies of the ancestral sequence ACTGAACGTAACGC diverge over time t into Sequence 1 and Sequence 2. Labeled events: single substitution, multiple substitutions, coincidental substitutions, parallel substitutions, convergent substitutions, and back substitutions.]

  15. Divergence between DNA sequences. [Figure: the same diagram with the ancestral sequence ACTGAACGAATCGC.] Although there have been 12 mutations, only 2 can be detected.

  16. [Figure: sequence dissimilarity, $D = 1 - I(t)$, on a scale from 0 to 1, plotted against time, with one curve for the proportion of actual differences and one for the proportion of observed differences. Saturation is marked where $D(t) \approx D(t+1)$.]

  17. Models of sequence evolution can be used to “correct” for multiple hits. [Figure: sequence dissimilarity, $D = 1 - I(t)$, on a scale from 0 to 1, plotted against time, with the distance correction indicated.]

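  18. Estimating the number of amino acid substitutions using the Poisson correction for protein sequences

  19. Estimating the number of amino acid substitutions using the Poisson correction for protein sequences. For a site in a protein sequence (M C A N T P L …), the probability of $k$ substitutions after time $t$ at substitution rate $r$ is $P(k) = e^{-rt} (rt)^k / k!$. Hence $P(0) = e^{-rt} (rt)^0 / 0! = e^{-rt}$, $P(1) = e^{-rt} (rt)^1 / 1!$, $P(2) = e^{-rt} (rt)^2 / 2!$, and in general $P(n) = e^{-rt} (rt)^n / n!$

  20. Estimating the number of amino acid substitutions using the Poisson correction for protein sequences. [Figure: the ancestral sequence SecA gives rise to Sec1 and Sec2; each branch is labeled $e^{-rt}$.] The number of substitutions per site separating the two sequences is $K = 2rt$. The probability that neither sequence has undergone a substitution at a site is $q = (e^{-rt})^2 = e^{-2rt} = 1 - p$, where $p$ is the proportion of sites that differ. Doing a little algebra: $e^{-K} = 1 - p$, so $K = -\ln(1 - p)$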
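
The result $K = -\ln(1 - p)$ can be applied directly to a pair of aligned protein sequences. A minimal sketch, assuming gaps are written as `-` and skipped; the function name and the two example sequences are hypothetical and not taken from the slides:

```python
import math


def poisson_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected distance K = -ln(1 - p) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # assumption: positions with a gap are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differing / compared  # observed proportion of differing sites
    if p >= 1.0:
        raise ValueError("K is undefined when every site differs")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical sequences: 2 differences in 10 sites, so p = 0.2.
    k = poisson_distance("MCANTPLKQR", "MCSNTPLRQR")
    print(f"K = {k:.4f} substitutions per site")
```

For small p the corrected value is close to p itself; the correction matters more as the sequences diverge.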

  21. Genetic distance using Poisson Correction

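  22. The Jukes and Cantor one-parameter model. [Diagram: A, G, C, and T, with every substitution between them occurring at the same rate, $\alpha$.] Base frequencies are equal: $f_A = f_T = f_C = f_G$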
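
The slide does not give a distance formula for this model. As a hedged illustration, the standard Jukes-Cantor correction from the general literature is $d = -\tfrac{3}{4} \ln(1 - \tfrac{4}{3} p)$, where $p$ is the observed proportion of differing sites. The sketch below (function names are illustrative) also inverts the formula to show how the observed dissimilarity saturates near 0.75 while the actual number of substitutions per site keeps growing:

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_expected_p(d: float) -> float:
    """Expected observed proportion of differences after d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0, 5.0):
        p = jc69_expected_p(d)
        print(f"actual d = {d:3.1f}  observed p = {p:.3f}  corrected = {jc69_distance(p):.3f}")
```

The printed table makes the saturation visible: large changes in d produce almost no change in p once p approaches 0.75.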

  23. The Kimura two-parameter model. [Diagram: A, G, C, and T. Transitions (A ↔ G and C ↔ T) occur at rate $\alpha$ and transversions at rate $\beta$.] Base frequencies are equal: $f_A = f_T = f_C = f_G$
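
The slide shows the model but not the resulting distance. As a sketch, the standard Kimura two-parameter distance from the general literature is $d = -\tfrac{1}{2} \ln\left[(1 - 2P - Q)\sqrt{1 - 2Q}\right]$, where $P$ and $Q$ are the observed proportions of transitions and transversions. The code assumes uppercase, gap-free DNA sequences of equal length; the function name is illustrative, and the example pair is sec 1 and sec 2 from the distance-methods slide later in this transcript:

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = 0
    transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    inside = (1.0 - 2.0 * p - q) * math.sqrt(max(1.0 - 2.0 * q, 0.0))
    if inside <= 0.0:
        raise ValueError("sequences are too divergent for the K2P correction")
    return -0.5 * math.log(inside)


if __name__ == "__main__":
    # Three transitions and no transversions in ten sites.
    print(f"d = {k2p_distance('ATGCGCACGT', 'ACACGTACGT'):.4f}")
```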

  24. The Jukes and Cantor’s One-Parameter Model. [Figure: number of transitions and transversions between pairs of bovid mammal mitochondrial sequences (684 base pairs from the COII gene) plotted against the estimated time of divergence. Axes: base pair differences (0 to 100) against time since divergence (0 to 25 Myr).]

  25. General Time Reversible Model. [Diagram: A, G, C, and T connected by substitution probabilities labeled $P_{AG}$, $P_{AT}$, $P_{AC}$, $P_{GT}$, and $P_{CT}$; base frequencies $f_A$, $f_T$, $f_C$, $f_G$.]

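  26. Models of DNA evolution using matrix theory. Substitution probability matrix: $P_t = \begin{pmatrix} P_{AA} & P_{AC} & P_{AG} & P_{AT} \\ P_{CA} & P_{CC} & P_{CG} & P_{CT} \\ P_{GA} & P_{GC} & P_{GG} & P_{GT} \\ P_{TA} & P_{TC} & P_{TG} & P_{TT} \end{pmatrix}$. Base composition of sequences: $f = [f_A \; f_C \; f_G \; f_T]$

  27. The Kimura two-parameter model. Substitution probability matrix (rows and columns in the order A, C, G, T): $P_t = \begin{pmatrix} * & \beta & \alpha & \beta \\ \beta & * & \beta & \alpha \\ \alpha & \beta & * & \beta \\ \beta & \alpha & \beta & * \end{pmatrix}$, where $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$. Base composition of sequences: $f = [\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}]$

  28. The Felsenstein (1981) model. Substitution probability matrix: $P_t = \begin{pmatrix} * & \pi_C\alpha & \pi_G\alpha & \pi_T\alpha \\ \pi_A\alpha & * & \pi_G\alpha & \pi_T\alpha \\ \pi_A\alpha & \pi_C\alpha & * & \pi_T\alpha \\ \pi_A\alpha & \pi_C\alpha & \pi_G\alpha & * \end{pmatrix}$, where $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$. This model assumes that there is variation in base composition. Base composition of sequences: $f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]$

  29. The Hasegawa, Kishino and Yano (1985) model. Substitution probability matrix: $P_t = \begin{pmatrix} * & \pi_C\beta & \pi_G\alpha & \pi_T\beta \\ \pi_A\beta & * & \pi_G\beta & \pi_T\alpha \\ \pi_A\alpha & \pi_C\beta & * & \pi_T\beta \\ \pi_A\beta & \pi_C\alpha & \pi_G\beta & * \end{pmatrix}$, where $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$. This model assumes that there is variation in base composition and that transitions and transversions occur at different rates. Base composition of sequences: $f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]$

  30. The General Reversible (REV) model. Substitution probability matrix: $P_t = \begin{pmatrix} * & \pi_C a & \pi_G b & \pi_T c \\ \pi_A a & * & \pi_G d & \pi_T e \\ \pi_A b & \pi_C d & * & \pi_T f \\ \pi_A c & \pi_C e & \pi_G f & * \end{pmatrix}$, where $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$. This model assumes that there is variation in base composition and that each substitution has its own probability. Base composition of sequences: $f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]$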
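
The substitution matrices on these slides share one construction: each off-diagonal entry is the frequency of the target base times a rate for that pair, and the diagonal makes every row sum to 1. A minimal sketch of that construction for the REV model, assuming the pairing a = A/C, b = A/G, c = A/T, d = C/G, e = C/T, f = G/T; the function name and all numeric values are hypothetical, chosen only for illustration:

```python
import math

BASES = "ACGT"


def rev_matrix(freqs: dict, rates: dict) -> dict:
    """Build a REV-style matrix: p_ij = pi_j * rate(i, j), p_ii = 1 - sum of the row."""
    matrix = {}
    for i in BASES:
        row = {j: freqs[j] * rates[frozenset((i, j))] for j in BASES if j != i}
        row[i] = 1.0 - sum(row.values())
        matrix[i] = row
    return matrix


# Hypothetical base frequencies and pair rates (not from the slides).
FREQS = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
RATES = {
    frozenset("AC"): 0.1,  # a
    frozenset("AG"): 0.4,  # b (transition)
    frozenset("AT"): 0.1,  # c
    frozenset("CG"): 0.1,  # d
    frozenset("CT"): 0.4,  # e (transition)
    frozenset("GT"): 0.1,  # f
}

if __name__ == "__main__":
    P = rev_matrix(FREQS, RATES)
    for i in BASES:
        print(i, " ".join(f"{P[i][j]:.3f}" for j in BASES))
    # Time reversibility: pi_i * p_ij equals pi_j * p_ji for every pair.
    reversible = all(
        math.isclose(FREQS[i] * P[i][j], FREQS[j] * P[j][i])
        for i in BASES for j in BASES if i != j
    )
    print("reversible:", reversible)
```

Setting b = e and a = c = d = f, as in the example values, gives the HKY pattern; making all six rates equal gives the Felsenstein (1981) pattern.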

  31. Comparing the models. Starting from Jukes-Cantor: allowing for transition/transversion bias gives Kimura 2-parameter, and allowing base frequencies to vary gives Felsenstein (1981). Allowing base frequencies to vary in Kimura 2-parameter, or allowing for transition/transversion bias in Felsenstein (1981), gives HKY (1985). Allowing all six pairs of substitutions to have different rates gives General Reversible (REV). From Page and Holmes (1998)

  32. Among-site rate variation. For protein-coding sequences, not all sites have the same probability of change (there is among-site rate variation). If this effect is not taken into account, the number of substitutions per site between two sequences can be underestimated (Li and Graur, 1991).

  33. Gamma distribution: $f(r) = \frac{b^a}{\Gamma(a)} e^{-br} r^{a-1}$, where $\Gamma(a) = \int_0^\infty e^{-t} t^{a-1} \, dt$ (Nei and Kumar, 2000)

  34. The shape parameter $a$ (Nei and Kumar, 2000)
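
The gamma density above can be evaluated with the standard library to see how the shape parameter changes the distribution of rates across sites. A minimal sketch; the function names, the integration step, and the parameter values are illustrative assumptions rather than anything given on the slides:

```python
import math


def gamma_density(r: float, a: float, b: float) -> float:
    """f(r) = [b^a / Gamma(a)] * exp(-b r) * r^(a - 1), for r > 0."""
    return (b ** a / math.gamma(a)) * math.exp(-b * r) * r ** (a - 1)


def integrate(func, lower: float, upper: float, steps: int = 200_000) -> float:
    """Midpoint rule, which never evaluates func at r = 0."""
    width = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * width) for k in range(steps)) * width


if __name__ == "__main__":
    # Setting b = a keeps the mean rate a / b equal to 1 (hypothetical values).
    for a in (0.5, 1.0, 2.0, 5.0):
        mean = integrate(lambda r: r * gamma_density(r, a, a), 0.0, 50.0)
        at_low_rate = gamma_density(0.1, a, a)
        print(f"a = {a}: mean rate = {mean:.3f}, density at r = 0.1 is {at_low_rate:.3f}")
```

With a small shape parameter most of the density sits near r = 0, which corresponds to many nearly invariant sites and a few fast-evolving ones.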

  35. Effect of among-site rate variation on sequence divergence. (A) Substitution rate of 0.5 % per Myr with 80 % of the sites free to vary. (B) Substitution rate of 2 % per Myr with 50 % of the sites free to vary (Page and Holmes, 1998)

  36. Part 3: Methods of phylogenetic inference

  37. Methods classified by type of data and by the way the phylogeny is built. Distance matrix, clustering: UPGMA, neighbor joining. Distance matrix, optimality criterion: minimum evolution. Sites (nucleotides or amino acids), optimality criterion: maximum parsimony, maximum likelihood, Bayesian methods. After Page and Holmes (1998)

  38. [Diagram: multiple alignment, phylogenetic inference method, phylogenetic reconstruction.]

  39. [Diagram: multiple alignment, phylogenetic inference method, phylogenetic reconstruction.] Assumptions about the evolutionary process.

  40. Distance methods. Multiple alignment (some columns are marked with asterisks on the slide):

      sec 1  ATGCGCACGT
      sec 2  ACACGTACGT
      sec 3  ATGCGAACCT
      sec 4  ATACGTACGT

      Distance matrix between OTUs:

      OTU   A      B      C
      B     d_AB
      C     d_AC   d_BC
      D     d_AD   d_BD   d_CD

      Phylogenetic reconstruction: [Figure: tree with branches labeled a, b, c, and d.] The evolutionary distance is proportional to the number of differences.
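
The step from the alignment to the distance matrix can be sketched by counting the differing positions for every pair of sequences. The four sequences are the ones on the slide; the function names are illustrative, and the raw count is used as the distance without any correction for multiple hits:

```python
from itertools import combinations

ALIGNMENT = {
    "sec 1": "ATGCGCACGT",
    "sec 2": "ACACGTACGT",
    "sec 3": "ATGCGAACCT",
    "sec 4": "ATACGTACGT",
}


def count_differences(seq1: str, seq2: str) -> int:
    """Number of aligned positions at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def distance_matrix(alignment: dict) -> dict:
    """Pairwise difference counts, keyed by (name1, name2)."""
    return {
        (x, y): count_differences(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


if __name__ == "__main__":
    for (x, y), d in distance_matrix(ALIGNMENT).items():
        p = d / len(ALIGNMENT[x])
        print(f"{x} vs {y}: {d} differences (p = {p:.1f})")
```

Dividing each count by the alignment length gives the observed proportion of differences, which a model-based correction could then convert into an estimated number of substitutions per site.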

  41. Methods classified by type of data and by the way the phylogeny is built. Distance matrix, clustering: UPGMA, neighbor joining. Distance matrix, optimality criterion: minimum evolution. Sites (nucleotides or amino acids), optimality criterion: maximum parsimony, maximum likelihood, Bayesian methods. After Page and Holmes (1998)

  42. Unweighted Pair-group Method using Arithmetic averages (UPGMA)

      OTU   A      B      C
      B     d_AB
      C     d_AC   d_BC
      D     d_AD   d_BD   d_CD

  43. Unweighted Pair-group Method using Arithmetic averages (UPGMA)

      OTU   A      B      C
      B     d_AB
      C     d_AC   d_BC
      D     d_AD   d_BD   d_CD

  44. Unweighted Pair-group Method using Arithmetic averages (UPGMA). [Figure: A and B joined at a branch point placed at $d_{AB}/2$.]

  45. Unweighted Pair-group Method using Arithmetic averages (UPGMA)

      OTU   (AB)      C
      C     d_(AB)C
      D     d_(AB)D   d_CD

      $d_{(AB)C} = (d_{AC} + d_{BC})/2$ and $d_{(AB)D} = (d_{AD} + d_{BD})/2$

  46. Unweighted Pair-group Method using Arithmetic averages (UPGMA)

      OTU   (AB)      C
      C     d_(AB)C
      D     d_(AB)D   d_CD

  47. Unweighted Pair-group Method using Arithmetic averages (UPGMA). [Figure: C joined to the cluster (AB) at a branch point placed at $d_{(AB)C}/2$.]

  48. Unweighted Pair-group Method using Arithmetic averages (UPGMA). [Figure: C joined to the cluster (AB) at a branch point placed at $d_{(AB)C}/2$.]

  49. Unweighted Pair-group Method using Arithmetic averages (UPGMA). [Figure: D joined to the cluster (ABC) at a branch point placed at $d_{(ABC)D}/2 = [(d_{AD} + d_{BD} + d_{CD})/3]/2$.]

  50. Unweighted Pair-group Method using Arithmetic averages (UPGMA): $d_{XY} = \sum_{i \in X} \sum_{j \in Y} d_{ij} / (n_X n_Y)$. Assumes a constant molecular clock. Estimates tree topology and branch length.
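
The UPGMA steps walked through on the preceding slides can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name, the nested-tuple tree representation, and the example distances are hypothetical. The size-weighted update used here is equivalent to averaging $d_{ij}$ over all pairs of members of the two clusters.

```python
from itertools import combinations


def upgma(distances: dict):
    """Cluster OTUs by UPGMA.

    distances maps (otu1, otu2) pairs to distances.
    Returns (tree, heights): tree is a nested tuple, and heights maps
    each internal node to half the distance between the clusters it joins.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    size = {otu: 1 for pair in distances for otu in pair}
    heights = {}
    while len(size) > 1:
        # Join the pair of clusters with the smallest distance.
        x, y = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        merged = (x, y)
        heights[merged] = d[frozenset((x, y))] / 2
        n_x, n_y = size.pop(x), size.pop(y)
        # Average distance from the new cluster to every remaining cluster.
        for z in size:
            d[frozenset((merged, z))] = (
                n_x * d[frozenset((x, z))] + n_y * d[frozenset((y, z))]
            ) / (n_x + n_y)
        size[merged] = n_x + n_y
    (tree,) = size
    return tree, heights


# Hypothetical, clock-like distances for the four OTUs on the slides.
EXAMPLE = {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
}

if __name__ == "__main__":
    tree, heights = upgma(EXAMPLE)
    print("tree:", tree)
    for node, height in heights.items():
        print(f"node {node} at height {height}")
```

With these distances A and B join first at height 1, C joins (AB) at height 2, and D joins last at height 3, mirroring the order of the slides. When the input distances do not fit a molecular clock, UPGMA can recover the wrong topology, which is the limitation the slide notes.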
