Explore building phylogenies using maximum likelihood, distance-based, and parsimony methods in molecular evolution. Learn about hidden ancestral states, transition probabilities, and ancestral reconstruction. Discover how to analyze sites and resolve conflicts in gene duplication scenarios.
Building Phylogenies: Maximum Likelihood
Methods • Distance-based • Parsimony • Maximum likelihood
ML is based on a Markov model of evolution • Observed: The species labeling the leaves • Hidden: The ancestral states • Transition probabilities: The mutation probabilities • Assumptions: • Only mutations are allowed • Sites are independent
Models of evolution at a site • Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$, where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$ • Branches may have different lengths
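A minimal sketch of such a matrix, assuming (this is not stated on the slide) that every specific change has the same hypothetical per-unit probability `p_change`. It also assumes branch lengths are whole numbers of time units, so that under the Markov model a branch of length t uses the matrix power M^t. The helper names `make_matrix` and `mat_pow` are illustrative.

```python
NUCS = "ACTG"  # row/column order of the matrix

def make_matrix(p_change):
    """4x4 matrix where each specific change has probability p_change per time unit."""
    stay = 1.0 - 3.0 * p_change
    return [[stay if i == j else p_change for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_pow(m, t):
    """Transition probabilities along a branch of t whole time units (M^t)."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(t):
        result = mat_mul(result, m)
    return result

M = make_matrix(0.01)   # 0.01 is an assumed value, chosen only for illustration
M3 = mat_pow(M, 3)      # matrix for a branch three time units long
```

Each row of `M3` still sums to 1, and the probability of seeing the same nucleotide at both ends of the branch shrinks as the branch gets longer.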
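The probability of an assignment • Tree: root T with internal children G and T; G has leaves A and G, and T has leaves C and T • Probability $= m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$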
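The product above can be evaluated by multiplying one matrix entry per edge of the tree. This sketch assumes a hypothetical matrix in which every specific change has probability 0.01 per time unit and every branch is one time unit long; neither value comes from the slides.

```python
P_CHANGE = 0.01  # assumed per-unit probability of each specific change

def m(i, j):
    """Entry m_ij of the assumed transition matrix."""
    return 1.0 - 3.0 * P_CHANGE if i == j else P_CHANGE

# (parent state, child state) for every edge of the slide's tree:
# root T -> internal G and T; G -> leaves A, G; T -> leaves C, T
edges = [("T", "G"), ("G", "A"), ("G", "G"),
         ("T", "T"), ("T", "C"), ("T", "T")]

prob = 1.0
for parent, child in edges:
    prob *= m(parent, child)
```

With these assumed values, three edges carry a change and three do not, so `prob` equals 0.01³ · 0.97³.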
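Ancestral reconstruction: most likely assignment • Tree: root X with internal children Y and Z; Y has leaves A and G, and Z has leaves C and T • $L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$ • Compute using the Viterbi algorithm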
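One way to sketch the Viterbi-style computation for this four-leaf tree: once the root state X is fixed, the best Y and the best Z can be chosen independently, so the maximization never has to enumerate all 64 joint assignments. The matrix entries and the function names here are assumptions made for illustration, not the lecture's own code.

```python
NUCS = "ACTG"
P_CHANGE = 0.01  # assumed per-unit probability of each specific change

def m(i, j):
    """Entry m_ij of the assumed transition matrix."""
    return 1.0 - 3.0 * P_CHANGE if i == j else P_CHANGE

def best_child(x, leaf1, leaf2):
    """Best (score, state) for an internal node whose parent has state x."""
    return max((m(x, s) * m(s, leaf1) * m(s, leaf2), s) for s in NUCS)

def most_likely_ancestors(leaves=("A", "G", "C", "T")):
    """Return (L*, X, Y, Z) for the tree ((leaf1, leaf2), (leaf3, leaf4))."""
    l1, l2, l3, l4 = leaves
    best = None
    for x in NUCS:
        score_y, y = best_child(x, l1, l2)   # maximize over Y given X
        score_z, z = best_child(x, l3, l4)   # maximize over Z given X
        candidate = (score_y * score_z, x, y, z)
        if best is None or candidate > best:
            best = candidate
    return best

score, x, y, z = most_likely_ancestors()
```

Several assignments can tie for the maximum under a symmetric matrix, so the returned states are one optimal choice rather than the only one.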
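Likelihood of a tree • Tree: root X with internal children Y and Z; Y has leaves A and G, and Z has leaves C and T • $L = \sum_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$ • Compute using the forward algorithm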
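The sum over hidden states can be computed bottom-up, one node at a time, instead of by enumerating every (X, Y, Z). The sketch below represents a tree as nested tuples with observed nucleotides at the leaves. Like the slide's formula, it uses no prior on the root state. The matrix entries and function names are assumptions for illustration.

```python
NUCS = "ACTG"
P_CHANGE = 0.01  # assumed per-unit probability of each specific change

def m(i, j):
    """Entry m_ij of the assumed transition matrix."""
    return 1.0 - 3.0 * P_CHANGE if i == j else P_CHANGE

def conditional(tree):
    """Map each state s to P(leaves below this node | node has state s)."""
    if isinstance(tree, str):  # leaf: the nucleotide is observed
        return {s: 1.0 if s == tree else 0.0 for s in NUCS}
    result = {s: 1.0 for s in NUCS}
    for child in tree:
        below = conditional(child)
        for s in NUCS:
            # sum over the child's hidden state c
            result[s] *= sum(m(s, c) * below[c] for c in NUCS)
    return result

def likelihood(tree):
    """Sum over root states (no root prior, following the slide's formula)."""
    return sum(conditional(tree).values())

L = likelihood((("A", "G"), ("C", "T")))
```

Each node is visited once, so the work grows linearly with the number of nodes rather than exponentially with the number of hidden ancestors.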
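Analysis for all sites • Because sites are independent, the likelihood of a tree is the product of its per-site likelihoods • Use tree search (exhaustive enumeration, branch and bound, branch swapping, etc.) to find the ML tree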
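As a rough illustration of exhaustive search, the sketch below scores the three ways of pairing four species and keeps the topology with the highest log-likelihood, summing log per-site likelihoods across sites. The species names, the toy alignment, the unit branch lengths, and the matrix entries are all hypothetical.

```python
import math

NUCS = "ACTG"
P_CHANGE = 0.01  # assumed per-unit probability of each specific change

def m(i, j):
    """Entry m_ij of the assumed transition matrix."""
    return 1.0 - 3.0 * P_CHANGE if i == j else P_CHANGE

def conditional(tree, site):
    """Map each state s to P(leaves below | state s); leaves are species names."""
    if isinstance(tree, str):
        return {s: 1.0 if s == site[tree] else 0.0 for s in NUCS}
    result = {s: 1.0 for s in NUCS}
    for child in tree:
        below = conditional(child, site)
        for s in NUCS:
            result[s] *= sum(m(s, c) * below[c] for c in NUCS)
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods (sites are treated as independent)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        total += math.log(sum(conditional(tree, site).values()))
    return total

# Hypothetical toy alignment: s1 and s2 share A's where s3 and s4 share T's
alignment = {"s1": "AACCG", "s2": "AACCG", "s3": "TTCCG", "s4": "TTCCG"}

candidates = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]
best_tree = max(candidates, key=lambda t: log_likelihood(t, alignment))
```

Enumeration is only practical for very few species; the number of topologies grows so quickly that branch and bound or branch swapping is needed for larger data sets.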
Comments • ML is robust • ML is consistent: it converges to the correct answer as more data is added, provided the model of evolution is correct • Can be put in a Bayesian statistical framework to obtain a distribution of possible phylogenies • ML can be slow
Issues • Complicating factors: • Gene duplication • Horizontal gene transfer: Exchange of genetic material between species • Chimeric genes • Evolution may not be described by a tree, but by a network
Gene Duplication • [Figure: human globin genes, drawn 5′ to 3′, with duplicated copies labeled 1 and 2]
Homology, orthology, and paralogy • Homology: Similarity attributed to descent from a common ancestor • Orthologous sequences: Homologous sequences in different species that arose from a common ancestral gene during speciation • May or may not be responsible for a similar function • Paralogous sequences: Homologous sequences within a single species that arose by gene duplication
Orthology and Paralogy http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
Conflicts between genes and species? • [Figure: a species tree and a gene tree over A, B, and C]
Resolving the conflict • [Figure: the gene tree reconciled with the species tree over A, B, and C] • Problem: Resolve conflicts using the minimum number of duplications
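One standard way to count duplications, sketched here as an assumption rather than as the method the slides had in mind, is the LCA mapping: map each gene-tree node to the smallest species-tree clade containing all species below it, and count a duplication whenever a node maps to the same clade as one of its children. Trees are nested tuples, and each gene-tree leaf is labeled with the species it was sampled from. All function names are illustrative.

```python
def leaf_set(tree):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def all_clades(tree):
    """Leaf sets of every node in the tree (the tree's clades)."""
    found = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(all_clades(child))
    return found

def lca(species_clades, species_below):
    """Smallest species-tree clade that contains every species in species_below."""
    return min((c for c in species_clades if species_below <= c), key=len)

def count_duplications(gene_tree, species_tree):
    """Number of gene-tree nodes that the LCA mapping marks as duplications."""
    species_clades = all_clades(species_tree)

    def visit(node):
        # returns (species clade this node maps to, duplications in its subtree)
        if isinstance(node, str):
            return lca(species_clades, frozenset([node])), 0
        children = [visit(child) for child in node]
        here = lca(species_clades, leaf_set(node))
        dups = sum(d for _, d in children)
        if any(mapped == here for mapped, _ in children):
            dups += 1  # node and a child map to the same clade: duplication
        return here, dups

    return visit(gene_tree)[1]

species_tree = (("A", "B"), "C")
conflicting = count_duplications(("A", ("B", "C")), species_tree)
concordant = count_duplications((("A", "B"), "C"), species_tree)
```

For this species tree, the conflicting gene tree (A, (B, C)) needs one duplication at its root, while the concordant gene tree needs none.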