
  1. Comp. Genomics Recitation 5 2/4/09 ML in Phylogeny Based on Slides by Ron Shamir and Nir Friedman

  2. Outline • Maximum likelihood (ML) • ML in phylogeny • Ancestral sequence reconstruction using ML

  3. Maximum likelihood • One of the methods for parameter estimation • Likelihood: L = P(Data|Parameters) • Simple example: • Simple coin with P(head) = p • 10 coin tosses • 6 heads, 4 tails • L = P(Data|Params) = C(10,6) p^6 (1-p)^4

  4. Maximum likelihood • We want to find the p that maximizes L = C(10,6) p^6 (1-p)^4 • Calculus 1, remember? • Log is a monotonic function, so we can optimize log L = log[C(10,6) p^6 (1-p)^4] = log C(10,6) + 6 log p + 4 log(1-p) • Setting the derivative with respect to p to zero: 6/p - 4/(1-p) = 0 • Estimate for p: 0.6 (makes sense?)
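To make the coin example concrete, here is a minimal Python sketch (not part of the original slides) that maximizes the log-likelihood numerically over a grid of values of p; the maximizer should match the analytic estimate p = 0.6.

```python
import numpy as np
from math import comb, log

# Coin example from the slide: 6 heads out of 10 tosses.
heads, tosses = 6, 10

def log_likelihood(p):
    """log L(p) = log C(10,6) + 6*log(p) + 4*log(1-p)."""
    return log(comb(tosses, heads)) + heads * log(p) + (tosses - heads) * log(1 - p)

# Grid search over p; the maximum should sit at p = 6/10 = 0.6,
# matching the analytic solution of 6/p - 4/(1-p) = 0.
grid = np.linspace(0.01, 0.99, 981)
p_hat = grid[np.argmax([log_likelihood(p) for p in grid])]
print(round(p_hat, 2))   # -> 0.6
```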

  5. Likelihood of a Tree • Input (small problem): • n sequences • A tree T, with labels on the leaves (X) • Find an optimal labeled tree: • A labeling of the internal nodes (Y) • Branch lengths (b) • Maximizing the likelihood P(X|T,Y,b)

  6. Likelihood (2) • How do we compute P(X|T,Y,b)? • Assumptions: • Each character (site) evolves independently • The branching is a Markov process: • The probability of a node having a given label is a function only of its parent's label and the branch length t between them • The probabilities P(x|y,t) are known

  7. Example [Tree diagram: root x5, internal node x4, leaves x1, x2, x3, with branch lengths t1, t2, t3, t5]

  8. What if we want P(X|T,b)? • Assume that the branch lengths b are known • Sum over the labels Y of the internal nodes • Uses: independence of sites, the Markov property, and the independence of each branch
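As an illustration of slides 5–8, the following sketch computes the single-site likelihood of a small tree under a Jukes-Cantor model, first for a fixed labeling of the internal nodes and then marginalized over them. The topology (matching the shape of the slide-7 example tree), the branch lengths, and the leaf characters are assumptions chosen for illustration only.

```python
from math import exp

STATES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor transition probability P(b | a, t),
    with t measured in expected substitutions per site."""
    e = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

# Assumed shape of the example tree: root x5 with children x4 (branch t5)
# and leaf x3 (branch t3); x4 has leaf children x1 (t1) and x2 (t2).
# Branch lengths and leaf characters below are made up for illustration.
t1, t2, t3, t5 = 0.1, 0.2, 0.3, 0.15
x1, x2, x3 = "A", "A", "G"   # observed characters at one site

def labeled_likelihood(x4, x5):
    """P(X, Y | T, b) for one assignment of the internal nodes,
    with a uniform prior P(x5) = 1/4 at the root."""
    return (0.25 * jc_prob(x5, x4, t5) * jc_prob(x5, x3, t3)
                 * jc_prob(x4, x1, t1) * jc_prob(x4, x2, t2))

# P(X | T, b): marginalize over all assignments of the internal nodes.
marginal = sum(labeled_likelihood(a, b) for a in STATES for b in STATES)
print(marginal)
```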

  9. Properties of P • Additivity (Chapman-Kolmogorov): P(b|a, t1+t2) = Σc P(c|a, t1) P(b|c, t2) • Reversibility: P(a) P(b|a, t) = P(b) P(a|b, t) • Allows the root to be moved freely

  10. Efficient Likelihood Calculation (Felsenstein ’73) • Use dynamic programming, similar to parsimony • Need Sj(v,a) = Pr(subtree rooted at v | vj = a) • Initialization: for each leaf v, set Sj(v,a) = 1 if v is labeled by a, otherwise Sj(v,a) = 0 • Recursion: traverse the tree in postorder; for each node v with children u and w, and for each state a: Sj(v,a) = [Σb P(b|a,tu) Sj(u,b)] · [Σb P(b|a,tw) Sj(w,b)], where tu, tw are the branch lengths to u and w • Complexity: O(nmk^2) for n species, m characters, k states
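Here is a sketch of Felsenstein's pruning recursion for one site, reusing the Jukes-Cantor helper from the previous sketch; the tree encoding and the uniform root prior are illustrative assumptions.

```python
from math import exp

STATES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor P(b | a, t); same helper as in the earlier sketch."""
    e = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def pruning(node, leaf_char):
    """Felsenstein's pruning for one site.
    A node is either a leaf name (str) or (left, t_left, right, t_right).
    Returns a dict S with S[a] = Pr(leaves below node | node = a)."""
    if isinstance(node, str):                 # leaf: S(v,a) = 1 iff v is labeled a
        return {a: 1.0 if leaf_char[node] == a else 0.0 for a in STATES}
    left, t_l, right, t_r = node
    S_l, S_r = pruning(left, leaf_char), pruning(right, leaf_char)
    return {a: sum(jc_prob(a, b, t_l) * S_l[b] for b in STATES)
             * sum(jc_prob(a, b, t_r) * S_r[b] for b in STATES)
            for a in STATES}

# Same hypothetical tree as before: ((x1:0.1, x2:0.2):0.15, x3:0.3).
tree = (("x1", 0.1, "x2", 0.2), 0.15, "x3", 0.3)
S_root = pruning(tree, {"x1": "A", "x2": "A", "x3": "G"})
print(sum(0.25 * S_root[a] for a in STATES))   # uniform root prior
```

On the same assumed tree and leaf characters, this should return the same marginal likelihood as the brute-force sum in the earlier sketch, but with the O(nmk^2) cost quoted on the slide rather than a sum over all internal labelings.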

  11. Ancestral sequence reconstruction • Input: • Rooted tree + extant (leaf) sequences • Substitution matrix + branch lengths • Problem: find the assignment of states to the internal nodes that maximizes the total tree likelihood

  12. Solving ancestral sequence reconstruction • Simple with parsimony methods, e.g., via the Fitch/Sankoff algorithms • Here, we’re interested in ML • Maximizing P(ancestral sequences | contemporary sequences) • Joint vs. Marginal • Marginal: focus on a single node (e.g., the root), and maximize its likelihood • Joint: infer all the internal sequences together

  13. Solutions • We can enumerate all the possible ancestral states and check their likelihood… • c^n possible combinations per character • n – number of internal nodes, c – number of states • Inapplicable when the tree is large • Koshi and Goldstein (1996) – fast algorithm for marginal reconstruction • Pupko, Pe’er, Shamir and Graur (2004): fast algorithm for joint reconstruction

  14. Basics • We assume different sites evolve independently • Working one site at a time • Pij(t) – the probability of observing a change from state i to state j over time t • We want to maximize P(v|Data) = P(Data|v)·P(v) / P(Data), where P(Data) is constant

  15. DP to the rescue • DP is often suitable for tree problems • Idea: • Start from the leaves and climb up the tree • The subtree under every node depends only on the state of its parent! • For each node x, compute Lx(i) and Cx(i) • Lx(i) – the likelihood of x’s subtree given that x’s parent is assigned state i • Cx(i) – the state of x that gives rise to this likelihood

  16. Algorithm phase I • Initialization: • For a leaf y assigned with j: Cy(i) = j, Ly(i) = Pij(ty) • Progression: • For an internal node z with sons x, y already visited, for each i we compute: Lz(i) = maxj [Pij(tz) · Lx(j) · Ly(j)], and Cz(i) = the j attaining this maximum • Termination: • For the root with sons x, y, z – choose k maximizing P(k) · Lx(k) · Ly(k) · Lz(k), where P(k) is the prior probability of state k at the root

  17. Algorithm phase II • “Traceback” • Traverse the tree from the root to the leaves • For every internal node x whose father y has already been reconstructed with state i • Reconstruct the state of x by setting it to Cx(i) • Continue until all the nodes are reconstructed
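As a companion to slides 15–17, here is a sketch of the two-phase joint-reconstruction DP in the spirit of Pupko et al., not the authors' implementation; the tree encoding, branch lengths, Jukes-Cantor model, and uniform root prior are all assumptions made for illustration.

```python
from math import exp

STATES = "ACGT"

def jc_prob(i, j, t):
    """Jukes-Cantor P(j | i, t); same helper as in the earlier sketches."""
    e = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def prod(xs):
    result = 1.0
    for x in xs:
        result *= x
    return result

# Hypothetical tree encoding: a leaf is ("leaf", name, t_to_parent), an internal
# non-root node is ("node", name, t_to_parent, children), and the root is
# ("root", name, children).

def phase1(node, obs, C):
    """Phase I (one site): return L with L[i] = best likelihood of this node's
    subtree given that its parent is assigned i; record the maximizing state
    of this node in C[name][i]."""
    kind, name, t = node[0], node[1], node[2]
    if kind == "leaf":                   # initialization: Cy(i) = j, Ly(i) = Pij(ty)
        j = obs[name]
        C[name] = {i: j for i in STATES}
        return {i: jc_prob(i, j, t) for i in STATES}
    child_Ls = [phase1(c, obs, C) for c in node[3]]
    L, C[name] = {}, {}
    for i in STATES:
        scores = {j: jc_prob(i, j, t) * prod(Lc[j] for Lc in child_Ls)
                  for j in STATES}
        C[name][i] = max(scores, key=scores.get)
        L[i] = scores[C[name][i]]
    return L

def phase2(node, parent_state, C, out):
    """Phase II traceback: assign each internal node the state it recorded for
    the state chosen at its parent."""
    if node[0] == "leaf":
        return
    name = node[1]
    out[name] = C[name][parent_state]
    for child in node[3]:
        phase2(child, out[name], C, out)

def joint_reconstruction(root, obs):
    _, root_name, children = root
    C = {}
    child_Ls = [phase1(c, obs, C) for c in children]
    # Termination: choose k maximizing prior(k) * product of the children's L(k).
    scores = {k: 0.25 * prod(Lc[k] for Lc in child_Ls) for k in STATES}
    out = {root_name: max(scores, key=scores.get)}
    for child in children:
        phase2(child, out[root_name], C, out)
    return out

# Same hypothetical example tree and leaf characters as in the earlier sketches.
tree = ("root", "x5", [("node", "x4", 0.15,
                        [("leaf", "x1", 0.1), ("leaf", "x2", 0.2)]),
                       ("leaf", "x3", 0.3)])
print(joint_reconstruction(tree, {"x1": "A", "x2": "A", "x3": "G"}))
```

Running it on the example tree prints the jointly most likely states for the internal nodes x4 and x5 given the leaf characters.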

  18. Example

  19. Complexity • For n internal nodes and c possible states we compute a DP table of O(nc) cells • As we maximize over c states in every cell, the time is O(nc^2) • As c is constant – O(n)
