Comp. Genomics

Comp. Genomics Recitation 5 2/4/09 ML in Phylogeny Based on Slides by Ron Shamir and Nir Friedman

Outline • Maximum likelihood (ML) • ML in phylogeny • Ancestral sequence reconstruction using ML

Maximum likelihood • One of the methods for parameter estimation • Likelihood: L=P(Data|Parameters) • Simple example: • Simple coin with P(head)=p • 10 coin tosses • 6 heads, 4 tails • L=P(Data|Params)=(106)p6 (1-p)4

Maximum likelihood • We want to find p that maximizes L=(106)p6 (1-p)4 • Infi 1, Remember? • Log is a monotolical function, we can optimize logL=log[(106)p6 (1-p)4]= log(106)+6logp+4log(1-p)] • Deriving by p we get: 6/p-4/(1-p)=0 • Estimate for p:0.6 (Makes sense?)

Likelihood of a Tree • Input (small problem): • n sequences • A tree T, with labels on the leafs (X) • Find optimal labeled tree: • labeling of internal nodes (Y) • branch lengths (b) • Maximizing the likelihood P(X|T,Y,b)

Likelihood (2) • How to compute P(X|T,Y,b)? • Assumptions: • Each character is independent • The branching is a Markov process: • The probability of a node having a given label is only a function of the parent node and the branch length b between them. • The probabilities P(x|y,t)are known

x5 t5 x4 t3 t2 t1 x2 x1 x3 Example

What if we want P(X|T,b)? Assume that the branch lengths b are known. Independence of sites Markov property independence of each branch

Properties of P • Additivity: • Reversibility • Allows to freely move the root

Efficient Likelihood Calculation (Felsenstein ’73) Use dynamic prog. similar to parsimony Need Sj(v,a) = Pr(subtree rooted in v | vj = x) Initialization: For each leaf v set Sj(v,a) = 1 if i is labeled by a, otherwise Sj(v,a) = 0 Recursion: Traverse the tree in postorder: for each node v with children u and w, for each state x Complexity: O(nmk2) n species, m chars, k states

Ancestral sequence reconstruction • Input: • Rooted tree + extant (leaf) sequences • Substitution matrix + branch lengths • Problem: Find the sequence assignment of internal states which maximizes the total tree likelihood

Solving ancestral sequence reconstruction • Simple with parsimony methods, ≈ through the Fitch/Sankoff algorithms • Here, we’re interested in ML • Maximizing • P(ancestral S|contemporary S) • Joint vs. Marginal • Marginal: focus on a single node (e.g., the root), and maximize its likelihood • Joint: Infer all the sequences together

Solutions • We can enumerate all the possible ancestral states and check their likelihood… • cnpossible combinations per character • n – number of internal nodes • Inapplicable when the tree is large • Koshi and Goldshtein (1996) – fast algorithm for marginal reconstruction • Pupko, Pe’er, Shamir and Graur (2004): fast algorithm for joint reconstruction

Basics • We assume different sites evolve independently • Working one site at a time • Pij(t) – the probability of observing ij in time t • We want to maximize • P(v|data)=P(data|v)*P(v)/P(Data) Constant!

DP to the rescue • DP often suitable for tree problems • Idea: • Start from the leaves and climb up the tree • The subtree under every node is dependent only on the state of its parent! • For node x computeLx(i) and Cx(i) • Lx(i) – the likelihood of x’s subtree under the condition that its parent is assigned with i • Cx(i) – the state of x that gives rise to this likelihood

Algorithm phase I • Initialization: • For a leaf y assigned with j: Cy(i)=j, Ly(i)=Pij(t) • Progression: • For an internal node zwith sons x,y already visited: for each i we compute • Termination: • For the root with sons x,y,z – choose k maximizing

Algorithm phase II • “Traceback” • Traverse the tree from root to the leaves • For every internal node x with father y already reconstructed with i • Reconstruct the state in x by setting Cx(i) • Continue until all the nodes are reconstructed

Example

Complexity • For n internal nodes and c possible states we compute a DP table of O(nc) cells. • As we maximize in every cell over c states, time is O(nc2) • As c is constant – O(n)

Comp. Genomics

Comp. Genomics

Presentation Transcript

DNA Chips and Their Analysis Comp. Genomics: Lecture 13

Genomics

Genomics

Computational Genomics Fall 2004/5 www.cs.tau.ac.il/~bchor/CG05/comp-genom.html

Comp. Genomics

Comp. Genomics

Comp. Genomics

Genomics

Comp. Genomics

Intro to Comp Genomics

Comp. Genomics

Comp. Genomics

Comp. Genomics

Genomics

Comp. Genomics

Genomics

Comp. Genomics

Computational Genomics Spring 2009 cs.tau.ac.il/~bchor/CG09/comp-genom.html

Genomics

Comp. Genomics

Genomics

Comp. Genomics