
Perfect Phylogeny MLE for Phylogeny Lecture 14

Perfect Phylogeny, MLE for Phylogeny, Lecture 14. Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1. Final Exam Details: the Final Exam will take place on Thursday, 3.2.04, 0900, at Taub 4.

jola

Presentation Transcript


  1. Perfect Phylogeny, MLE for Phylogeny. Lecture 14. Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1.

  2. Final Exam Details. The Final Exam will take place on Thursday, 3.2.04, 0900, at Taub 4. Allowed material: course and tutorial slides, plus the textbooks of the course (Durbin et al., Setubal & Meidanis, Gusfield).

  3. Part 2: The perfect phylogeny problem • A character is a property that distinguishes between species (e.g., dental structure). • A character's state is a value of the character (e.g., human dental structure). • Problem: given a set of species, specified by their characters, reconstruct their evolutionary tree.

  4. Characters as Colorings. A coloring of a tree $T=(V,E)$ is a mapping $C: V \to$ [set of colors]. A partial coloring of $T$ is a mapping defined on a subset of the vertices $U \subseteq V$: $C: U \to$ [set of colors].

  5. Characters as Colorings (2). Each character defines a (partial) coloring of the corresponding phylogenetic tree: species ≡ vertices, states ≡ colors.

  6. Convex Colorings (and Characters). Let $T=(V,E)$ be a partially colored tree, and let $d$ be a color. The $d$-carrier is the minimal subtree of $T$ containing all vertices colored $d$. Definition: a (partial or total) coloring of a tree is convex iff its $d$-carriers, over all colors $d$, are mutually disjoint.
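The convexity definition can be checked directly. Below is a minimal Python sketch, assuming the tree is given as an edge list and the (partial) coloring as a dict from vertex to color; the names `path_vertices`, `carrier` and `is_convex` are hypothetical. It builds each d-carrier as the union of the tree paths from one d-colored vertex to all the others (any subtree containing those vertices must contain these paths), then tests the carriers for pairwise disjointness.

```python
from collections import defaultdict, deque
from itertools import combinations


def path_vertices(adj, u, v):
    """Vertices on the unique u-v path of a tree, found via BFS parent pointers."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = set()
    while v is not None:  # walk back from v to u
        path.add(v)
        v = parent[v]
    return path


def carrier(adj, colored):
    """Minimal subtree containing all vertices in `colored` (the d-carrier)."""
    first, *others = colored
    result = {first}
    for v in others:
        result |= path_vertices(adj, first, v)
    return result


def is_convex(edges, coloring):
    """True iff the d-carriers of a (partial) coloring are mutually disjoint."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    by_color = defaultdict(list)
    for vertex, color in coloring.items():
        by_color[color].append(vertex)
    carriers = [carrier(adj, vs) for vs in by_color.values()]
    return all(a.isdisjoint(b) for a, b in combinations(carriers, 2))


# Hypothetical example: the path a-b-c-d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_convex(path_edges, {"a": "R", "b": "R", "c": "B", "d": "B"}))  # True
print(is_convex(path_edges, {"a": "R", "c": "B", "d": "R"}))            # False
```

In the second call the R-carrier is the whole path, which contains the B-colored vertex c, so the coloring is not convex.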

  7. Convexity  Homoplasy Freedom A character is Homoplasy free (avoids reversal and convergence transitions) ↕ The corresponding (partial) coloring is convex

  8. The Perfect Phylogeny Problem • Input: a set of species, and several characters. • Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex? (This is always possible for a single character.)

  9. The Perfect Phylogeny Problem (pure graph-theoretic setting). Input: partial colorings $(C_1,\dots,C_k)$ of a set of vertices $U$ (in the example: four vertices labeled RRB, BBR, RRR, RBR, i.e., 3 total colorings, left, center and right, each using two colors). Problem: is there a tree $T=(V,E)$ such that $U \subseteq V$ and, for $i=1,\dots,k$, $C_i$ is a convex (partial) coloring of $T$? The problem is NP-hard in general, but in P for some special cases.

  10. Perfect Phylogeny for a 0-1 Matrix. Rows correspond to objects, columns to characters. Each character has two states: 0 (does not exist) or 1 (exists). A tree $T$ is a perfect phylogeny for the matrix iff it has the following properties: • Each of the $n$ objects corresponds to a leaf of $T$. • Each of the $m$ characters labels exactly one edge of $T$. • Object $p$ has character $i$ iff $i$ labels an edge on the path from $p$ to the root. Note: the last two properties imply that each character is convex on $T$. (Figure: an example tree with edges labeled C1 to C5 and leaves A to E.)

  11. Perfect Phylogeny for a 0-1 Matrix (2). By the definition, for each character C there is exactly one edge on which it changes from 0 to 1. In the example tree, the edge on which character C2 changes to 1 is marked. The resulting tree is convex for this character.

  12. The (Binary) Perfect Phylogeny Problem. Problem: given a 0-1 matrix M, determine whether it has a perfect phylogeny in which the root has 0 for all characters, and construct one if it does. (Note: edges are labeled by characters; an edge labeled by i represents changing character i's state from 0 to 1.) As we show below, the answer is yes for our example matrix: its perfect phylogeny is the example tree with edges C1 to C5 and leaves A to E.

  13. Efficient Algorithm for the Binary Perfect Phylogeny Problem. Definition: given a 0-1 matrix $M$, let $O_k = \{j : M_{jk} = 1\}$, i.e., $O_k$ is the set of objects that have character $C_k$. Theorem: $M$ has a perfect phylogenetic tree iff the sets $\{O_i\}$ are laminar, i.e., for all $i, j$, either $O_i$ and $O_j$ are disjoint, or one includes the other. (Figure: a laminar and a non-laminar family of sets.)
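The theorem's condition can be tested naively by comparing every pair of columns. This is a minimal Python sketch, not the O(mn) method developed later in the lecture: it costs O(m²n) for an n × m matrix. The function names and the 5 × 5 matrix (objects A to E, characters C1 to C5) are hypothetical illustrations, not the matrix from the slides' figure.

```python
from itertools import combinations


def character_sets(M):
    """O_k = {j : M[j][k] == 1}: the objects (row indices) that have character k."""
    return [frozenset(j for j, row in enumerate(M) if row[k] == 1)
            for k in range(len(M[0]))]


def is_laminar(sets):
    """Every pair of sets is either disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))


# Hypothetical example: rows are objects A..E, columns are characters C1..C5.
M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [1, 0, 0, 0, 0],  # E
]
print(is_laminar(character_sets(M)))                          # True
print(is_laminar(character_sets([[1, 1], [1, 0], [0, 1]])))  # False
```

In the second matrix the two columns share object 0, but each also has an object the other lacks, so the family is not laminar and, by the theorem, no perfect phylogeny exists.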

  14. Proof : Assume M has a perfect phylogeny, and let Ci, Cj be given. Consider the edges labeled Ci and Cj. Case 1: There is a root to leaf path containing both edges. Then one is included in the other (C2 and C1 below). Case 2: not case 1. Then they are disjoint (C2 and C3). C2 C3 C1 C4 E D B C5 A C

  15. Proof (cont.): Assume that for all $i, j$, either $O_i$ and $O_j$ are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree. Basis: one character. Then there are at most two distinct objects, one with and one without this character (B and A in the figure), and a tree in which the edge labeled C1 leads to the object that has the character is a perfect phylogeny.

  16. Proof (cont.) : Induction step: Assume correctness for n-1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O1 is not contained in Oj for j > 1. Let S1 be the set of objects j for which Mj1= 1, and S2 be the remaining objects. Then each character belongs to objects in S1 or S2, but not both (prove!). By induction there are trees T1 andT2 for S1 and S2. Combining them as below gives the desired tree. 1 T1 T2

  17. Efficient Implementation. 1. Sort the columns (characters) in decreasing order of their value when read as binary numbers (time complexity: O(mn), using radix sort). Claim: if the binary value of column i is larger than that of column j, then $O_i$ is not a proper subset of $O_j$. Proof: if $O_i \subsetneq O_j$, then column j has a 1 in every row where column i has a 1, plus at least one more, so the binary value of column j would be larger than that of column i.
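Step 1 can be sketched as a least-significant-digit radix sort over the rows, assuming row 0 holds the most significant bit of each column's binary value. Each pass is a stable split into a "1" bucket and a "0" bucket, giving O(mn) total for n objects and m characters. The function name and the matrix are hypothetical.

```python
def radix_sort_columns(M):
    """Column indices in decreasing order of binary value (row 0 = most significant bit).

    LSD radix sort: one stable two-bucket pass per row, from the last row up to
    the first, so the total work is O(mn) for an n x m matrix.
    """
    order = list(range(len(M[0])))
    for row in reversed(M):
        ones = [k for k in order if row[k] == 1]
        zeros = [k for k in order if row[k] == 0]
        order = ones + zeros  # 1s first gives decreasing order; the split is stable
    return order


# Hypothetical example: rows are objects A..E, columns are characters C1..C5.
M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [1, 0, 0, 0, 0],  # E
]
print(radix_sort_columns(M))  # [0, 1, 2, 4, 3]
```

Read top to bottom, the columns have values 21, 20, 10, 2 and 4, so C5 (index 4) precedes C4 (index 3).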

  18. Efficient Implementation (2). 2. For each row, build a backwards linked list of its 1's: each 1 points at the previous 1 in the same row, and the leftmost 1 in each row points at itself. Time complexity: O(mn). Claim: if the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. This can be checked in O(mn) time.
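Step 2 and the laminarity test can be sketched in Python as follows. To stay self-contained, the sketch uses Python's built-in `sorted` on each column's binary value as a simpler stand-in for the radix sort of step 1. Links are stored as positions in the sorted order; all names and the example matrix are hypothetical.

```python
def column_value(M, k):
    """Column k read top to bottom as a binary number (row 0 = most significant bit)."""
    return int("".join(str(row[k]) for row in M), 2)


def sorted_order(M):
    """Columns in decreasing binary value (stand-in for the radix sort of step 1)."""
    return sorted(range(len(M[0])), key=lambda k: column_value(M, k), reverse=True)


def backward_links(M, order):
    """links[p] = set of sorted positions that the 1s in sorted column p point at.

    In each row a 1 points at the previous 1 in that row; the leftmost 1 points
    at itself. One pass over all cells: O(mn)."""
    links = [set() for _ in order]
    for row in M:
        prev = None
        for p, k in enumerate(order):
            if row[k] == 1:
                links[p].add(p if prev is None else prev)
                prev = p
    return links


def is_laminar_sorted(M):
    """Laminar iff all links leaving each column point at the same column."""
    return all(len(targets) <= 1 for targets in backward_links(M, sorted_order(M)))


M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [1, 0, 0, 0, 0],  # E
]
print(backward_links(M, sorted_order(M)))      # [{0}, {0}, {2}, {1}, {2}]
print(is_laminar_sorted(M))                    # True
print(is_laminar_sorted([[1, 1], [1, 0], [0, 1]]))  # False
```

In the non-laminar matrix, the second sorted column is reached from column 0 in one row and is the leftmost 1 (pointing at itself) in another, so its links disagree.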

  19. Examples (figure): a non-laminar matrix and a laminar matrix.

  20. Efficient Implementation (3). 3. When the matrix is laminar, the tree edges corresponding to characters are defined by the backwards links in the matrix: the edge of character i hangs directly below the edge of the character its links point at, or at the root if they point at i itself. The remaining edges and the leaves are determined by the characters of each object. This needs O(mn) time.
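Step 3 can be sketched by walking each row's 1's in sorted order. The previous 1 in the row gives the character edge directly above (or the root, written `None`), and the row's last 1 gives the edge the object's leaf hangs below. A disagreement between rows is the non-laminar case of step 2. As before, `sorted` stands in for the radix sort. The names `build_tree`, `edge_parent` and `leaf_parent` and the example are hypothetical.

```python
def build_tree(M, obj_names, char_names):
    """Read the perfect phylogeny off the backward links of the sorted matrix.

    Returns (edge_parent, leaf_parent): edge_parent maps each character to the
    character whose edge lies directly above its edge (None = the root), and
    leaf_parent maps each object to the character edge it hangs below (None = root).
    """
    order = sorted(range(len(M[0])),
                   key=lambda k: int("".join(str(row[k]) for row in M), 2),
                   reverse=True)
    edge_parent, leaf_parent = {}, {}
    for name, row in zip(obj_names, M):
        prev = None                        # the leftmost 1 of a row hangs off the root
        for k in order:
            if row[k] == 1:
                c = char_names[k]
                if edge_parent.setdefault(c, prev) != prev:
                    raise ValueError(f"links from {c} disagree: matrix is not laminar")
                prev = c                   # the next 1 in this row links back to c
        leaf_parent[name] = prev           # the object sits below its last 1
    return edge_parent, leaf_parent


M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [1, 0, 0, 0, 0],  # E
]
edge_parent, leaf_parent = build_tree(M, "ABCDE", ["C1", "C2", "C3", "C4", "C5"])
print(edge_parent["C5"], leaf_parent["D"])  # C2 C4
```

Each object is attached through its own (unlabeled) pendant edge below the node reached by its last character, so every object ends up as a leaf even when another character edge continues below that node.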

  21. A scenario where Maximum Parsimony (and Perfect Phylogeny) are misleading. Consider a model with 4 letters (DNA), where the probability of a substitution is proportional to time. In the following topology, with long pendant edges to leaves 1 and 4 and short pendant edges to leaves 2 and 3, leaves 2 and 3 are likely to be like the origin (A), but 1 and 4 can be different. In this case, Maximum Parsimony is misleading.

  22. Parsimony may be useless/misleading. The origin and leaves 2 and 3 carry A. For leaves 1 and 4 there are four combinations of substitutions: I. no substitution (A, A): uninformative. II. one substitution (A, G): uninformative. III. two different substitutions (C, G): uninformative. IV. the same substitution on both (G, G): misinformative. In the first three, all three topologies obtain the same parsimony score. In the fourth, a wrong topology scores best.
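The four cases can be reproduced with Fitch's small-parsimony algorithm, a standard method swapped in here because the slides do not say how the scores are computed. Each unrooted quartet is rooted at its central edge, which does not change the Fitch score. Leaves 2 and 3 are fixed to A as on the slides; the function names are hypothetical. Case IV uses C, C as on the case slides; any shared new letter behaves the same.

```python
def fitch(tree):
    """Fitch's small-parsimony pass on a rooted binary tree.

    `tree` is either a single letter (a leaf) or a pair (left, right).
    Returns (candidate_letters, number_of_substitutions)."""
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# The three unrooted quartet topologies 12|34, 13|24, 14|23.
TOPOLOGIES = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]


def quartet_scores(letters):
    """Parsimony score of each quartet topology; letters maps leaf -> letter."""
    return [fitch(((letters[a], letters[b]), (letters[c], letters[d])))[1]
            for (a, b), (c, d) in TOPOLOGIES]


cases = {"I": ("A", "A"), "II": ("A", "G"), "III": ("C", "G"), "IV": ("C", "C")}
for name, (x1, x4) in cases.items():
    print(name, quartet_scores({1: x1, 2: "A", 3: "A", 4: x4}))
# I [0, 0, 0]
# II [1, 1, 1]
# III [2, 2, 2]
# IV [2, 2, 1]
```

Only case IV distinguishes the topologies, and it favors 14|23, which groups the two long branches together.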

  23. Parsimony may be useless: Case I. Leaves 1 and 4 both keep A, so all four leaves are A. Each of the three topologies scores 0.

  24. Parsimony may be useless: Case II. Leaf 1 is A and leaf 4 is G. Each of the three topologies scores 1.

  25. Parsimony may be misleading: Case III. Leaf 1 is C and leaf 4 is G. Each of the three topologies scores 2.

  26. Parsimony may be misleading: Case IV. Leaves 1 and 4 are both C. The topology that groups 1 with 4 scores 1, and the other two topologies score 2, so parsimony prefers a wrong tree.

  27. Parsimony may be misleading. It will infer the correct tree only in the rare case of a change on the central edge, or in the even rarer case of a parallel change from A to C on the pendant edges to 1 and 2.

  28. Part 3: Maximum Likelihood Approach. Consider the phylogenetic tree to be a stochastic process. The likelihood of a transition from character a to character b is given by the parameters $\theta_{b|a}$. The likelihood of a letter a at the root is $q_a$. Given the complete tree (including the letters at the internal nodes), its probability is determined by the values of the $\theta_{b|a}$'s and the $q_a$'s.
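For a complete tree the probability is a product: $q$ of the root letter times $\theta_{b|a}$ for every edge from a parent with letter $a$ to a child with letter $b$. The Python sketch below assumes that sites are independent (so per-site products multiply) and that the same $\theta$ is used on every edge, as on the slide. The function name, the two-letter alphabet and all numbers are hypothetical.

```python
def complete_tree_probability(parent, seqs, q, theta):
    """Probability of a fully labeled tree.

    parent: node -> parent node (None for the root); seqs: node -> sequence;
    q[a]: probability of letter a at the root; theta[(b, a)]: theta_{b|a}.
    Sites are assumed independent, so the per-site probabilities multiply."""
    length = len(next(iter(seqs.values())))
    prob = 1.0
    for i in range(length):
        for node, par in parent.items():
            b = seqs[node][i]
            prob *= q[b] if par is None else theta[(b, seqs[par][i])]
    return prob


# Hypothetical two-letter example: root r with children u and v.
q = {"A": 0.6, "G": 0.4}
theta = {(b, a): (0.9 if a == b else 0.1) for a in "AG" for b in "AG"}
parent = {"r": None, "u": "r", "v": "r"}
print(complete_tree_probability(parent, {"r": "AA", "u": "AG", "v": "AA"}, q, theta))
# approximately 0.026244 = (0.6 * 0.9 * 0.9) * (0.6 * 0.1 * 0.9)
```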

  29. Maximum Likelihood Approach (2). When the data consists only of the leaf sequences (but the topology is fixed): write down the likelihood of the data (leaf sequences) given the tree, and use EM to estimate the $\theta_{b|a}$ parameters. When the tree is not given: search for the tree $T$ that maximizes $P(\text{data} \mid T, \hat\theta)$, where $\hat\theta$ are the parameters estimated by EM for $T$.
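With only the leaves observed, the likelihood sums over all letters at the internal nodes. One standard way to evaluate that sum for a fixed topology is Felsenstein's pruning algorithm, swapped in here as an assumption since the slide does not name a method. It is not the EM step itself, but EM needs per-node quantities of this kind to compute expected transition counts. This is a single-site sketch that reuses the same $\theta$ on every edge; names and numbers are hypothetical.

```python
def site_likelihood(children, root, leaf_letter, alphabet, q, theta):
    """P(leaf letters | tree, q, theta) for one site, by Felsenstein's pruning.

    children: internal node -> list of child nodes; leaf_letter: leaf -> letter;
    theta[(b, a)] = theta_{b|a}. Internal letters are summed out bottom-up."""
    def partial(node):
        # partial(node)[a] = P(leaf letters below node | node has letter a)
        if node in leaf_letter:
            return {a: 1.0 if a == leaf_letter[node] else 0.0 for a in alphabet}
        result = {a: 1.0 for a in alphabet}
        for child in children[node]:
            below = partial(child)
            for a in alphabet:
                result[a] *= sum(theta[(b, a)] * below[b] for b in alphabet)
        return result

    at_root = partial(root)
    return sum(q[a] * at_root[a] for a in alphabet)


# Hypothetical two-letter example: root r with leaves x and y.
q = {"A": 0.6, "G": 0.4}
theta = {(b, a): (0.9 if a == b else 0.1) for a in "AG" for b in "AG"}
print(site_likelihood({"r": ["x", "y"]}, "r", {"x": "A", "y": "G"}, "AG", q, theta))
# approximately 0.09 = 0.6 * 0.9 * 0.1 + 0.4 * 0.1 * 0.9
```

Multiplying `site_likelihood` over independent sites gives the likelihood of the leaf sequences; comparing that value across candidate topologies is the search described on the slide.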
