
Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Author: Dan Gusfield. Presentation by: C. Badri Narayanan.

Presentation Transcript


  1. Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination Author: Dan Gusfield Presentation by: C. Badri Narayanan

  2. Agenda • Main Problem – Root-Unknown galled-tree problem • Solving Optimal Root-Unknown Galled-Tree Problem

  3. Root-Unknown Galled-Tree Problem Given a set of sequences (say, M), find a galled-tree for M with the minimum number of recombinations if one exists; otherwise output “none”. Let’s see the approach previously taken

  4. Points Considered in Theorem(s) • Only single-crossover recombinations are considered • The algorithm will be extended to multiple crossover recombinations Before seeing the approach let’s consider some definitions

  5. Definition of Terms • Trivial Component: A component consisting of a single node with no edges • Component (a.k.a. Connected/Non-Trivial Component): A maximal set of nodes in which there is at least one path between any pair of nodes • Reduced galled-tree: A galled-tree in which no gall contains a character site from a trivial component

  6. Previous Approaches – A Roadmap • To construct a galled-tree for M with a known ancestral sequence (say, A): > Focus on each non-trivial component of the incompatibility graph separately > For each component in the incompatibility graph, determine the site arrangement on a gall > Connect the galls in a tree structure > Place the sites from the trivial components
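The first step of the roadmap, splitting the incompatibility graph into components, can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes M is given as a list of equal-length strings over '0' and '1', and that two sites are incompatible when all four combinations 00, 01, 10, 11 occur in their columns (the standard four-gamete test). The function names are hypothetical.

```python
from itertools import combinations

ALL_FOUR = {("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")}

def incompatible(M, i, j):
    """Four-gamete test: sites i and j are incompatible if 00, 01, 10 and 11 all occur."""
    return {(row[i], row[j]) for row in M} == ALL_FOUR

def incompatibility_components(M):
    """Return the connected components (lists of site indices) of the incompatibility graph."""
    m = len(M[0])
    adj = {s: set() for s in range(m)}
    for i, j in combinations(range(m), 2):
        if incompatible(M, i, j):
            adj[i].add(j)
            adj[j].add(i)
    seen, comps = set(), []
    for start in range(m):
        if start in seen:
            continue
        # Depth-first search collects every site reachable from `start`.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

M = ["000", "010", "100", "110"]
comps = incompatibility_components(M)
non_trivial = [c for c in comps if len(c) > 1]   # each one becomes a gall
trivial = [c for c in comps if len(c) == 1]      # placed last, off the galls
```

In this toy input, sites 0 and 1 form the only non-trivial component and site 2 is a trivial component.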

  7. Difficulties for Unknown Ancestral Sequence • For any two sequences S and S’ (in M), the conflict and incompatibility graphs may be different • How do we know which (ancestral) sequence will allow a galled-tree?
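To make the dependence on the ancestral sequence concrete, here is a small sketch that builds the conflict graph for a given choice of ancestor. It assumes the usual definition: after recoding every row relative to the ancestor (so the ancestor becomes all-zero), two sites conflict when their columns contain all of 01, 10 and 11. The function name and the toy input are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

def conflict_edges(M, ancestor):
    """Pairs of sites that conflict when `ancestor` is taken as the root sequence."""
    # Recode each row so that a 1 means "differs from the ancestor at this site".
    recoded = [[int(a != b) for a, b in zip(row, ancestor)] for row in M]
    edges = set()
    for i, j in combinations(range(len(ancestor)), 2):
        pairs = {(r[i], r[j]) for r in recoded}
        if {(0, 1), (1, 0), (1, 1)} <= pairs:
            edges.add((i, j))
    return edges

M = ["01", "10", "11"]
edges_root_00 = conflict_edges(M, "00")  # ancestor that is not a row of M
edges_root_01 = conflict_edges(M, "01")  # ancestor that is a row of M
```

For this input the two choices of ancestor give different conflict graphs: the pair of sites conflicts under "00" but not under "01".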

  8. Optimal Galled-Tree • A galled-tree that minimizes the number of recombinations over all galled-trees for a set of sequences (say, M) and over all choices of ancestral sequence is called an “Optimal Galled-Tree” • The ancestral sequence of an optimal galled-tree is called an “optimal ancestral sequence”

  9. Author’s Approach: Theorem on Galled Trees – Finding An Ancestral Sequence If there is a galled-tree for M with some ancestral sequence, then there is an optimal galled-tree for M where the (optimal) ancestral sequence is one of the sequences in M

  10. Proof for the Theorem • T – optimal galled-tree for M • A – ancestral sequence for T • Every gall must have at least three edges branching off of it

  11. Proof continued… • Take a path P in T from the root to some leaf z that doesn’t contain any recombination nodes • Zz – sequence labeling z, where Zz is in M • Make Zz the ancestral sequence and reverse the directions of all edges on path P

  12. Main Problem contd. • Each such reversal of an edge changes the direction of the mutation on that edge • The reversal of edges doesn’t change > Labels on edges in T > Recombination node on a gall • The modified tree T’ also derives M

  13. Main Problem contd. • The ancestral sequence of T’ is Zz, which is a member of M • T’ contains the same number of galls as T, and hence T’ is also optimal • Running time is O(n²m + n⁴), where n is the number of sequences and m is the length of each binary sequence

  14. Solving Optimal Root-Unknown Galled-Tree Problem • M – can be derived on a galled-tree • T* – an optimal galled-tree for M • A* – an optimal ancestral sequence

  15. Connecting Galls of T* Assumptions • Every node v on a gall Q in T* is incident with exactly one edge whose other end is off of Q (a.k.a. an “off-edge”) • An off-edge may be directed into or out of a node (say, x)

  16. Connecting Galls of T* • Transform T* to T’ (conceptually) as follows • Node 00100 (say, x) is incident with 2 edges • A new node (say, y) is introduced and joined to x by a new edge • The 2 original edges (that were initially out of x) are reconnected so that they leave from y • T’ specifies how the galls of T* are connected to each other but does not show the internal arrangement of the sites on any gall

  17. Connecting Galls of T* • If x is root of T* then create a new root and connect it with an edge to x • Contract each gall Q in T* to a single node (say, q) and make all edges undirected

  18. Algorithmic Construction of T’ • Find a family of splits SP(T) • C1 & C2 are obtained from the incompatibility graph • The leaf nodes for the tree (on the right side of the figure) are determined by the sites that have unique combination of characters

  19. Extensions to Complex Biological Phenomena & Structured Recombination • Site-Arrangement algorithm for gall Q corresponding to component C • Let M(C) be matrix M restricted to sites in C

  20. Extensions to Complex Biological Phenomena & Structured Recombination • For each distinct sequence X in M(C): • Let M(C, X) be M(C) after removal of all rows with sequence X • If there is an undirected perfect phylogeny T(C) for M(C, X) in which all sites of C are contained in one path, and the sequences at the two ends of that path can be recombined (with a single crossover) to create sequence X, then output the pair (X, T(C))
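The bookkeeping parts of this step can be sketched in a few lines. The sketch below covers only the construction of M(C) and M(C, X) and the single-crossover check. The perfect-phylogeny test itself is not implemented here. It assumes rows are equal-length '0'/'1' strings, that C is a list of site indices, and that a single crossover means a non-empty prefix of one sequence followed by the non-empty remainder of the other. All function names are hypothetical.

```python
def restrict(M, C):
    """M(C): keep only the columns of M whose site indices are listed in C."""
    return ["".join(row[s] for s in C) for row in M]

def remove_rows(MC, X):
    """M(C, X): drop every row of M(C) that equals sequence X."""
    return [row for row in MC if row != X]

def is_single_crossover(X, P, S):
    """True if X is a prefix of one of P, S followed by the suffix of the other."""
    n = len(X)
    for k in range(1, n):  # crossover point strictly inside the sequence
        if X == P[:k] + S[k:] or X == S[:k] + P[k:]:
            return True
    return False

MC = restrict(["0101", "1100", "0000"], [0, 2])   # columns 0 and 2 only
MCX = remove_rows(MC, "00")                       # candidate recombinant X = "00"
ok = is_single_crossover("1111", "1100", "0011")  # "11" from P, then "11" from S
```

In the full algorithm, P and S would be the sequences at the two ends of the path in T(C); here they are passed in directly.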

  21. Extensions to Complex Biological Phenomena & Structured Recombination • Step 2 of the above algorithm is modified for multiple-crossover recombination • Let Su(C) and Sy(C) denote the two sequences to be recombined • The modified step determines whether X can be created by a multiple-crossover recombination of Su(C) and Sy(C), starting with Su(C)

  22. Extensions to Complex Biological Phenomena & Structured Recombination • Algorithm: • i = 1; Z = Su(C) • do { • Find the longest substring of Z starting at position i that matches a substring of X starting at position i • If there is none, return no • Otherwise, set i to the position just past the right end of those matching substrings • If Z = Su(C) then set Z = Sy(C), else set Z = Su(C) } while i does not exceed the length of X • Return yes
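The loop on this slide translates almost directly into code. The following is a minimal sketch under two assumptions that the slide does not state explicitly: all three sequences have the same length, and the loop stops once position i has moved past the end of X. Positions are 0-based here, and the function name is hypothetical.

```python
def multi_crossover_from(X, Su, Sy):
    """Greedy check: can X be spelled by alternating stretches of Su and Sy, starting with Su?"""
    n = len(X)
    i = 0
    Z, other = Su, Sy
    while i < n:
        # Extend the match between Z and X as far right as possible from position i.
        j = i
        while j < n and Z[j] == X[j]:
            j += 1
        if j == i:
            return False       # Z does not match X at position i at all
        i = j                  # continue just past the matched stretch
        Z, other = other, Z    # cross over to the other sequence
    return True

two_crossovers = multi_crossover_from("010", "000", "111")
wrong_start = multi_crossover_from("111", "000", "111")
```

The first call succeeds with two crossovers (Su, then Sy, then Su). The second fails because X does not begin with a stretch of Su, which matches the "starting with Su(C)" requirement on the previous slide.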

  23. Extensions to Complex Biological Phenomena & Structured Recombination The above algorithm produces a multiple-crossover galled-tree for M

  24. Thank You
