Research on optimizing Steiner tree problems for faster and near-perfect phylogenetic tree reconstruction, focusing on genotype data and evolutionary relationships.
A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees
Srinath Sridhar
Joint work with: Kedar Dhamdhere, Guy E. Blelloch, Eran Halperin, R. Ravi and Russell Schwartz
Steiner Tree Problem
• Input: Graph G(V, E) with edge weights w: E → ℝ and a ‘terminal’ set S ⊆ V
• Output: Subtree T of G connecting all vertices in S
• Objective: Minimize w(T), the total edge weight of T
• Informally: MST with intermediate vertices
• NP-complete, even if G is an m-dimensional hypercube with unit edge weights
Near-Perfect Phylogenetic Trees
• Input: set S of n points on an m-dimensional hypercube (n bit-strings of length m)
• Output: Steiner (unrooted) tree T connecting S using intermediate nodes (Steiner nodes) of the hypercube
• Objective: Minimize |T|
• Assumption: |T_opt| ≤ m + q, constant q
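To make "MST with intermediate vertices" concrete, here is a minimal sketch that compares a spanning tree over the terminals alone with one that may also use a Steiner node. The three-bit input and the helper names `hamming` and `mst_length` are illustrative assumptions, not taken from the slides, and Prim's algorithm is used only as a simple baseline, not as the talk's method.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def mst_length(points: list[str]) -> int:
    """Length of a minimum spanning tree over distinct points
    under Hamming distance (Prim's algorithm)."""
    in_tree = {points[0]}
    total = 0
    while len(in_tree) < len(points):
        # Cheapest edge leaving the current tree.
        d, nxt = min(
            (hamming(u, v), v)
            for u in in_tree
            for v in points
            if v not in in_tree
        )
        in_tree.add(nxt)
        total += d
    return total

terminals = ["100", "010", "001"]        # hypothetical input
print(mst_length(terminals))             # 4: no intermediate nodes allowed
print(mst_length(terminals + ["000"]))   # 3: 000 acts as a Steiner node
```

With the Steiner node 000 the tree has length 3 = m, so this toy input satisfies the assumption with q = 0.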
Typical Input Data
• Rows: different species, languages, etc.
• Columns: yes/no (0/1) properties of the rows
• Phenotypes: each column can represent a binary question: thumbs? color-blind?
• DNA: each position has 2 possibilities (almost always)
Example
Columns: H (Head), W (Wings), RS (Can read stars), B/NB (Bad/not-so-bad)
• Basilisk: 1 1 0 0
• Boggart: 0 0 0 1
• Centaur: 1 0 1 1
• Goblin: 1 0 0 1
Figure (tree nodes): 0001 Boggart, 1001 Goblin, 1000 Steiner, 1011 Centaur, 1100 Basilisk
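Perfectness
• Annotate each edge of tree T with the column that flips along it
• Tree T is ‘perfect’ if each annotation occurs only once
• Evolution is assumed to be (nearly) perfect
Figure (annotated tree): Boggart 0001 to Goblin 1001 flips column 1; Goblin 1001 to Centaur 1011 flips column 3; Goblin 1001 to Steiner 1000 flips column 4; Steiner 1000 to Basilisk 1100 flips column 2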
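One standard way to check whether binary rows admit a perfect tree is the four-gamete condition: no pair of columns may show all four patterns 00, 01, 10 and 11. This is a textbook characterization and is not stated on the slide; the function name `four_gamete_ok` is a hypothetical helper used only for this sketch.

```python
from itertools import combinations

def four_gamete_ok(rows: list[str]) -> bool:
    """True if no pair of columns shows all four patterns 00, 01, 10, 11."""
    m = len(rows[0])
    for i, j in combinations(range(m), 2):
        patterns = {(r[i], r[j]) for r in rows}
        if len(patterns) == 4:
            return False
    return True

# Rows from the Example slide: Basilisk, Boggart, Centaur, Goblin.
slide_rows = ["1100", "0001", "1011", "1001"]
print(four_gamete_ok(slide_rows))                 # True
print(four_gamete_ok(["00", "01", "10", "11"]))   # False
```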
Perfectness
• q-near-perfect: |T_opt| ≤ m + q, constant q
Figure (1-near-perfect tree): the same annotated tree, with 1000 now the terminal Dementor and a new terminal Hippogriff 1101 joined to Basilisk 1100 by a second flip of column 4
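The annotation step can be sketched in a few lines: label every edge with the column that flips, count how often each column appears, and report q = |T| - m. The edge list below is read off the slide figure and is an assumption about how the nodes are joined; `flipped_column` and `penalty` are hypothetical helper names.

```python
from collections import Counter

def flipped_column(u: str, v: str) -> int:
    """1-based column in which adjacent hypercube nodes u and v differ."""
    diff = [i + 1 for i, (a, b) in enumerate(zip(u, v)) if a != b]
    if len(diff) != 1:
        raise ValueError(f"{u} and {v} are not hypercube neighbours")
    return diff[0]

def penalty(edges: list[tuple[str, str]], m: int) -> int:
    """q = |T| - m for a tree made of hypercube edges.
    Assumes every one of the m columns flips at least once."""
    return len(edges) - m

# Edges as read off the slide figure (assumed from the node labels).
edges = [
    ("0001", "1001"),  # Boggart - Goblin
    ("1001", "1011"),  # Goblin - Centaur
    ("1001", "1000"),  # Goblin - Dementor
    ("1000", "1100"),  # Dementor - Basilisk
    ("1100", "1101"),  # Basilisk - Hippogriff
]
flips = Counter(flipped_column(u, v) for u, v in edges)
print(sorted(flips.items()))   # [(1, 1), (2, 1), (3, 1), (4, 2)]
print(penalty(edges, 4))       # 1
```

Column 4 is the only annotation that occurs twice, which matches the slide's "1-near perfect" label.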
General Phylogeny Problem
• Input S: set of n strings in {1, …, k}^m
• Output: Steiner tree T connecting all of S (Hamming distance)
• Objective: Minimize |T|
• Variants:
  • k is bounded by a constant; k is 2
  • Tree T is perfect
  • Tree T is near-perfect
Overview
• Discover O(q) edges and the topology they induce
• Discover the assignment of rows to super nodes
• Grow a perfect phylogeny within each super node
• Link the super nodes
Figure (on each overview slide): the optimal tree
Current/Future Work
• Simpler algorithm
• More than two states (k > 2), near-perfect
• Experimental evaluation, usable code
• Related harder problem: input is a ‘mixture’ of 2 strings over {0, 1}^m
  Input: 2 0 1 1 2
  Output: 1 0 1 1 0 and 0 0 1 1 1
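The ‘mixture’ encoding can be illustrated with a small sketch. It assumes the convention suggested by the slide's example: a position reads 0 or 1 when both strings agree and 2 when they differ. The names `mix` and `resolutions` are hypothetical, and the brute-force enumeration only shows why the problem is harder; it is not a proposed algorithm.

```python
from itertools import product

def mix(h1: str, h2: str) -> str:
    """Combine two binary strings: shared value where they agree, '2' where they differ."""
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))

def resolutions(g: str) -> list[tuple[str, str]]:
    """All unordered pairs of binary strings whose mixture is g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)

print(mix("10110", "00111"))    # 20112
print(resolutions("20112"))     # [('00110', '10111'), ('00111', '10110')]
```

Even this five-position input has two valid outputs, and the number of candidate pairs doubles with each additional position marked 2.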