
Imputing Supertrees and Supernetworks from Quartets

By B. Holland, G. Conner, K. Huber, and V. Moulton. Presented by Razieh Nokhbeh Zaeem. Basic problem addressed in this talk: constructing an estimate of a species phylogeny (in this case, a network) from a given set of gene trees.

Presentation Transcript


  1. Imputing Supertrees and Supernetworks from Quartets By B. Holland, G. Conner, K. Huber, and V. Moulton Presented by Razieh Nokhbeh Zaeem

  2. This talk
     • Basic problem: constructing an estimate of a species phylogeny (in this case, a network) from a given set of gene trees
     • Input: a set of partial gene trees (trees that do not contain all taxa)
     • Output: a supernetwork, which can display conflicting signals
     • Algorithm by Holland et al.: combines quartet imputation with consensus network construction
     • Experiments compare the new method with the previous method, Z-closure, and with MRP in terms of false positives (FP) and false negatives (FN)
     • Q-imputation provides a useful complementary tool

  3. Q-imputation
     • Some definitions: L(T), T|Z, Q(T) and …
     • Let $T_1, \dots, T_g$ be the collection of input trees, corresponding to a collection of gene trees
     • Put $X = \bigcup_{i} L(T_i)$
     • For each tree $T_i$, we sequentially insert all of the taxa in $X - L(T_i)$ into $T_i$ to get $T_i'$
     • Once we have all the $T_i'$, we apply the consensus network method to obtain a network

  4. Polynomial-time algorithm
     • For each input tree $T_i$:
       • For each new taxon y:
         • Find a place to add a pendant edge labeled by y
         • Choose the place p that maximizes the number of quartets on which $T_i$ with y attached agrees with all the other input trees $T_j$
         • If more than one place achieves the best score, choose among them at random
         • If the maximum score is 0, there is not enough information to place y
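The insertion step on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the adjacency-dictionary tree representation, the helper names (`tree_from_edges`, `quartets`, `insert_on_edge`, `best_placements`), and the choice to score only quartets that contain the new taxon are all assumptions made for the example. Quartets are read off with the four-point condition on unit edge lengths.

```python
from collections import deque
from itertools import combinations

def tree_from_edges(edge_list):
    """Build an undirected adjacency dict from (node, node) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distances(adj, src):
    """Breadth-first path lengths (edge counts) from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def quartets(adj):
    """Resolved quartets ab|cd of the tree, via the four-point condition."""
    leaves = sorted(n for n in adj if len(adj[n]) == 1)
    d = {a: distances(adj, a) for a in leaves}
    found = set()
    for a, b, c, e in combinations(leaves, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        sums = [d[p][q] + d[r][s] for (p, q), (r, s) in pairings]
        if sums.count(min(sums)) == 1:  # strictly smallest sum: resolved
            left, right = pairings[sums.index(min(sums))]
            found.add(frozenset([frozenset(left), frozenset(right)]))
    return found

def insert_on_edge(adj, edge, y):
    """Copy the tree, subdivide `edge`, and hang leaf y off the new node."""
    u, v = edge
    new = {n: set(nb) for n, nb in adj.items()}
    w = f"_{u}_{v}_{y}"  # hypothetical name for the new internal node
    new[u].remove(v)
    new[v].remove(u)
    new[u].add(w)
    new[v].add(w)
    new[w] = {u, v, y}
    new[y] = {w}
    return new

def best_placements(tree, y, others):
    """Score every edge by quartets (containing y) shared with other trees."""
    other_quartets = [quartets(t) for t in others]
    scores = {}
    for edge in sorted((u, v) for u in tree for v in tree[u] if u < v):
        with_y = {q for q in quartets(insert_on_edge(tree, edge, y))
                  if any(y in side for side in q)}
        scores[edge] = sum(len(with_y & oq) for oq in other_quartets)
    top = max(scores.values())
    return top, [e for e, s in scores.items() if s == top]

# Toy data: t1 lacks taxon E, t2 contains it.
t1 = tree_from_edges([("A", "x"), ("B", "x"), ("x", "y1"),
                      ("C", "y1"), ("D", "y1")])
t2 = tree_from_edges([("A", "p"), ("B", "p"), ("p", "q"), ("C", "q"),
                      ("q", "r"), ("D", "r"), ("E", "r")])
top, places = best_placements(t1, "E", [t2])
```

In this toy case the best score is reached only by attaching E next to D. When `places` holds several edges, the slide's rule corresponds to picking one with `random.choice(places)`; a `top` of 0 corresponds to the "not enough information" case.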

  5. An example: insert F into a tree (figure). Quartets listed on the slide: FB|AD, FB|AE, FB|DE, FA|DE, FA|CE, FB|AC, FB|AE, FB|CE, FD|BC

  6. The consensus network
     • The consensus network (the split network) displays those splits of X that appear in more than a certain proportion, t, of the trees computed by Q-imputation
     • In case t = 0 we drop the subscript t: these are the splits that appear at least once
     • For example, with t given as a percentage:
       • If t = 100, the consensus network is the strict consensus tree
       • If t = 50, the consensus network is the majority-rule consensus tree
       • If t < 50, the consensus network may display conflicting splits
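The thresholding described on this slide can be sketched as a simple count over splits. This is an illustrative sketch under stated assumptions: each completed tree is given as a set of its nontrivial splits, `split`, `consensus_splits`, and `compatible` are hypothetical helper names, and t = 100 is treated as "present in every tree" so that it matches the strict-consensus case.

```python
from collections import Counter

def split(side_a, side_b):
    """A split A|B as an unordered pair of taxon sets."""
    return frozenset([frozenset(side_a), frozenset(side_b)])

def consensus_splits(trees, t):
    """Splits found in more than t percent of the trees (all trees if t >= 100)."""
    n = len(trees)
    counts = Counter(s for tree in trees for s in tree)
    if t >= 100:
        return {s for s, c in counts.items() if c == n}
    return {s for s, c in counts.items() if 100 * c > t * n}

def compatible(s1, s2):
    """Two splits fit on one tree iff some pair of sides is disjoint."""
    return any(not (a & b) for a in s1 for b in s2)

# Three toy trees on taxa A..E, each given by its nontrivial splits.
ab, abc, abd = split("AB", "CDE"), split("ABC", "DE"), split("ABD", "CE")
trees = [{ab, abc}, {ab, abc}, {ab, abd}]

strict = consensus_splits(trees, 100)    # only AB|CDE
majority = consensus_splits(trees, 50)   # AB|CDE and ABC|DE
everything = consensus_splits(trees, 0)  # all three splits
```

With t = 0 the result contains both ABC|DE and ABD|CE, which `compatible` reports as conflicting, so the output can only be drawn as a network rather than a tree.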

  7. Simulation
     • Three types of simulation, differing in the input:
       • (1) Evolution is tree-like. Gene trees are correct but miss taxa
       • (2) Evolution is tree-like. Gene trees have errors and miss taxa
       • (3) Evolution is not tree-like. Random input trees
     • In each simulation, three parameters were varied:
       • The species tree: either the completely balanced tree on 16 taxa or the completely unbalanced tree on 16 taxa
       • g (the number of gene trees), taking values 2, 4, 8, 16, and 32
       • m (the number of taxa missing), taking values 1, 2, 3, 4, 5, and 6; missing taxa were deleted at random
     • One hundred repetitions were carried out for each parameter combination

  8. Simulation
     • The split systems generated were:
       • MRP: … and …, the splits in the majority-rule consensus and the strict consensus from MRP
       • Q-imputation: …, … and …
       • Z-closure: the splits generated using Z-closure
     • Measuring FP and FN:
       • FP (false positives): splits contained in the output split system that are not in the input
       • FN (false negatives): splits in the input that are not in the output split system

  9. WIP
     • Definition of the weak induction property (WIP): for input trees $T_1, \dots, T_g$, any split S in the output split system should restrict to a split in $T_i$ for some $i$
     • The WIP holds for all splits in the output split system in case the input trees are all subtrees of a single phylogenetic tree
     • There are examples where the WIP does not hold, although very few were generated by Q-imputation
     • Z-closure satisfies the WIP
     • Any method with the WIP cannot generate FP: every split in the output comes from some tree in the input set, so no split appears in the output but not in the input
     • Q-imputation with t = 0 cannot produce FN
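One way to read the WIP, and its link to false positives, is as a restriction test on splits. The sketch below is an interpretation rather than the paper's definition: it assumes each input tree is given as a pair (taxon set, set of nontrivial splits), that a restriction counts only if it matches one of those listed splits, and the names `split`, `restrict`, `induced_by_some_input`, and `false_positives` are hypothetical.

```python
def split(side_a, side_b):
    """A split A|B as an unordered pair of taxon sets."""
    return frozenset([frozenset(side_a), frozenset(side_b)])

def restrict(s, taxa):
    """Restrict split s to `taxa`; None if one side becomes empty."""
    a, b = s
    ra, rb = a & taxa, b & taxa
    if not ra or not rb:
        return None
    return frozenset([ra, rb])

def induced_by_some_input(s, inputs):
    """True if s restricts to a listed split of at least one input tree."""
    for taxa, splits in inputs:
        r = restrict(s, taxa)
        if r is not None and r in splits:
            return True
    return False

def false_positives(output_splits, inputs):
    """Output splits that no input tree supports after restriction."""
    return {s for s in output_splits if not induced_by_some_input(s, inputs)}

# Two partial input trees and a candidate output on the full taxon set A..E.
inputs = [
    (frozenset("ABCD"), {split("AB", "CD")}),
    (frozenset("ABCE"), {split("AB", "CE")}),
]
output = {split("AB", "CDE"), split("AD", "BCE")}
unsupported = false_positives(output, inputs)
```

Here AB|CDE restricts to AB|CD in the first input tree, so it passes, while AD|BCE restricts to nothing listed in either tree and is reported as unsupported. A method whose every output split passes this test would report an empty set, which is the sense in which the WIP rules out FP.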

  10. Simulation results: FP
      • Z-closure cannot generate FP, so we just look at splits in Q-imputation and MRP
      • 6000 different settings for each type of simulation
      • Normalized numbers are given in parentheses
      • Each tree is on 16 taxa and has 13 internal edges
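The two numbers on this slide can be checked with a little arithmetic. The sketch below assumes that "settings" counts every repetition of every parameter combination from the simulation design (2 species trees, 5 values of g, 6 values of m, 100 repetitions), which is one reading that gives 6000; the variable names are illustrative. The edge count uses the standard fact that a fully resolved unrooted tree on n leaves has n - 3 internal edges.

```python
from itertools import product

species_trees = ["balanced", "unbalanced"]  # both on 16 taxa
gene_tree_counts = [2, 4, 8, 16, 32]        # values of g
missing_taxa = [1, 2, 3, 4, 5, 6]           # values of m
repetitions = 100

# Every (species tree, g, m) combination, then all repetitions of each.
combos = list(product(species_trees, gene_tree_counts, missing_taxa))
total_runs = len(combos) * repetitions

def internal_edges(n_taxa):
    """Internal edges of a fully resolved unrooted tree on n_taxa leaves."""
    return n_taxa - 3
```

Under that reading there are 60 parameter combinations and 6000 runs per simulation type, and `internal_edges(16)` gives the 13 internal edges quoted on the slide.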

  11. Simulation 1 results: FN, normalized, % (chart series: Z-closure, Q-imputation20, MRP50)

  12. Simulation 2 results: FN, normalized, % (chart series: Z-closure, Q-imputation20, MRP50)

  13. Simulation 3 results: FN, normalized, % (chart series: Z-closure, Q-imputation20, MRP50)

  14. Discussion of simulation results
      • As the number of gene trees increases:
        • FN produced by Z-closure decrease (good)
        • FN produced by Q-imputation increase (bad)
      • As a supertree method (simulations 1 and 2), Q-imputation tended to return fewer FP (unsupported) splits than MRP, but also fewer supported splits (more FN (?))
      • As a supernetwork method, Q-imputation tended to give rise to FP but not FN (?), whereas Z-closure gave rise to FN but no FP
      • Also, in simulations where there was an underlying species tree, as the number of gene trees increased:
        • For Z-closure, the number of FN increased (?)
        • For the split system derived by applying a threshold to the trees completed by Q-imputation, the number of FN had the desirable property of decreasing (?)
      • For the output to be visually palatable, we need some FN, to restrict the number of splits being displayed
      • Q-imputation offers a natural means to filter out splits
      • See the case study

  15. Case study: 7 genes, 45 taxa (figures: Z-closure, Q-imputation)
