
21 December 2007

21 December 2007. Coalescent Consequences for Consensus Cladograms. J. H. Degnan¹, M. Degiorgio², D. Bryant³, and N. A. Rosenberg¹,². ¹ Dept. of Human Genetics, U. of Michigan; ² Bioinformatics Program, U. of Michigan; ³ Dept. of Mathematics, U. of Auckland.


Presentation Transcript


  1. 21 December 2007. Coalescent Consequences for Consensus Cladograms. J. H. Degnan¹, M. Degiorgio², D. Bryant³, and N. A. Rosenberg¹,². ¹ Dept. of Human Genetics, U. of Michigan; ² Bioinformatics Program, U. of Michigan; ³ Dept. of Mathematics, U. of Auckland

  2. Outline • Species trees vs. gene trees • Consensus tree background • Asymptotic consensus trees • Finite sample consensus trees • Consistency results • Conclusions

  3. Gene trees vary across the genome

  4. Why? Incomplete lineage sorting, horizontal gene transfer, sampling, etc.

  5. Gene tree discordance • From one true species tree, we expect there to be different gene trees at different loci as a result of lineage sorting, independently of problems due to estimation or sampling error. • Gene tree discordance depends especially on branch lengths in the species tree, measured by the number of generations scaled by effective population size, t / N.
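To make the dependence on branch length concrete, here is a minimal sketch for the simplest case, three taxa with species tree ((A,B),C). It uses the standard coalescent result that the matching gene tree topology has probability 1 - (2/3)e^(-T) and each of the two discordant topologies has probability (1/3)e^(-T). The function name `three_taxon_probs` is hypothetical, and `T` is assumed to be the internal branch length already expressed in coalescent units (the exact scaling of generations by population size depends on the ploidy convention, which the slide does not state).

```python
import math


def three_taxon_probs(T):
    """Gene tree topology probabilities for species tree ((A,B),C).

    T is the internal branch length in coalescent units (assumed scaling).
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    # Probability of each of the two topologies that disagree with the species tree.
    discordant = math.exp(-T) / 3.0
    return {
        "(AB)C": 1.0 - 2.0 * discordant,  # matches the species tree
        "(AC)B": discordant,
        "(BC)A": discordant,
    }


if __name__ == "__main__":
    for T in (0.05, 0.1, 1.0, 5.0):
        print(T, three_taxon_probs(T))
```

As T shrinks toward 0 the three topologies approach equal probability, and as T grows the matching topology dominates.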

  6. Consensus (majority-rule)

  7. Types of consensus trees • Strict: only clades that are included in every observed tree are in the consensus tree. In the coalescent model, all clades have probability > 0, so the strict consensus tends to lose resolution as more loci are sampled. • Democratic vote: use the gene tree that occurs most frequently. • Majority rule: the consensus tree has all clades that were observed in > 50% of trees. • Greedy: sort clades by their proportions. Accept the most frequently observed clades one at a time, provided each is compatible with the clades already accepted. Continue until the tree is fully resolved. • R*: for each set of 3 taxa, find the most commonly occurring triple, e.g., (AB)C, (AC)B, or (BC)A. Build the tree from the most commonly occurring triples.
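The first three rules above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: each gene tree is represented as a frozenset of its nontrivial clades, and the helper names (`tree`, `strict`, `democratic_vote`, `majority_rule`) are hypothetical.

```python
from collections import Counter


def tree(*clades):
    """Represent a rooted tree by its nontrivial clades, e.g. tree("AB", "ABC")."""
    return frozenset(frozenset(c) for c in clades)


def strict(trees):
    """Clades present in every observed tree."""
    return set.intersection(*(set(t) for t in trees))


def democratic_vote(trees):
    """The single most frequently observed tree (ties go to the first seen)."""
    return Counter(trees).most_common(1)[0][0]


def majority_rule(trees):
    """Clades observed in more than 50% of the trees."""
    counts = Counter(c for t in trees for c in t)
    return {c for c, k in counts.items() if k / len(trees) > 0.5}


if __name__ == "__main__":
    # Hypothetical sample: three loci give (((AB)C)D), two give ((AB)(CD)).
    sample = [tree("AB", "ABC")] * 3 + [tree("AB", "CD")] * 2
    print(strict(sample))
    print(democratic_vote(sample))
    print(majority_rule(sample))
```

Because any two clades that each appear in more than half of the trees must co-occur in at least one tree, the majority-rule clades are always mutually compatible, which is why that rule needs no compatibility check.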

  8. Asymptotic consensus trees • Consensus trees are usually treated as statistics: functions of the data, like the sample mean x-bar. • We consider replacing observed (estimated) gene trees with their theoretical probabilities under the coalescent and determining the resulting consensus tree. • Motivation: if there are a large number of independent loci, observed gene tree and clade proportions should approximate their theoretical probabilities.

  9. Tree/Clade Probability Examples

     Gene tree       Label   x = y = 0.1   x = y = 0.05
     ((AB)(CD))      p1      0.128         0.121
     ((AC)(BD))      p2      0.099         0.105
     ((AD)(BC))      p3      0.099         0.105
     (((AB)C)D)      p4      0.104         0.079
     (((AB)D)C)      p5      0.091         0.075
     (((AC)B)D)      p6      0.066         0.061
     (((AC)D)B)      p7      0.062         0.060
     (((AD)B)C)      p8      0.037         0.045
     (((AD)C)B)      p9      0.037         0.045
     (((BC)A)D)      p10     0.066         0.061
     (((BC)D)A)      p11     0.062         0.060
     (((BD)A)C)      p12     0.037         0.045
     (((BD)C)A)      p13     0.037         0.045
     (((CD)A)B)      p14     0.037         0.045
     (((CD)B)A)      p15     0.037         0.045

     Clade    Sum of tree probabilities   x = y = 0.1 (rank)   x = y = 0.05 (rank)
     {AB}     p1 + p4 + p5                0.332 (1)            0.275 (1)
     {AC}     p2 + p6 + p7                0.227 (2)            0.226 (2)
     {AD}     p3 + p8 + p9                0.173 (6)            0.189 (7)
     {BC}     p3 + p10 + p11              0.226 (3)            0.226 (2)
     {BD}     p2 + p12 + p13              0.173 (6)            0.195 (6)
     {CD}     p1 + p14 + p15              0.202 (5)            0.211 (4)
     {ABC}    p4 + p6 + p10               0.215 (4)            0.201 (5)
     {ABD}    p5 + p8 + p12               0.165 (8)            0.165 (8)
     {ACD}    p7 + p9 + p14               0.136 (9)            0.150 (9)
     {BCD}    p11 + p13 + p15             0.136 (9)            0.150 (9)

     Greedy tree                          (((AB)C)D)           ((AB)(CD))
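The clade probabilities and greedy trees on this slide can be recomputed from the fifteen gene tree probabilities. The following is a minimal sketch with hypothetical helper names (`clades_of`, `clade_probs`, `greedy_clades`); it sums the rounded tree probabilities as printed, so individual clade values can differ from the slide's in the third decimal, but the greedy result is the same in both columns.

```python
TREES = [
    "((AB)(CD))", "((AC)(BD))", "((AD)(BC))",
    "(((AB)C)D)", "(((AB)D)C)", "(((AC)B)D)", "(((AC)D)B)",
    "(((AD)B)C)", "(((AD)C)B)", "(((BC)A)D)", "(((BC)D)A)",
    "(((BD)A)C)", "(((BD)C)A)", "(((CD)A)B)", "(((CD)B)A)",
]
# Tree probabilities p1..p15 as printed on the slide.
P_010 = [0.128, 0.099, 0.099, 0.104, 0.091, 0.066, 0.062, 0.037,
         0.037, 0.066, 0.062, 0.037, 0.037, 0.037, 0.037]
P_005 = [0.121, 0.105, 0.105, 0.079, 0.075, 0.061, 0.060, 0.045,
         0.045, 0.061, 0.060, 0.045, 0.045, 0.045, 0.045]


def clades_of(tree):
    """Nontrivial clades of a parenthesized tree such as '(((AB)C)D)'."""
    stack, found = [], []
    for ch in tree:
        if ch == "(":
            stack.append(set())
        elif ch == ")":
            done = frozenset(stack.pop())
            found.append(done)
            if stack:
                stack[-1] |= done  # taxa also belong to the enclosing group
        else:
            stack[-1].add(ch)
    return found[:-1]  # the last group closed is the root (all taxa)


def clade_probs(probs):
    """Probability of each clade: sum over the trees that contain it."""
    totals = {}
    for tree, p in zip(TREES, probs):
        for clade in clades_of(tree):
            totals[clade] = totals.get(clade, 0.0) + p
    return totals


def greedy_clades(probs, n_taxa=4):
    """Accept clades in decreasing probability if compatible with those accepted."""
    ranked = sorted(clade_probs(probs).items(),
                    key=lambda kv: (-kv[1], sorted(kv[0])))
    accepted = []
    for clade, _ in ranked:
        if all(clade.isdisjoint(a) or clade <= a or a <= clade for a in accepted):
            accepted.append(clade)
        if len(accepted) == n_taxa - 2:  # a rooted binary tree is now fully resolved
            break
    return set(accepted)


if __name__ == "__main__":
    for label, probs in (("x = y = 0.1", P_010), ("x = y = 0.05", P_005)):
        print(label, sorted("".join(sorted(c)) for c in greedy_clades(probs)))
```

With the shorter branch lengths, {CD} overtakes {ABC} in the ranking, so the greedy pass accepts {AB} and then {CD}, giving ((AB)(CD)) rather than the asymmetric tree.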

  10. Tree/Triple Probability Examples

     Gene tree       Label   x = y = 0.1   x = y = 0.05
     ((AB)(CD))      p1      0.128         0.121
     ((AC)(BD))      p2      0.099         0.105
     ((AD)(BC))      p3      0.099         0.105
     (((AB)C)D)      p4      0.104         0.079
     (((AB)D)C)      p5      0.091         0.075
     (((AC)B)D)      p6      0.066         0.061
     (((AC)D)B)      p7      0.062         0.060
     (((AD)B)C)      p8      0.037         0.045
     (((AD)C)B)      p9      0.037         0.045
     (((BC)A)D)      p10     0.066         0.061
     (((BC)D)A)      p11     0.062         0.060
     (((BD)A)C)      p12     0.037         0.045
     (((BD)C)A)      p13     0.037         0.045
     (((CD)A)B)      p14     0.037         0.045
     (((CD)B)A)      p15     0.037         0.045

     Triple    Sum of tree probabilities       x = y = 0.1   x = y = 0.05
     (AB)C*    p1 + p4 + p5 + p8 + p12         0.397         0.365
     (AC)B     p2 + p6 + p7 + p9 + p14         0.301         0.316
     (AB)D*    p1 + p4 + p5 + p6 + p10         0.455         0.397
     (AD)B     p3 + p7 + p8 + p9 + p14         0.272         0.391
     (AC)D*    p2 + p4 + p6 + p7 + p10         0.397         0.366
     (AD)C     p3 + p5 + p8 + p9 + p12         0.301         0.315
     (BC)D*    p3 + p4 + p6 + p10 + p11        0.397         0.366
     (BD)C     p2 + p5 + p8 + p12 + p13        0.301         0.315

     R* tree                                   (((AB)C)D)    (((AB)C)D)
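The triple probabilities can be recomputed in the same spirit. In this sketch (helper names `lca_depth`, `cherry`, `triple_probs`, and `winning_triples` are hypothetical), the pair that a tree groups together within a triple is found as the pair whose most recent common ancestor is nested deepest in the parenthesized string. Sums use the rounded tree probabilities as printed, so they can differ from the slide's values in the third decimal. Assembling the winning triples into a tree is not shown.

```python
from itertools import combinations

TREES = [
    "((AB)(CD))", "((AC)(BD))", "((AD)(BC))",
    "(((AB)C)D)", "(((AB)D)C)", "(((AC)B)D)", "(((AC)D)B)",
    "(((AD)B)C)", "(((AD)C)B)", "(((BC)A)D)", "(((BC)D)A)",
    "(((BD)A)C)", "(((BD)C)A)", "(((CD)A)B)", "(((CD)B)A)",
]
# Tree probabilities p1..p15 as printed on the slide.
P_010 = [0.128, 0.099, 0.099, 0.104, 0.091, 0.066, 0.062, 0.037,
         0.037, 0.066, 0.062, 0.037, 0.037, 0.037, 0.037]
P_005 = [0.121, 0.105, 0.105, 0.079, 0.075, 0.061, 0.060, 0.045,
         0.045, 0.061, 0.060, 0.045, 0.045, 0.045, 0.045]


def lca_depth(tree, x, y):
    """Nesting depth of the most recent common ancestor of taxa x and y.

    This equals the minimum parenthesis depth reached between the two
    taxon labels in the string.
    """
    i, j = sorted((tree.index(x), tree.index(y)))
    depth, depths = 0, []
    for ch in tree:
        depth += (ch == "(") - (ch == ")")
        depths.append(depth)
    return min(depths[i:j + 1])


def cherry(tree, triple):
    """The pair within `triple` that the tree groups together."""
    return max(combinations(triple, 2), key=lambda pair: lca_depth(tree, *pair))


def triple_probs(probs):
    """Probability of each (triple, grouped pair): sum over matching trees."""
    table = {}
    for tree, p in zip(TREES, probs):
        for triple in combinations("ABCD", 3):
            key = (triple, cherry(tree, triple))
            table[key] = table.get(key, 0.0) + p
    return table


def winning_triples(probs):
    """For each set of three taxa, the most probable grouped pair."""
    table = triple_probs(probs)
    winners = {}
    for triple in combinations("ABCD", 3):
        best = max(combinations(triple, 2),
                   key=lambda pair: table.get((triple, pair), 0.0))
        winners["".join(triple)] = "".join(best)
    return winners


if __name__ == "__main__":
    print(winning_triples(P_010))
    print(winning_triples(P_005))
```

In both columns the winning triples are (AB)C, (AB)D, (AC)D, and (BC)D, which are exactly the triples displayed by (((AB)C)D).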

  11. Unresolved zone for majority-rule and too-greedy zone

  12. What about finite samples? • If you sample 10 loci, you could have: • All 10 match the species tree • 9 match the species tree, 1 disagrees • 8 match the species tree, 2 disagree, etc. • You can treat gene tree topologies as categories and use multinomial probabilities to compute the probability of your sample
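The multinomial calculation mentioned above can be sketched as follows. The function name `multinomial_pmf` and the category probabilities in the usage lines are illustrative assumptions, not values from the talk; in practice the probabilities would be the coalescent gene tree probabilities for the species tree under study.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` loci in each gene tree category.

    counts: number of sampled loci showing each topology
    probs:  probability of each topology (should sum to 1)
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    # Multinomial coefficient n! / (k1! * k2! * ...), computed exactly.
    coef = factorial(sum(counts))
    for k in counts:
        coef //= factorial(k)
    return coef * prod(p ** k for p, k in zip(probs, counts))


if __name__ == "__main__":
    # Hypothetical three-category example: the matching topology has
    # probability 0.5 and each of two discordant topologies has 0.25.
    probs = [0.5, 0.25, 0.25]
    print(multinomial_pmf([10, 0, 0], probs))  # all 10 loci match
    print(multinomial_pmf([9, 1, 0], probs))   # 9 match, 1 disagrees
    print(multinomial_pmf([8, 1, 1], probs))   # 8 match, 2 disagree
```

Summing this probability over all samples whose consensus tree equals the species tree gives the finite-sample probability that a given consensus method recovers the species tree.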

  13. Are consensus trees inconsistent estimators of species trees? • Theorem 1. Majority-rule asymptotic consensus trees (MACTs) contain only clades that are on the species tree. • Theorem 2. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-taxon asymmetric tree and for any species tree with n > 4 species. • Theorem 3. R* asymptotic consensus trees (RACTs) always match the species tree.

  14. Conclusions • Coalescent gene tree probabilities are useful for understanding asymptotic behavior of consensus trees constructed from independent gene trees. • Greedy consensus trees can be misleading, but are typically quicker to approach the species tree than majority-rule or R* when outside of the greedy zone. • R* consensus trees are consistent and more resolved than majority-rule consensus trees.
