The bootstrap, consensus trees, and super-trees

Phylogenetics Workshop, 16-18 August 2006. The bootstrap, consensus trees, and super-trees. Barbara Holland.


Phylogenetics Workshop,

16-18 August 2006

The bootstrap, consensus-trees, and super-trees

Barbara Holland



What is the bootstrap?

  • As in many other areas where statistical inference is applied, in phylogenetics a point estimate of the phylogenetic tree is not enough on its own.

  • We would also like some measure of confidence in our point estimate.

    • Is our tree likely to change if we got more data, or if we had used slightly different data?

    • How robust is our result to sampling error?

  • The bootstrap is a useful tool for answering these sorts of questions.



Assessing confidence in trees

  • In 1985 Felsenstein introduced the idea of the bootstrap to phylogenetics.

  • For each bootstrap sample:

    • Create a new alignment by resampling the columns of the observed alignment

    • Construct a tree for the ‘bootstrap’ alignment

  • Can be applied to any method that starts from a sequence alignment, e.g., parsimony, likelihood, clustering methods if the distances are derived from an alignment…

  • The bootstrap support for each edge is the number of bootstrap trees that edge appears in, usually reported as a proportion of all bootstrap trees.


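[Figure: an alignment of taxa a, b, c, d shown five times, each copy headed by the column indices it is built from.]

  1234567
a ATATAAA
b ATTATAA
c TAAAATA
d TATAAAT

  1224567
a ATTTAAA
b ATTATAA
c TAAAATA
d TAAAAAT

  1334567
a AAATAAA
b ATTATAA
c TAAAATA
d TTTAAAT

  1234567
a ATATAAA
b ATTATAA
c TAAAATA
d TATAAAT

  1244567
a ATTTAAA
b ATAATAA
c TAAAATA
d TAAAAAT

[Figure: four-taxon trees on a, b, c, d built from the alignments, and a summary tree whose internal edge is labelled 0.75.]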

Example where the bootstrap is useful

  • Simulate data on the four-taxon tree below (JC model)

  • Use sequence lengths of 100, 1000, and 10000

[Figure: four-taxon tree on a, b, c, d with branch lengths labelled 0.01 and 0.2.]
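For readers who want to try such a simulation, the core of the JC (Jukes-Cantor) model is small. The sketch below evolves a sequence along one branch; it assumes branch lengths are measured in expected substitutions per site, and the function names are illustrative only. Simulating on a whole tree would apply `evolve` along each branch, starting from a random root sequence.

```python
import math
import random

BASES = "ACGT"

def jc_change_probability(t):
    """Probability that a site differs after a branch of length t under JC."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def evolve(seq, t, rng):
    """Evolve a sequence along one branch of length t."""
    p = jc_change_probability(t)
    out = []
    for base in seq:
        if rng.random() < p:
            # Under JC the three other bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(1)
root = "".join(rng.choice(BASES) for _ in range(100))
child = evolve(root, 0.2, rng)
```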


Example where the bootstrap is not so useful

  • Simulate data on the two four-taxon trees below (JC model) in the proportion 55%, 45% and concatenate the sequences

  • Use total sequence lengths of 100, 1000, and 10000

[Figure: two four-taxon trees on a, b, c, d with branch lengths labelled 0.05 and 0.1; one tree is labelled 55% and the other 45%.]



Consensus trees

  • Consensus trees attempt to summarise the information contained in a set of trees, where each tree in the set is on the same taxa.

  • Some consensus tree methods are specific to rooted trees.



Why are consensus methods required?

  • Many phylogenetic methods produce a collection of trees rather than a single best tree.

    • Markov chain Monte Carlo (MCMC)

    • Bootstrapping

    • Equally parsimonious trees

  • Sometimes the collection consists of trees estimated from different genes.



Terminology: Splits and clades

  • Each edge in an unrooted tree corresponds to a split or bipartition of the taxa set.

  • Each edge in a rooted tree corresponds to a clade.


Splits

[Figure: unrooted tree on mouse, dog, turtle, parrot, cat, with edges annotated by the splits below.]

cat, dog, mouse, parrot | turtle

dog, cat | mouse, turtle, parrot

cat, dog, mouse | turtle, parrot
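The correspondence between edges and splits can be made concrete with a short Python sketch. It assumes the tree is written as nested tuples and that the slide's tree is ((dog,cat),mouse,(turtle,parrot)), which is the topology implied by the splits listed above; the function names are hypothetical.

```python
def clusters(tree):
    """Leaf sets below every node of a tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def splits(tree):
    """Bipartitions of the taxa, one per edge of the unrooted tree."""
    found = clusters(tree)
    taxa = max(found, key=len)
    return {frozenset([side, taxa - side]) for side in found if side != taxa}

tree = (("dog", "cat"), "mouse", ("turtle", "parrot"))
for split in splits(tree):
    one, other = sorted(split, key=sorted)
    print(", ".join(sorted(one)), "|", ", ".join(sorted(other)))
```

Five of the seven splits are trivial (a single taxon against the rest); the other two correspond to the internal edges.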


Clades

[Figure, repeated over three slides: rooted tree on dog, cat, mouse, parrot, turtle.]


Strict Consensus

  • The strict consensus tree contains only those splits/clades that appear in all trees

[Figure: three input trees on mouse, dog, turtle, parrot, cat, and their strict consensus tree.]


Semi-strict

  • The semi-strict consensus tree also contains those splits/clades that don’t conflict with any of the input trees

[Figure: two input trees on mouse, dog, turtle, parrot, cat, and their semi-strict consensus tree.]
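For rooted trees, "does not conflict" has a simple form: two clades are compatible when they are nested or disjoint. The sketch below uses that test to collect the clades of a semi-strict consensus. The trees are hypothetical examples written as nested tuples, not the ones on the slide, and the function names are illustrative.

```python
def clusters(tree):
    """Leaf sets (clades) below every node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def compatible(x, y):
    """Two clades fit in one rooted tree if they are nested or disjoint."""
    return x <= y or y <= x or not (x & y)

def semi_strict(trees):
    """Clades found in some tree that conflict with no clade of any tree."""
    every = set().union(*(clusters(t) for t in trees))
    return {c for c in every if all(compatible(c, d) for d in every)}

tree1 = (("dog", "cat"), "mouse", "turtle", "parrot")
tree2 = ("dog", "cat", "mouse", ("turtle", "parrot"))
tree3 = (("dog", "mouse"), "cat", "turtle", "parrot")

agree = semi_strict([tree1, tree2])      # keeps both resolved clades
conflict = semi_strict([tree1, tree3])   # drops the two clashing clades
```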


Majority-Rule

  • The majority-rule consensus tree contains only those splits/clades that appear in more than 50% of the input trees

[Figure: three input trees on dog, mouse, turtle, parrot, cat, and their majority-rule consensus tree.]
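Counting how often each clade occurs is enough to obtain both the majority-rule and the strict consensus, and the same tally gives bootstrap support when the input trees are bootstrap trees. The following is a minimal sketch for rooted trees written as nested tuples; the three trees are hypothetical, not those on the slide.

```python
from collections import Counter

def clusters(tree):
    """Leaf sets (clades) below every node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def cluster_frequencies(trees):
    """Fraction of input trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree))
    return {cluster: n / len(trees) for cluster, n in counts.items()}

trees = [
    ((("dog", "cat"), "mouse"), "turtle", "parrot"),
    ((("dog", "mouse"), "cat"), "turtle", "parrot"),
    ((("dog", "cat"), "turtle"), "mouse", "parrot"),
]
freq = cluster_frequencies(trees)
majority = {c for c, f in freq.items() if f > 0.5}
strict = {c for c, f in freq.items() if f == 1.0}
```

Any two clades that each occur in more than half of the trees must occur together in at least one tree, so the majority-rule clades always fit on a single tree.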


Terminology: 3-taxon statements

  • A 3-taxon statement is a triple of species in which two of the species are shown to be more closely related to each other than either is to the third.

  • E.g. the tree below displays the 3-taxon statements

    ((dog,cat),mouse)

    ((dog,mouse),parrot)

    ((mouse,parrot),turtle)

    …and others…

[Figure: rooted tree on dog, cat, mouse, parrot, turtle.]
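The full list of 3-taxon statements can be enumerated mechanically: a tree displays ((x,y),z) when some clade contains x and y but not z. The sketch below assumes the slide's tree is ((((dog,cat),mouse),parrot),turtle), the only rooted topology consistent with the three statements listed; the function names are illustrative.

```python
from itertools import combinations

def clusters(tree):
    """Leaf sets (clades) below every node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def rooted_triples(tree):
    """All (pair, outgroup) statements displayed by a rooted tree."""
    found = clusters(tree)
    taxa = max(found, key=len)
    triples = set()
    for trio in combinations(sorted(taxa), 3):
        for outgroup in trio:
            pair = frozenset(trio) - {outgroup}
            # ((x,y),z) holds if some clade contains x and y but not z.
            if any(pair <= c and outgroup not in c for c in found):
                triples.add((pair, outgroup))
    return triples

tree = (((("dog", "cat"), "mouse"), "parrot"), "turtle")
triples = rooted_triples(tree)
```

Because this tree is fully resolved, every one of the ten possible triples of taxa yields a statement.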


Terminology: Rooted trees, hierarchies, clusters, and partitions

[Figure: rooted tree on leaves a, b, c, d, e, f.]

Hierarchy of clusters

{a,b,c,d}, {b,c,d}, {c,d}, {e,f}, {a}, {b}, {c}, {d}, {e}, {f}

Partitions

abcd | ef

a | bcd | ef

a | b | cd | ef



Products of partitions

  • Given k partitions p1, p2, p3,…, pk of the same set of taxa, the product of these partitions is the partition where a and b are in the same block if and only if they are in the same block for each pi

  • Example: The product of abc|de and ad|bce is a|bc|d|e
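The definition translates directly into code: two taxa share a block of the product exactly when they share a block in every input partition. The sketch below assumes single-letter taxon names and the "abc|de" notation used on the slide; the helper names are made up for the illustration.

```python
def parse(text):
    """Turn 'abc|de' into a list of blocks (single-letter taxa assumed)."""
    return [frozenset(block.strip()) for block in text.split("|")]

def product(partitions):
    """Group taxa by the tuple of blocks they fall into."""
    taxa = frozenset().union(*partitions[0])
    blocks = {}
    for taxon in taxa:
        signature = tuple(next(b for b in p if taxon in b) for p in partitions)
        blocks.setdefault(signature, set()).add(taxon)
    return {frozenset(b) for b in blocks.values()}

def show(partition):
    """Format a partition in the slide's notation."""
    return "|".join(sorted("".join(sorted(b)) for b in partition))

print(show(product([parse("abc|de"), parse("ad|bce")])))  # a|bc|d|e
```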



Adams Consensus

  • Adams consensus method only applies to rooted trees.

  • It preserves all the 3-taxon statements that are common to all of the input trees.

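  • It is a recursive algorithm that works with the product of the maximal partitions of the input trees (the maximal partition of a rooted tree is the partition of its leaves induced by the children of the root)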


AdamsTree algorithm (from Bryant 2003)

Procedure AdamsTree(T1, …, Tk)
  if T1 contains only 1 leaf then
    return T1
  else
    construct the product of the maximal partitions of the input trees
    for each block B in the partition do
      construct AdamsTree(T1|B, …, Tk|B)
    attach the roots of these trees to a new node v
    return this tree
  end
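The procedure can be sketched in Python with rooted trees written as nested tuples. This is an illustrative reading of the pseudocode, not Bryant's implementation: the helper names are hypothetical, and the two input trees are made-up examples chosen so that their maximal partitions are abcd | ef and bcde | af.

```python
def leaves(tree):
    """Leaf set of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def restrict(tree, keep):
    """T|B: drop other leaves, then suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def adams(trees):
    taxa = leaves(trees[0])
    if len(taxa) == 1:
        return next(iter(taxa))
    # Product of maximal partitions: group taxa by the child of the
    # root they sit under in each input tree.
    blocks = {}
    for taxon in taxa:
        signature = tuple(
            next(i for i, child in enumerate(t) if taxon in leaves(child))
            for t in trees)
        blocks.setdefault(signature, set()).add(taxon)
    ordered = sorted(blocks.values(), key=sorted)
    # One recursive call per block; the results hang off a new root.
    return tuple(adams([restrict(t, b) for t in trees]) for b in ordered)

tree1 = (("a", ("b", ("c", "d"))), ("e", "f"))
tree2 = ((("b", ("c", "d")), "e"), ("a", "f"))
print(adams([tree1, tree2]))  # ('a', ('b', ('c', 'd')), 'e', 'f')
```

The recursion terminates because every maximal partition has at least two blocks, so each block of the product is strictly smaller than the current leaf set.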


Adams consensus example

[Figure: two rooted input trees on leaves a, b, c, d, e, f.]

Maximal partitions of the two input trees: abcd | ef and bcde | af

Product of maximal partitions: a | bcd | e | f

Consensus tree so far: root with children {f}, {a}, {b,c,d}, {e}


Adams consensus example cont.

Restrict to b,c,d

Maximal partitions of the two restricted trees: b | cd and b | cd

Product of maximal partitions: b | cd

Consensus tree so far: root with children {f}, {a}, {b,c,d}, {e}; {b,c,d} with children {b}, {c,d}


Adams consensus example cont.

Restrict to c,d

Maximal partitions of the two restricted trees: c | d and c | d

Product of maximal partitions: c | d

Final consensus tree: root with children {f}, {a}, {b,c,d}, {e}; {b,c,d} with children {b}, {c,d}; {c,d} with children {c}, {d}

What about an “Adams”-like method for unrooted trees?

  • Instead of triples we would need to consider statements about quartets of taxa.

  • If a quartet ((a,b),(c,d)) appeared in all the input trees it should be displayed in the output.

  • Easy enough?


Three requirements (Steel, Dress and Böcker 2000)

  • Relabelling the species at the tips of the tree should yield the same answer, relabelled in the appropriate way

  • The input order of the trees should not matter

  • A quartet that appears in all the input trees should appear in the output tree


No method can satisfy these 3 requirements

  • Counterexample

[Figure: two unrooted input trees on leaves a, b, c, d, e, f.]



Supertree methods

  • Super-tree methods take a set of trees on overlapping taxa sets and return a tree (or sometimes a ‘fail’ message)

  • Biological relevance

    • Not all genes are present in all species

    • Not all genes are easy to sequence for all species

  • Assembling the Tree of Life

    • Computationally infeasible to build a single tree for all taxa at once

    • Use a divide and conquer approach

    • And then use supertree methods to piece the Tree of Life together


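Concept: Refinement

A tree T refines a tree T’ on the same labels if T’ can be obtained from T by contracting edges.

[Figure: two trees on leaves a, b, c, d, e, one of which refines the other.]

The trees below are also refinements

[Figure: two further trees on leaves a, b, c, d, e.]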


Concept: Restriction

[Figure: tree T on leaves a, b, c, d, e, f, g, h.]

The label set X = {a,b,c,d,e,f,g,h}

We can restrict T to any subset X’ of the labels

E.g. The restriction to {a,c,e,g}

[Figure: T alongside its restriction to {a,c,e,g}.]

Find the minimal subtree connecting the chosen labels and then suppress the degree-two vertices


Concept: Displaying

A tree T (on label set X) displays a tree T’ (on label set X’, a subset of X) if T restricted to the labels X’ is a refinement of T’

E.g.

[Figure: a tree on leaves a, b, c, d, e, f, followed by two smaller trees that it displays.]

BUT

[Figure: a tree on leaves a, b, c, d, e, f, followed by two smaller trees that it does not display.]
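Restriction and displaying can be checked mechanically. The sketch below works with rooted trees written as nested tuples, the setting used by BUILD; for rooted trees, T|X’ refines T’ exactly when every clade of T’ is also a clade of T|X’. The function names and example trees are hypothetical, not taken from the slides.

```python
def clusters(tree):
    """Leaf sets (clades) below every node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def restrict(tree, keep):
    """Drop leaves outside `keep`, then suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def displays(big, small):
    """True if `big`, restricted to the labels of `small`, refines `small`."""
    labels = max(clusters(small), key=len)
    if not labels <= max(clusters(big), key=len):
        return False
    return clusters(small) <= clusters(restrict(big, labels))

big = ((("a", "b"), "c"), ("d", ("e", "f")))
print(restrict(big, {"a", "b", "d"}))        # (('a', 'b'), 'd')
print(displays(big, (("a", "b"), "d")))      # True
print(displays(big, ("a", "b", "d")))        # True: unresolved, so refined
print(displays(big, (("a", "d"), "b")))      # False
```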


The BUILD algorithm

  • Polynomial-time algorithm due to Aho et al. (1981)

  • Takes a set of rooted input trees and either outputs a supertree that displays all of the input trees or returns a fail message.



BUILD algorithm

  • Recursive algorithm: at each step it constructs a graph associated with the triples displayed by the input trees.

  • Depending on whether this associated graph is connected or disconnected the algorithm either terminates or subdivides the problem.

  • What is this associated graph?



The associated graph

  • Nodes of the graph are the complete label set, i.e. all the labels that appear in any of the input trees

  • Put an edge between two nodes a and b if there is at least one input tree that displays the rooted triple ((a,b),c) for some c.

  • If only one label remains, return it as a leaf; otherwise, if this graph is connected, stop and report a fail message

  • Otherwise call the algorithm again once for each connected component, restricting the input to the labels in that component.
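The steps above can be sketched as follows. To keep the example short it takes the rooted triples directly rather than extracting them from input trees, writes ((a,b),c) as the tuple (a, b, c), and returns None as the fail message. The four triples are hypothetical, chosen only so that the components come out as {a,b,c,f} and {d,e}; they are not the input trees of the Semple and Steel example.

```python
def build(labels, triples):
    """Return a nested-tuple tree displaying all triples, or None on failure."""
    labels = set(labels)
    if len(labels) == 1:
        return next(iter(labels))
    # Only triples whose three labels are all still present count.
    active = [t for t in triples if set(t) <= labels]
    adjacency = {x: set() for x in labels}
    for a, b, _ in active:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Connected components of the associated graph (depth-first search).
    components, seen = [], set()
    for start in sorted(labels):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adjacency[node] - comp)
        seen |= comp
        components.append(comp)
    if len(components) == 1:
        return None  # connected graph: the triples are incompatible
    subtrees = []
    for comp in components:
        sub = build(comp, active)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

triples = [("a", "b", "c"), ("b", "c", "d"), ("c", "f", "e"), ("d", "e", "a")]
print(build("abcdef", triples))  # ((('a', 'b'), 'c', 'f'), ('d', 'e'))
print(build("abc", [("a", "b", "c"), ("a", "c", "b")]))  # None
```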


BUILD Example (from Semple and Steel)

[Figure: rooted input trees on subsets of the labels a, b, c, d, e, f, and the associated graph.]

Connected components of the associated graph: {a,b,c,f} and {d,e}


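BUILD example continued

Subproblem 1: Restrict input to {a,b,c,f}

[Figure: the restricted input trees and their associated graph.]

Connected components: {f}, {a,b}, {c}

Tree so far: root with children {a,b,c,f} and {d,e}; {a,b,c,f} with children {f}, {a,b}, {c}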


BUILD example continued

Subproblems 2 and 3, on {d,e} and {a,b}, are trivial, so the final tree is

root with children {a,b,c,f} and {d,e}; {a,b,c,f} with children {f}, {a,b}, {c}; {a,b} with children {a}, {b}; {d,e} with children {d}, {e}

In Newick notation: (((a,b),c,f),(d,e));



What if the trees don’t agree?

  • If the input trees are not compatible BUILD will return a fail message.

  • It is also of interest to have methods that will return some output even if the input trees cannot all be displayed by a single supertree.

  • Matrix representation with parsimony (MRP) is one such method…



Matrix Representation with Parsimony (MRP)

  • Supertree method invented independently by Baum (1992) and Ragan (1992).

  • Recode the input trees as a character matrix where each edge in each input tree defines a character.

  • Do a parsimony analysis of the resulting character matrix.

  • Take the strict consensus of the most parsimonious trees.


MRP example

[Figure: three input trees, on {a,b,c,d,e,f}, {a,b,c,d,e,g} and {d,e,g,h}, with edges numbered 1-9, 1-9 and 1-5.]

   Tree 1     Tree 2     Tree 3
   123456789  123456789  12345
a  101010100  101010000  ?????
b  011010100  011010000  ?????
c  000110100  000001000  ?????
d  000001100  000110000  01100
e  000000010  000000110  10100
f  000000001  ?????????  ?????
g  ?????????  000000101  00010
h  ?????????  ?????????  00001
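The recoding step can be reproduced with a short Python sketch. Each tree is given as its taxon set plus one entry per edge listing the taxa coded 1 for that edge; taxa missing from a tree get "?" for all of that tree's characters. The edge lists below are read off the columns of the matrix above, single-letter taxon names are assumed, and the function name is hypothetical.

```python
def mrp_matrix(trees):
    """trees: list of (taxa, edges); each edge lists the taxa coded 1."""
    all_taxa = sorted(set().union(*(taxa for taxa, _ in trees)))
    rows = {}
    for taxon in all_taxa:
        cells = []
        for taxa, edges in trees:
            for edge in edges:
                if taxon not in taxa:
                    cells.append("?")      # taxon absent from this tree
                else:
                    cells.append("1" if taxon in edge else "0")
        rows[taxon] = "".join(cells)
    return rows

trees = [
    ("abcdef", ["a", "b", "ab", "c", "abc", "d", "abcd", "e", "f"]),
    ("abcdeg", ["a", "b", "ab", "d", "abd", "c", "eg", "e", "g"]),
    ("degh", ["e", "d", "de", "g", "h"]),
]
matrix = mrp_matrix(trees)
for taxon, row in matrix.items():
    print(taxon, row)
```

The resulting matrix would then be passed to a parsimony program; that step is outside this sketch.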


MRP example

[Figure: the three input trees, on {a,b,c,d,e,f}, {a,b,c,d,e,g} and {d,e,g,h}.]

10 most parsimonious trees

Strict consensus:

[Figure: strict consensus tree on leaves a, b, c, d, e, f, g, h.]

