Species Trees & Constraint Programming: recent progress and new challenges


Species Trees & Constraint Programming: recent progress and new challenges. By Patrick Prosser Presented by Chris Unsworth. Outline. Tree of life (what’s that then?) Previous work (conventional and CP model) What’s new? (enhanced model, new problems) Conclusions (what have I told you!?)


Species Trees & Constraint Programming: recent progress and new challenges

By Patrick Prosser

Presented by Chris Unsworth

Outline
  • Tree of life (what’s that then?)
  • Previous work (conventional and CP model)
  • What’s new? (enhanced model, new problems)
  • Conclusions (what have I told you!?)
  • Future work (will this never end?)
Tree of life
  • A central goal of systematics
    • construct the tree of life
    • a tree that represents the relationship between all living things (including constraint programmers)
  • The leaf nodes of the tree are species
  • The interior nodes are hypothesized species
    • extinct, where species diverged
(Image: not to be confused with this either)

(Image: something like this)
To date, biologists have cataloged about 1.7 million species, yet estimates of the total number of species range from 4 to 100 million.

“Of the 1.7 million species identified only about 80,000 species have been placed in the tree of life”

E. Pennisi, “Modernizing the Tree of Life”, Science 300:1692-1697, 2003

Properties of a Species Tree
  • We have a set of leaf nodes, each labelled with a species
  • the interior nodes have no labels (maybe)
  • each interior node has 2 children and one parent (maybe)
    • a bifurcating tree
  • Note: recently there have been requirements that
    • interior nodes have divergence dates
    • leaf nodes correspond to other trees (such as a leaf “cats”)
    • trees might not bifurcate
Super Trees
  • We are given two trees, T1 and T2
  • S1 and S2 are the sets of leaves for T1 and T2 respectively
    • remember, leaves are species!
  • S1 and S2 have a non-empty intersection
    • some species appear in both trees
  • We want to combine T1 and T2
    • form a super tree
(Figure: T1 and T2 are combined to form a superTree.)
Most Recent Common Ancestors (mrca)

(Figure: a tree on leaves a, b and c in which a is closer to b than to c. The lower interior node is mrca(a,b); the root is mrca(a,c) = mrca(b,c).)

We have 3 species, a, b, and c. Species a and b are more closely related to each other than they are to c.

mrca(a,b) > mrca(a,c)
mrca(a,b) > mrca(b,c)
mrca(a,c) = mrca(b,c)

Here > reads “is further from the root than”: the most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and b and c).

NOTE: mrca(x,y) = mrca(y,x)
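
The relation above can be checked on a small example. This is a minimal sketch, assuming a tree is written as nested Python tuples with species names as leaves and the root at depth 0; the representation and the names `mrca_depths` and `d` are illustrative and not from the talk. It records, for every pair of leaves, the depth of their most recent common ancestor.

```python
from itertools import combinations


def mrca_depths(tree, depth=0, out=None):
    """Return (leaves, out); out maps frozenset({x, y}) to the depth of mrca(x, y).

    A tree is either a leaf label (a string) or a tuple of subtrees.
    The root is at depth 0 (an assumption of this sketch).
    """
    if out is None:
        out = {}
    if isinstance(tree, str):
        return [tree], out
    child_leaves = [mrca_depths(child, depth + 1, out)[0] for child in tree]
    # Two leaves that sit under different children first meet at this node.
    for left, right in combinations(child_leaves, 2):
        for x in left:
            for y in right:
                out[frozenset((x, y))] = depth
    return [leaf for leaves in child_leaves for leaf in leaves], out


def d(depths, x, y):
    """Depth of mrca(x, y); symmetric because the key is an unordered pair."""
    return depths[frozenset((x, y))]


_, depths = mrca_depths((("a", "b"), "c"))
print(d(depths, "a", "b"), d(depths, "a", "c"), d(depths, "b", "c"))  # 1 0 0
```

With a and b grouped below the root, mrca(a,b) is one level deeper than mrca(a,c) and mrca(b,c), which coincide at the root.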

Most Recent Common Ancestors (mrca)

Note: mrca(a,b) > mrca(a,c) defines that mrca(a,b) > mrca(b,c) and mrca(a,c) = mrca(b,c), where > reads “is further from the root than”.

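Ultrametric relationship

Given 3 leaf nodes labelled a, b, and c there are only 4 possible situations.

(Figure: the three triples, one for each choice of the more closely related pair among a, b and c, and the fan, in which a, b and c share one parent.)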

(Figure: the four possible trees on leaves a, b and c.)

That’s all that there can be, for 3 leaves.

Ultrametric relationship

Given 3 leaf nodes labelled a, b, and c there are only 4 possible situations.

We can represent this using primitive constraints

Where D[i,j] is a constrained integer variable representing the depth in the tree of the most recent common ancestor of the ith and jth species


Ultrametric constraint

Therefore the ultrametric constraint is as follows
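
As a hedged illustration, the four situations can be written as one disjunction over the three depth values of a species triple. The sketch below assumes, as stated on the earlier slide, that D[i,j] is the depth of mrca(i,j), so a more closely related pair has the larger value; the function name `ultrametric` is illustrative and not taken from the talk.

```python
def ultrametric(d_ij, d_ik, d_jk):
    """True if the depths D[i,j], D[i,k], D[j,k] match one of the 4 situations."""
    return (
        d_ij > d_ik == d_jk      # triple: i and j are closest
        or d_ik > d_ij == d_jk   # triple: i and k are closest
        or d_jk > d_ij == d_ik   # triple: j and k are closest
        or d_ij == d_ik == d_jk  # fan: no pair is closer than the others
    )


print(ultrametric(2, 1, 1))  # True, a triple
print(ultrametric(1, 1, 1))  # True, the fan
print(ultrametric(1, 2, 3))  # False, three different depths
```

Note that Python reads `d_ij > d_ik == d_jk` as `d_ij > d_ik and d_ik == d_jk`, which is the intended meaning of each disjunct.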

How it goes (part 1)

Conventional technology (circa 1981)

  • Take 2 species trees T1 and T2
  • Use the “breakUp” algorithm (Ng & Wormald 1996) on T1 then T2
    • This produces a set of triples and fans
  • Use the “oneTree” algorithm (Ng & Wormald 1996)
    • Generates a superTree or fails

This is the “conventional” (non-CP) approach.

There are different versions of oneTree and breakUp from Semple and Steel (I think) that treat fans differently (ignore them).

oneTree is essentially the algorithm of Aho, Sagiv, Szymanski and Ullman in SIAM J. Comput. 1981.
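
The tree-building step can be sketched in the style of the Aho, Sagiv, Szymanski and Ullman algorithm. This is an illustrative sketch under stated assumptions, not the oneTree code from the talk: it handles triples only (fans are ignored), a triple is written `((a, b), c)` meaning a and b are closer to each other than to c, and trees are nested tuples. The name `build` is hypothetical.

```python
def build(leaves, triples):
    """Return a tree consistent with every triple ((a, b), c), or None on failure."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only the triples whose three species all lie in this leaf set.
    active = [t for t in triples if {t[0][0], t[0][1], t[1]} <= leaves]

    # Union-find: for each triple (a, b)c, a and b must share a child of the root.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), _c in active:
        parent[find(a)] = find(b)

    blocks = {}
    for x in leaves:
        blocks.setdefault(find(x), set()).add(x)
    if len(blocks) == 1:
        return None  # every leaf is forced together: the triples are incompatible

    subtrees = []
    for block in sorted(blocks.values(), key=min):  # sorted for a stable output
        subtree = build(block, active)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


print(build("abcd", [(("a", "b"), "c"), (("b", "c"), "d")]))
# ((('a', 'b'), 'c'), 'd')
print(build("abc", [(("a", "b"), "c"), (("a", "c"), "b")]))
# None
```

The first call combines a triple from one hypothetical tree with a triple from another; the second shows the "or fails" case for contradictory triples.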

How it goes (part 2)

CP approach (circa 2003)

  • Generate an n by n array of constrained integer variables
  • For all 0<i<j<k<n post the ultrametric constraint
    • Yes, we have a cubic number of constraints
    • Yes, we have a quadratic number of variables
    • This gives us an “ultrametric matrix”
  • Use breakUp on trees T1 and T2 to produce triples and fans
  • Post the triples and fans as constraints, breaking disjunctions
  • Find a first solution
  • Convert the ultrametric matrix to an ultrametric tree

The algorithm for converting an ultrametric matrix to an ultrametric tree is given by Dan Gusfield.

This is the CP approach proposed by Gent, Prosser, Smith & Wei in CP03 (a great great paper, go read it).
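
To make the model concrete, here is a brute-force sketch of the same idea in plain Python. It is not the Claire/Choco model from the talk: there is no constraint propagation, the domain 1..n-1 for each D value is an assumption, and the names `first_solution` and `ultrametric` are illustrative. It enumerates candidate matrices, keeps those where every species triple is ultrametric, and applies the input triples and fans as extra constraints.

```python
from itertools import combinations, product


def ultrametric(d_ij, d_ik, d_jk):
    """The disjunction of the 4 situations, with D read as mrca depth."""
    return (d_ij > d_ik == d_jk or d_ik > d_ij == d_jk
            or d_jk > d_ij == d_ik or d_ij == d_ik == d_jk)


def first_solution(species, triples=(), fans=()):
    """Return the first D (dict: sorted species pair -> depth) found, or None."""
    species = sorted(species)
    pairs = list(combinations(species, 2))   # quadratic number of variables
    depth_values = range(1, len(species))    # assumed domain: 1 .. n-1
    for values in product(depth_values, repeat=len(pairs)):
        D = dict(zip(pairs, values))

        def d(x, y):
            return D[(x, y) if x < y else (y, x)]

        # Cubic number of ultrametric constraints, one per species triple.
        if not all(ultrametric(d(i, j), d(i, k), d(j, k))
                   for i, j, k in combinations(species, 3)):
            continue
        # Input triple ((a, b), c): mrca(a, b) is deeper than mrca(a, c).
        if not all(d(a, b) > d(a, c) for (a, b), c in triples):
            continue
        # Input fan (a, b, c): all three pairs share one mrca depth.
        if not all(d(a, b) == d(a, c) == d(b, c) for a, b, c in fans):
            continue
        return D
    return None


print(first_solution("abcd", triples=[(("a", "b"), "c"), (("b", "c"), "d")]))
```

For this input the constraints force D[a,b] > D[a,c] = D[b,c] > D[a,d] = D[b,d] = D[c,d], so with the assumed domain the only solution uses depths 3, 2 and 1.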

A min ultrametric tree and its min ultrametric matrix

(Figure: a tree with leaves A, B, C, D and E and interior node values 8, 5, 3 and 3, shown with its matrix.)

Matrix value is the value of the most recent common ancestor of two leaf nodes.

As we go down a branch, values on interior nodes decrease.

Don’t worry about it.
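
For readers who do want to worry about it, the conversion from matrix to tree can be sketched as a recursive partition. This is a simple illustration and not necessarily Gusfield's algorithm. It follows this slide's convention that values decrease going down a branch, so the largest matrix value among a set of leaves labels the root of their subtree. The three-leaf matrix is a made-up example (the figure's matrix is not in the transcript), and `matrix_to_tree` is a hypothetical name.

```python
from itertools import combinations


def matrix_to_tree(leaves, M):
    """Rebuild a nested-tuple tree from M, a dict frozenset({x, y}) -> mrca value."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    # The largest value among these leaves belongs to the root of this subtree.
    root_value = max(M[frozenset(pair)] for pair in combinations(leaves, 2))
    # Leaves whose mrca value is below the root value share a child of the root.
    groups = []
    for x in leaves:
        for group in groups:
            if M[frozenset((x, group[0]))] < root_value:
                group.append(x)
                break
        else:
            groups.append([x])
    if len(groups) == 1:
        raise ValueError("matrix is not ultrametric")
    return tuple(matrix_to_tree(group, M) for group in groups)


M = {frozenset("AB"): 3, frozenset("AC"): 8, frozenset("BC"): 8}
print(matrix_to_tree("ABC", M))  # (('A', 'B'), 'C')
```

Comparing each leaf with only the first member of a group is enough when the matrix really is ultrametric, because "value below the root value" is then an equivalence relation on the leaves. If the matrix instead holds depths, as the D variables do, the `max` and `<` would flip to `min` and `>`.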

The state of play in 2003

  • Coded up in Claire & Choco
  • more a “proof of concept” than a useful tool
  • small data sets only

Resultant superTree

On the left by oneTree and on the right by CP model


What’s new

2006

  • Reimplemented in Java & JChoco (so faster)
  • More robust (thanks to Pierre Flener’s help)
  • Can now deal with larger trees (about 70 species)
  • Can generate all solutions up to symmetry
  • Can handle divergence dates on interior nodes
  • Reimplemented breakUp & oneTree in Java
  • All code available on the web
Bigger Trees

Attempted to reconstruct the supertree in Kennedy & Page’s “Seabird supertrees: Combining partial estimates of Procellariiform phylogeny” in “The Auk: A Quarterly Journal of Ornithology” 119:88-108, 2002

  • 7 trees of seabirds (A through G)
  • Varying in size from 14 to 90 species

From the paper

Table shows on the diagonal the size of each tree, A through G

A table entry is the size of the combined tree

A table entry is in () if the trees are incompatible

A table entry is – if the trees are too big for the CP model

The only compatible trees are A, B, D and F

The resultant supertree has 69 species

This takes 20 seconds to produce

A “lifted” representation

Rather than instantiate the “D” variables, why not just break the disjunctions?

Now the decision variables are P[i,j,k]

And yes, we have a cubic number of P variables


  • Now we can:
    • Enumerate all solutions eliminating value symmetries
    • Allow ranges of values on interior nodes of trees (input and output!)
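
Why deciding on P rather than D removes value symmetries can be illustrated by counting. The sketch below is not the lifted model itself: it assumes P[i,j,k] simply names which of the 4 situations holds for species i, j and k, assumes the domain 1..n-1 for D, and uses the hypothetical names `situation` and `count_solutions`. It enumerates every ultrametric D matrix by brute force and counts how many distinct P assignments those matrices induce.

```python
from itertools import combinations, product


def situation(d_ij, d_ik, d_jk):
    """Return which of the 4 situations holds (1 to 4), or None if none does."""
    if d_ij > d_ik == d_jk:
        return 1  # triple: i and j closest
    if d_ik > d_ij == d_jk:
        return 2  # triple: i and k closest
    if d_jk > d_ij == d_ik:
        return 3  # triple: j and k closest
    if d_ij == d_ik == d_jk:
        return 4  # fan
    return None


def count_solutions(n):
    """Return (number of ultrametric D matrices, number of distinct P patterns)."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))  # cubic number of P variables
    matrices = 0
    patterns = set()
    for values in product(range(1, n), repeat=len(pairs)):
        D = dict(zip(pairs, values))
        P = tuple(situation(D[i, j], D[i, k], D[j, k]) for i, j, k in triples)
        if None not in P:
            matrices += 1
            patterns.add(P)
    return matrices, len(patterns)


print(count_solutions(3))  # (5, 4)
```

For 3 species there are 5 ultrametric matrices over the values {1, 2} but only 4 P patterns: the fan appears twice in D, once with all depths 1 and once with all depths 2, and those two solutions differ only in their values.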
Ranked Trees

A new problem where input trees have ancestral divergence dates on interior nodes

A new “conventional” technique is the RANKED TREE algorithm

Ranked Trees using “lifted” CP model

A new problem where input trees have ancestral divergence dates on interior nodes

We do this in the “lifted” model by merely

1. Reading in divergence dates for pairs of species and posting these as constraints on the “D” variables
2. Then solving using the disjunction-breaking “P” variables
3. Interior nodes retain range values
4. In addition, we can enumerate all solutions, eliminating value symmetries


Two trees of cats. Ranks (divergence information) on interior nodes

Common species in boxes


Two ranked cats trees on left, and on the right one of the ranked supertrees

NOTE: range of values [6..9] on mrca(PTE,LTI)


7 of the 17 solutions have ranges on interior nodes

Without the “lifted” representation we get 30 solutions (some redundant)

Is this a 1st?

  • We think so (or at least Patrick thinks so)
    • enumerate all solutions for ranked supertrees
    • remove value symmetries
What next?

Reduce the size of the model.

Improve propagation of the ultrametric constraint.

Identify common features (backbone) of all supertrees.

Already underway with Neil Moore.


Conclusion

  • presented a new (non-conventional) way of addressing the supertree problem
  • constraint model has been shown to be versatile
    • enumerate all solutions removing symmetries
    • address divergence dates on interior nodes
    • again enumerate all solutions for ranked trees
  • however, model is bulky/large
    • we are working on this
  • future extensions
    • find the backbone of forest of supertrees
    • address nested taxa

Thanks for helping

  • Pierre Flener
  • Xavier Lorca
  • Rod Page
  • Mike Steel
  • Charles Semple
  • Chris Unsworth
  • Neil Moore
  • Christine Wu Wei
  • Barbara Smith
  • Ian Gent