
SuperTriplets: a triplet-based supertree approach to phylogenomics

Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery

Introduction: inferring phylogeny (3 genes)

[Figure: alignments of Gene 1, Gene 2 and Gene 3, combined either into a SuperMatrix or into a SuperTree]

SuperTriplets: ISMB 2010


Introduction: inferring phylogeny (more data)

[Figure: alignments of Gene 1 … Gene 1000, plus SNP, morphological and bibliographic data, combined either into a SuperMatrix or into a SuperTree]


Supertree overview: MRP
  • MRP [Baum 1992, Ragan 1992]
    • 1 binary sequence per taxon
    • 1 site per clade (1 = in the clade; 0 = outside; ? = missing)
  • MRP may output a relation contradicted by all source trees [Goloboff and Pol, 2002]

[Figure: example trees on taxa A to F, including an MRP supertree]

MRP matrix (illustration):
0100101001?11?0100
01??0?011?0???0010
??0011010??001????
0100010??00??001?0
111??0101000????01
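The MRP coding described above can be sketched in Python as follows. It is a minimal illustration, not the software used in the talk: it assumes rooted source trees written as nested tuples of taxon names, codes one site per non-root clade, and omits the all-zero outgroup that some MRP variants add. The names `mrp_matrix`, `clades` and `leaves` are introduced here.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of subtrees


def leaves(tree: Tree) -> FrozenSet[str]:
    """Taxa below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of all internal nodes except the root (one MRP site each)."""
    found: List[FrozenSet[str]] = []

    def visit(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def mrp_matrix(forest: List[Tree]) -> Dict[str, str]:
    """One binary sequence per taxon: 1 = in the clade, 0 = outside, ? = missing."""
    taxa = sorted(frozenset().union(*(leaves(t) for t in forest)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for tree in forest:
        tree_taxa = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in tree_taxa:
                    rows[taxon].append("0")
                else:  # taxon absent from this source tree
                    rows[taxon].append("?")
    return {taxon: "".join(seq) for taxon, seq in rows.items()}


forest = [((("A", "B"), "C"), "D"), (("A", "C"), "E")]
for taxon, seq in mrp_matrix(forest).items():
    print(taxon, seq)
```

In this toy forest, taxon B does not occur in the second tree, so its last site is coded `?`.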

Supertree overview: intuitive approach
  • The Supertree problem (intuitive formulation)
    • Input: a collection of overlapping trees (a forest)
    • Output: the tree that best represents this collection
    • A major question is: how should "best represents" be defined?
  • Visualizing supertree candidates within the tree space
  • Median supertree
    • Intuitive solution
    • Generalization of the consensus tree
    • Good theoretical properties [Steel and Rodrigo, 2008]
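The median supertree mentioned above can be stated as an optimization problem. In this sketch, d is any distance between trees on the same taxon set (for instance one based on split, quartet or triplet decompositions), F is the input forest, L(T_i) is the taxon set of input tree T_i, and T|_{L(T_i)} is the candidate supertree restricted to those taxa; this notation is introduced here for illustration.

```latex
T^{*} \in \arg\min_{T} \sum_{T_i \in F} d\left(T|_{L(T_i)},\ T_i\right)
```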

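Supertree overview: median tree

[Figure: initial trees and the restriction of a candidate supertree to the taxa of each initial tree]

  • Tree decomposition as:
    • split set
    • quartet set
    • triplet set
  • Distance between two trees = size of the symmetric difference of their decompositions D(T1) and D(T2):
    $d(T_1, T_2) = |D(T_1)| + |D(T_2)| - 2\,|D(T_1) \cap D(T_2)|$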
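With the triplet decomposition, the restriction and the distance can be sketched in Python. This is an illustration under stated assumptions: rooted trees are nested tuples, a triplet xy|z is stored as `({x, y}, z)`, polytomies resolve no triplet, and the supertree is assumed to contain every taxon of each input tree. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Triplet = Tuple[FrozenSet[str], str]  # xy|z stored as ({x, y}, z)


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree: Tree) -> Set[Triplet]:
    """All rooted triplets xy|z displayed by the tree."""
    found: Set[Triplet] = set()
    if isinstance(tree, str):
        return found
    below = leaves(tree)
    for child in tree:
        inside = leaves(child)
        # x and y share this child, z hangs from another child of the same node
        for x, y in combinations(sorted(inside), 2):
            for z in below - inside:
                found.add((frozenset((x, y)), z))
        found |= triplets(child)
    return found


def restrict(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Restriction of a tree to the taxa in `keep` (unary nodes are suppressed)."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (restrict(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def triplet_distance(t1: Tree, t2: Tree) -> int:
    """|Tr(t1)| + |Tr(t2)| - 2|Tr(t1) & Tr(t2)|, i.e. the symmetric difference size."""
    return len(triplets(t1) ^ triplets(t2))


def median_cost(supertree: Tree, forest: List[Tree]) -> int:
    """Sum of distances between each input tree and the supertree restricted to its taxa."""
    return sum(triplet_distance(restrict(supertree, leaves(t)), t) for t in forest)


T = ((("A", "B"), "C"), ("D", "E"))
print(len(triplets(T)))  # 10
print(median_cost(T, [(("A", "C"), "B"), (("D", "E"), "A")]))  # 2
```

The first input tree conflicts with T on A, B, C (AB|C versus AC|B, cost 2), while the second agrees with T restricted to A, D, E (cost 0).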


Supertree overview: MRP and median tree

[Figure: an input forest of rooted trees T1, T2 and T3 on taxa A to H, its classical MRP matrix, and its triplet-based MRP matrix obtained after rooting]

Input forest as triplets: AB|C, AB|D, …, GH|F, …, FH|G, …

Triplet MRP (one site per triplet; columns A B C D E F G H, then the added root):
AB|C  110?????0
AB|D  11?0????0
………
FH|G  ?????1010
GH|F  ?????0110
………

Supertree overview: MRP and median tree
  • The parsimony score is related to the triplet distance:
    • 1 parsimony step for each site whose triplet is in the supertree
    • 2 parsimony steps for the others
    • parsimony score = nbSites + (triplet distance)/2
  • The MRP approach is ill-suited to triplet encoding
    • for 100 taxa, 97% of the matrix entries are « ? »
    • for 1000 taxa, 99.7% are « ? »
    • unnecessarily huge matrices
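The 1-step / 2-step rule can be checked with a small Fitch parsimony sketch in Python. It assumes a rooted, fully binary supertree written as nested tuples, treats `?` as compatible with both states, and adds an all-zero root taxon (named `ROOT` here, an arbitrary choice) to every triplet site. It is an illustration, not the MRP software discussed in the talk; `missing_fraction` reproduces the percentages above by ignoring the root column.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]

ROOT = "root"  # hypothetical name for the all-zero taxon added when rooting


def triplet_site(x: str, y: str, z: str) -> Dict[str, str]:
    """MRP site for xy|z: x, y -> 1; z and the added root -> 0; every other taxon is '?'."""
    return {x: "1", y: "1", z: "0", ROOT: "0"}


def fitch_steps(tree: Tree, site: Dict[str, str]) -> int:
    """Fitch parsimony length of one site on a rooted binary tree ('?' matches 0 and 1)."""
    steps = 0

    def visit(node: Tree) -> Set[str]:
        nonlocal steps
        if isinstance(node, str):
            state = site.get(node, "?")
            return {"0", "1"} if state == "?" else {state}
        left, right = (visit(child) for child in node)  # binary nodes only
        common = left & right
        if common:
            return common
        steps += 1  # no shared state: one change is needed below this node
        return left | right

    visit(tree)
    return steps


def missing_fraction(n_taxa: int) -> float:
    """Share of '?' in a triplet site: only 3 of the n taxa are coded (root column ignored)."""
    return 1 - 3 / n_taxa


rooted = (((("A", "B"), "C"), ("D", "E")), ROOT)
sites = [triplet_site("A", "B", "C"), triplet_site("A", "C", "B"), triplet_site("D", "E", "A")]
print([fitch_steps(rooted, s) for s in sites])  # [1, 2, 1]
print(sum(fitch_steps(rooted, s) for s in sites))  # 4 = 3 sites + 1 contradicted triplet
```

AB|C and DE|A are displayed by the supertree and cost one step each; AC|B is contradicted and costs two.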


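Supertriplets: a few notations
  • Given a forest F of input trees
    • N+(xy|z): number of occurrences of xy|z in F
    • N-(xy|z) = N+(xz|y) + N+(yz|x) (alternative resolutions in F)
    • Once these counts are computed, the input trees are no longer needed (so forest size has little impact on running time)
  • Searching for the (asymmetric) triplet median tree T:
    • median: the tree T minimizing $d_3(T,F) = \sum_{xy|z \in Tr(T)} N^-(xy|z) + \sum_{xy|z \notin Tr(T)} N^+(xy|z)$, where Tr(T) is the set of triplets of T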
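The counts N+ and N- and the asymmetric distance d3 can be computed directly from their definitions. The Python sketch below assumes that d3(T,F) sums N- over the triplets displayed by T and N+ over all other triplets, which is consistent with the update rule given for the agglomerative process; the tuple tree encoding and the function names are assumptions for illustration. The triplet helper is repeated so the block runs on its own.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Triplet = Tuple[FrozenSet[str], str]  # xy|z stored as ({x, y}, z)


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree: Tree) -> Set[Triplet]:
    """Rooted triplets xy|z displayed by a tree (polytomies resolve none)."""
    found: Set[Triplet] = set()
    if isinstance(tree, str):
        return found
    below = leaves(tree)
    for child in tree:
        inside = leaves(child)
        for x, y in combinations(sorted(inside), 2):
            for z in below - inside:
                found.add((frozenset((x, y)), z))
        found |= triplets(child)
    return found


def count_triplets(forest: Iterable[Tree]) -> Counter:
    """N+(xy|z): number of input trees displaying xy|z."""
    counts: Counter = Counter()
    for tree in forest:
        counts.update(triplets(tree))
    return counts


def n_minus(t: Triplet, n_plus: Counter) -> int:
    """N-(xy|z) = N+(xz|y) + N+(yz|x)."""
    pair, z = t
    x, y = sorted(pair)
    return n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]


def d3(supertree: Tree, n_plus: Counter) -> int:
    """N- of the triplets displayed by T plus N+ of the input triplets T does not display."""
    in_tree = triplets(supertree)
    missed = sum(n for t, n in n_plus.items() if t not in in_tree)
    contradicted = sum(n_minus(t, n_plus) for t in in_tree)
    return missed + contradicted


n_plus = count_triplets([(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")])
print(d3((("A", "B"), "C"), n_plus))  # 2
print(d3((("A", "C"), "B"), n_plus))  # 4
print(d3(("A", "B", "C"), n_plus))    # 3: an unresolved tree only pays N+
```

The star tree illustrates the asymmetry: leaving a triplet unresolved costs its N+ but never its N-.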

Supertriplets: general overview

  • Triplet decomposition of the input forest: O(n³ |F|)
    • gives N+ and N- for each triplet, e.g. N+(homo pan|mus), N-(homo pan|mus), N+(pan bos|mus), N-(pan bos|mus), N+(homo pan|bos), N-(homo pan|bos), N+(mus pan|bos), N-(mus pan|bos), …
  • First sketch: NJ-like strategy, O(n³) (+ consistency)
  • Improvement: NNI local search, O(n³) to test all branches once
  • Branch support and collapse: O(n³)


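Supertriplets: agglomerative process
  • T0: initial tree, with A, B, C, D and E unresolved
  • T1: agglomeration of C1={A} and C2={B}; new triplets AB|C, AB|D, AB|E
  • T2: agglomeration of C1={A,B} and C2={C}; new triplets AC|D, BC|D, AC|E, BC|E
  • T3: agglomeration of C1={D} and C2={E}; new triplets DE|A, DE|B, DE|C
  • Triplets(T3): AB|C, AB|D, AB|E, AC|D, BC|D, AC|E, BC|E, DE|A, DE|B, DE|C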

Supertriplets: agglomerative process
  • Agglomeration of $(C_A, C_B)$
    • Transform T into T’
    • Resolve the new triplets AB|X with $A \in C_A$, $B \in C_B$, $X \notin C_A \cup C_B$
    • $d_3(T',F) = d_3(T,F) - \left(\sum N^+(AB|X) - \sum N^-(AB|X)\right)$
  • We select the pair maximizing
    • $\mathrm{Score}(C_A, C_B) = \dfrac{\sum N^+(AB|X) - \sum N^-(AB|X)}{\sum N^+(AB|X) + \sum N^-(AB|X)}$
  • The whole process is O(n³): when $C_A$ and $C_B$ are agglomerated
    • $\mathrm{Score}(C_D, C_E)$ is unchanged
    • $\mathrm{Score}(C_{AB}, C_D)$, with $C_{AB} = C_A \cup C_B$, is easily derived from $\mathrm{Score}(C_A, C_D)$ and $\mathrm{Score}(C_B, C_D)$
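The greedy agglomeration can be sketched in Python as below. This naive version recomputes every pair score at each step, so it is much slower than the O(n³) bookkeeping described above; the tuple tree encoding, the names `pair_score` and `agglomerate`, the score of 0 for pairs without informative triplets, and breaking ties by taking the first best pair are all assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Triplet = Tuple[FrozenSet[str], str]  # xy|z stored as ({x, y}, z)


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree: Tree) -> Set[Triplet]:
    """Rooted triplets xy|z displayed by a tree written as nested tuples."""
    found: Set[Triplet] = set()
    if isinstance(tree, str):
        return found
    below = leaves(tree)
    for child in tree:
        inside = leaves(child)
        for x, y in combinations(sorted(inside), 2):
            for z in below - inside:
                found.add((frozenset((x, y)), z))
        found |= triplets(child)
    return found


def pair_score(ca: FrozenSet[str], cb: FrozenSet[str], taxa: FrozenSet[str], n_plus: Counter) -> float:
    """Score(C_A, C_B) over the new triplets AB|X, with X outside C_A and C_B."""
    agree = conflict = 0
    for a in ca:
        for b in cb:
            for x in taxa - ca - cb:
                agree += n_plus[(frozenset((a, b)), x)]  # N+(AB|X)
                conflict += n_plus[(frozenset((a, x)), b)] + n_plus[(frozenset((b, x)), a)]  # N-(AB|X)
    total = agree + conflict
    return (agree - conflict) / total if total else 0.0


def agglomerate(taxa: Iterable[str], n_plus: Counter) -> Tree:
    """Greedy NJ-like sketch: join the best-scoring pair of clusters until one remains."""
    all_taxa = frozenset(taxa)
    clusters: List[Tuple[Tree, FrozenSet[str]]] = [(t, frozenset([t])) for t in sorted(all_taxa)]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: pair_score(clusters[p[0]][1], clusters[p[1]][1], all_taxa, n_plus),
        )
        (tree_a, set_a), (tree_b, set_b) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_a, tree_b), set_a | set_b))
    return clusters[0][0]


true_tree = ((("A", "B"), "C"), ("D", "E"))
n_plus = Counter(triplets(true_tree))  # a single input tree: every N+ equals 1
result = agglomerate("ABCDE", n_plus)
print(result)  # (('C', ('A', 'B')), ('D', 'E'))
```

With a single, fully compatible input tree the greedy joins recover it (child order aside): AB first, then C with AB, then DE, mirroring T1 to T3 on the previous slide.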

Supertriplets: NNI optimisation
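  • The variation $d_3(T',F) - d_3(T,F)$
    • depends only on a few triplets: those with one taxon in each of the three subtrees rearranged by the NNI
    • All these variations are initially evaluated in O(n³)
  • Once an NNI is done
    • only a few NNIs have to be re-evaluated (those of the 4 adjacent edges)
    • NNI optimisation is therefore very fast
  • 2 possible NNIs per edge

[Figure: a tree T and a tree T’ obtained from T by one NNI]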
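The claim that only a few triplets matter can be made concrete. For a rooted binary tree in which an edge separates subtrees A and B from their sibling subtree C, the NNI that turns ((A,B),C) into ((A,C),B) only replaces ab|c by ac|b (one taxon from each subtree). The Python sketch below computes the resulting change of d3 from the N+ counts, assuming d3 sums N- over displayed triplets and N+ over the others; `nni_delta`, `support` and the set-based inputs are hypothetical.

```python
from collections import Counter
from typing import Set


def support(x: str, y: str, z: str, n_plus: Counter) -> int:
    """N+(xy|z) - N-(xy|z), with N-(xy|z) = N+(xz|y) + N+(yz|x)."""
    agree = n_plus[(frozenset((x, y)), z)]
    conflict = n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]
    return agree - conflict


def nni_delta(a_taxa: Set[str], b_taxa: Set[str], c_taxa: Set[str], n_plus: Counter) -> int:
    """d3(T',F) - d3(T,F) when an NNI turns ((A,B),C) into ((A,C),B).

    Only triplets with one taxon in each of A, B and C change: ab|c is lost and
    ac|b is gained; all other triplets keep their resolution.
    """
    delta = 0
    for a in a_taxa:
        for b in b_taxa:
            for c in c_taxa:
                # losing ab|c adds N+ - N-; gaining ac|b subtracts its N+ - N-
                delta += support(a, b, c, n_plus) - support(a, c, b, n_plus)
    return delta


n_plus = Counter({(frozenset(("A", "B")), "C"): 2, (frozenset(("A", "C")), "B"): 1})
print(nni_delta({"A"}, {"B"}, {"C"}, n_plus))  # 2: this NNI would worsen d3
print(nni_delta({"A"}, {"C"}, {"B"}, n_plus))  # -2: the reverse move improves d3
```

A local search would apply an NNI only when its delta is negative, then recompute the deltas of the adjacent edges.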

Supertriplets: edge supports
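  • Local support
    • $\sum_t N^+(t) \,/\, \left[\sum_t N^+(t) + \sum_t N^-(t)\right]$, summing over the triplets t of T that collapsing the edge would leave unresolved
    • If it is < 0.5, collapsing the edge improves $d_3(T,F)$
  • Global support
    • Also takes into account
    • N+( ) and N-( ) impact two edges
  • Final edge support: min(local, global)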
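The local support can be sketched in Python, assuming a rooted tree where the edge sits above a node u: the triplets that collapsing the edge would leave unresolved are xy|z with x and y in different child subtrees of u and z below u's parent but outside u. Function and argument names are hypothetical, and returning 0.0 when no input triplet is informative is a choice made here, not something stated on the slide.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def local_support(children: List[Set[str]], outside: Set[str], n_plus: Counter) -> float:
    """Local support of the edge above node u.

    children: leaf sets of u's child subtrees.
    outside:  taxa below u's parent but not below u.
    """
    agree = conflict = 0
    for left, right in combinations(children, 2):
        for x in left:
            for y in right:
                for z in outside:
                    agree += n_plus[(frozenset((x, y)), z)]  # N+(xy|z)
                    conflict += n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]  # N-(xy|z)
    total = agree + conflict
    return agree / total if total else 0.0


def should_collapse(children: List[Set[str]], outside: Set[str], n_plus: Counter) -> bool:
    """Collapsing lowers d3 when the triplets lost have more N- than N+."""
    return local_support(children, outside, n_plus) < 0.5


n_plus = Counter({(frozenset(("A", "B")), "C"): 2, (frozenset(("A", "C")), "B"): 1})
print(local_support([{"A"}, {"B"}], {"C"}, n_plus))    # about 0.667
print(should_collapse([{"A"}, {"B"}], {"C"}, n_plus))  # False
```

Here the edge grouping A and B is backed by two input triplets against one, so it is kept.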

Supertriplets: simulation protocol

[Figure: simulation protocol following [Eulenstein et al. 2004] and [Criscuolo et al. 2006]; trees are compared ("Are they similar?") with triplet and split measures]

Supertriplets: simulation results

[Figure: simulation results measured with triplets and with splits; annotations: "Contain errors", "Less resolved", "Very few errors", "perfect", "lack of resolution"]

Supertriplets: Phylogenomic case study
  • Supertree of 33 mammals
    • Species: complete genomes (Ensembl v54)
    • Sequences: orthologous CDS (OrthoMaM v5)
    • Gene trees: 13 000 ML trees (inferred using PAUP)
  • Output supertree
    • Computed in 30s
    • Congruent with [Prasad et al. 2008]

Conclusion & prospects
  • (Asymmetric) median supertree
    • Easy to understand
    • Makes tree weighting natural
  • MRP, triplets and median supertree
    • Understanding the criterion optimized by MRP
    • Designing a dedicated algorithm to optimize it
    • http://www.supertriplets.univ-montp2.fr/
  • Supertrees & supermatrices are complementary
    • 1 000 vertebrate genome project
    • Divide-and-conquer approach: i) trees based on multiple CDSs (supermatrix); ii) assembling those trees (supertree)

SuperTriplets: http://www.supertriplets.univ-montp2.fr/
