SuperTriplets: a triplet-based supertree approach to phylogenomics

Presentation Transcript

  1. SuperTriplets: a triplet-based supertree approach to phylogenomics Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery

  2. Introduction: inferring phylogeny (1 gene) SuperTriplets: ISBM 2010

  3. Introduction: inferring phylogeny (3 genes)
     [Figure: alignments for Gene 1, Gene 2 and Gene 3, with two routes labelled SuperMatrix and SuperTree]

  4. Introduction: inferring phylogeny (more data)
     [Figure: alignments for Gene 2 ... Gene 1000, plus SNP / Morpho / biblio data, with two routes labelled SuperMatrix and SuperTree]

  5. Supertree overview: MRP
     • MRP [Baum 1992, Ragan 1992]
       • 1 binary sequence per taxon
       • 1 site per clade (1 = in the clade; 0 = outside; ? = missing)
     • Relation contradicted by all source trees [Goloboff and Pol, 2002]
     [Figure: source trees over taxa A to F, their MRP matrix and the MRP supertree. Matrix rows shown:
       0100101001?11?0100
       01??0?011?0???0010
       ??0011010??001????
       0100010??00??001?0
       111??0101000????01]
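
The encoding rule on this slide (one binary sequence per taxon, one site per clade, "?" for taxa missing from a source tree) can be sketched as follows. This is only an illustration: the nested-tuple tree representation and the names `leaves`, `clades` and `mrp_matrix` are assumptions made for the example, not part of the SuperTriplets software.

```python
def leaves(tree):
    """Return the set of leaf labels of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree, is_root=True):
    """List the leaf sets of internal nodes, skipping the root (it is uninformative)."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [frozenset(leaves(tree))]
    for child in tree:
        found += clades(child, is_root=False)
    return found


def mrp_matrix(forest):
    """Build the MRP matrix: one row per taxon, one column per source-tree clade."""
    taxa = sorted(set().union(*(leaves(t) for t in forest)))
    rows = {taxon: [] for taxon in taxa}
    for tree in forest:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")   # taxon absent from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")   # taxon inside the clade
                else:
                    rows[taxon].append("0")   # taxon in the tree but outside the clade
    return {taxon: "".join(column) for taxon, column in rows.items()}


# Hypothetical two-tree forest with partially overlapping taxon sets.
forest = [((("A", "B"), "C"), "D"), (("B", "C"), "E")]
for taxon, row in mrp_matrix(forest).items():
    print(taxon, row)
```

The resulting matrix would then be handed to a parsimony program; that step is outside the scope of this sketch.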

  6. Supertree overview: intuitive approach
     • The supertree problem (intuitive formulation)
       • Input: a collection of overlapping trees (a forest)
       • Output: the tree that best represents this collection
     • A major question is: how to define "best represents"?
       • Visualizing supertree candidates within the tree space
     • Median supertree
       • Intuitive solution
       • Generalization of the consensus tree
       • Good theoretical properties [Steel and Rodrigo, 2008]

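  7. Supertree overview: median tree
     • Initial trees
     • Tree restriction
     • Tree decomposition as:
       • split set
       • quartet set
       • triplet set
     [Figure: the distance d between two trees, drawn as a sum and difference of tree diagrams; the formula is not preserved in this transcript]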
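
A minimal sketch of the triplet route through this slide: decompose each rooted tree into its triplet set, restrict both sets to the shared taxa, and compare them. The exact distance formula drawn on the slide is lost, so the symmetric-difference count used here is an assumption, as are the nested-tuple trees and all function names.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """List the leaf sets of all internal nodes (root included)."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found += clades(child)
    return found


def triplets(tree):
    """Return the triplets of a tree; (x, y, z) with x < y stands for xy|z."""
    all_leaves = leaves(tree)
    found = set()
    for clade in clades(tree):
        # xy|z holds as soon as some clade contains x and y but not z.
        for x, y in combinations(sorted(clade), 2):
            for z in all_leaves - clade:
                found.add((x, y, z))
    return found


def triplet_distance(tree1, tree2):
    """Assumed distance: triplets over shared taxa found in exactly one tree."""
    shared = leaves(tree1) & leaves(tree2)
    restricted1 = {t for t in triplets(tree1) if set(t) <= shared}
    restricted2 = {t for t in triplets(tree2) if set(t) <= shared}
    return len(restricted1 ^ restricted2)


# Hypothetical trees sharing taxa A, B and C only.
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "E")
print(triplet_distance(t1, t2))
```

Filtering the triplet sets by shared taxa has the same effect as restricting the trees first, because a triplet of the restricted tree is exactly a triplet of the full tree whose three taxa are all kept.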

  8. Supertree overview: MRP and median tree
     [Figure: an input forest of three trees T1, T2, T3 over taxa A to H, encoded in two ways.
       MRP: matrix rows such as 0100101001?11?0100, 01??0?011?0???0010, ??0011010??001????, 0100010??00??001?0, 111??0101000????01.
       Triplet MR with rooting: the forest is decomposed into triplets (AB|C, AB|D, ..., GH|F, ..., FH|G, ...), each encoded as one line over the columns ABCDEFGH plus a root column, e.g. 110?????0, 11?0????0, ..., ?????1010, ?????0110]

  9. Supertree overview: MRP and median tree
     • The parsimony value is related to the triplet distance:
       • 1 parsimony step for triplets within the supertree
       • 2 parsimony steps for others
       • parsimony score = nbSites + (triplet distance)/2
     • The MRP approach is ill-suited to triplet encoding
       • for 100 taxa, 97% of the matrix is "?"
       • for 1000 taxa, 99.7% of the matrix is "?"
       • unnecessarily huge matrices
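
The two percentages quoted on this slide are consistent with a simple count, assuming each triplet site scores exactly its three taxa and marks every other taxon "?". The sketch below reproduces that arithmetic and also shows how many triplets a single fully resolved tree contributes; the function names are illustrative only.

```python
from math import comb


def missing_fraction(n_taxa):
    """Share of '?' cells in one triplet site, assuming only 3 taxa are scored."""
    return (n_taxa - 3) / n_taxa


def triplets_in_binary_tree(n_taxa):
    """A fully resolved rooted tree resolves every 3-taxon subset: C(n, 3) triplets."""
    return comb(n_taxa, 3)


for n in (100, 1000):
    print(n, f"{missing_fraction(n):.1%}", triplets_in_binary_tree(n))
```

The number of sites grows cubically with the number of taxa while almost every cell is missing data, which is the sense in which the matrices are needlessly large.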

  10. Supertriplets: a few notations
      • Given a forest F of input trees
        • N+(xy|z): number of occurrences of xy|z in F
        • N-(xy|z) = N+(xz|y) + N+(yz|x) (alternative resolutions in F)
        • Input trees are then no longer needed (forest size has little impact)
      • Searching for the (asymmetric) triplet median tree T
        • median: [formula not preserved in this transcript]
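
The counts N+ and N- defined on this slide can be tabulated in one pass over the forest, after which the input trees are no longer consulted. The following is a sketch under assumptions: trees are nested tuples, a triplet xy|z is stored as `(x, y, z)` with `x < y`, and every function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """List the leaf sets of all internal nodes (root included)."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found += clades(child)
    return found


def triplets(tree):
    """Return the triplets of a tree; (x, y, z) with x < y stands for xy|z."""
    all_leaves = leaves(tree)
    return {(x, y, z)
            for clade in clades(tree)
            for x, y in combinations(sorted(clade), 2)
            for z in all_leaves - clade}


def key(x, y, z):
    """Canonical key for xy|z: the two grouped taxa are stored in sorted order."""
    a, b = sorted((x, y))
    return (a, b, z)


def count_triplets(forest):
    """N+: number of input trees in which each triplet occurs."""
    n_plus = Counter()
    for tree in forest:
        n_plus.update(triplets(tree))
    return n_plus


def n_minus(n_plus, x, y, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x), the two alternative resolutions."""
    return n_plus[key(x, z, y)] + n_plus[key(y, z, x)]


# Hypothetical forest: two trees group homo with pan, one groups homo with mus.
forest = [(("homo", "pan"), "mus"), (("homo", "pan"), "mus"), (("homo", "mus"), "pan")]
n_plus = count_triplets(forest)
print(n_plus[key("homo", "pan", "mus")], n_minus(n_plus, "homo", "pan", "mus"))
```

A `Counter` returns 0 for triplets never seen, so unresolved or absent taxon combinations need no special handling.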

  11. Supertriplets: general overview
      • Triplet decomposition: O(n^3 |F|)
        • yields a table of counts: N+(homo pan|mus), N-(homo pan|mus), N+(pan bos|mus), N-(pan bos|mus), N+(homo pan|bos), N-(homo pan|bos), N+(mus pan|bos), N-(mus pan|bos), ...
      • First sketch (NJ-like strategy): O(n^3) + consistency
      • Improvement (NNI local search): O(n^3) to test all branches once
      • Branch support and collapse: O(n^3)

  12. Supertriplets: agglomerative process
      [Figure: four trees T0, T1, T2, T3 over taxa A to E, built by successive agglomerations of cluster pairs: C1={A}, C2={B}; then C1={A,B}, C2={C}; then C1={D}, C2={E}.
        Triplets(T3): AC|D BC|D AC|E BC|E AB|C AB|D AB|E DE|A DE|B DE|C]

  13. Supertriplets: agglomerative process
      • Agglomeration of (C_A, C_B)
        • Transforms T into T'
        • Resolves some new triplets (AB|X) with A ∈ C_A, B ∈ C_B, X ∉ C_A ∪ C_B
        • d3(T', F) = d3(T, F) - (∑ N+(AB|X) - ∑ N-(AB|X))
      • We select the pair maximizing
        • Score(C_A, C_B) = (∑ N+(AB|X) - ∑ N-(AB|X)) / (∑ N+(AB|X) + ∑ N-(AB|X))
      • The whole process is O(n^3): when C_A and C_B are agglomerated
        • Score(C_D, C_E) is unchanged
        • Score(C_{AB}, C_D) is easily derived from Score(C_A, C_D) and Score(C_B, C_D)
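
The pair score on this slide can be computed directly from the N+ table. The sketch below is a naive evaluation for one pair of clusters; it does not implement the incremental update that gives the O(n^3) bound. The triplet key convention, the function names and the value returned when no triplet is informative (0.0) are all assumptions.

```python
def key(x, y, z):
    """Canonical key for xy|z: the two grouped taxa are stored in sorted order."""
    a, b = sorted((x, y))
    return (a, b, z)


def n_minus(n_plus, x, y, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x)."""
    return n_plus.get(key(x, z, y), 0) + n_plus.get(key(y, z, x), 0)


def agglomeration_score(n_plus, taxa, cluster_a, cluster_b):
    """Score(C_A, C_B) summed over A in C_A, B in C_B, X outside both clusters."""
    outside = taxa - cluster_a - cluster_b
    plus = minus = 0
    for a in cluster_a:
        for b in cluster_b:
            for x in outside:
                plus += n_plus.get(key(a, b, x), 0)
                minus += n_minus(n_plus, a, b, x)
    total = plus + minus
    # Assumption: a pair with no informative triplet gets a neutral score.
    return (plus - minus) / total if total else 0.0


# Hypothetical counts: AB|C seen 3 times, AC|B once, AB|D twice.
n_plus = {("A", "B", "C"): 3, ("A", "C", "B"): 1, ("A", "B", "D"): 2}
taxa = {"A", "B", "C", "D"}
print(agglomeration_score(n_plus, taxa, {"A"}, {"B"}))
print(agglomeration_score(n_plus, taxa, {"A"}, {"C"}))
```

With these counts the pair ({A}, {B}) scores higher than ({A}, {C}), so an agglomerative step would join A and B first.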

  14. Supertriplets: NNI optimisation
      • The variation d3(T', F) - d3(T, F) depends on few triplets (those highlighted in the slide figure)
      • All these variations are initially evaluated in O(n^3)
      • Once an NNI is done, few NNIs have to be re-evaluated (4 adjacent edges)
      • NNI optimisation is therefore very fast
      [Figure: trees T and T'; 2 possible NNIs per edge]
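
To illustrate why an NNI touches only a few triplets, the sketch below builds the two NNI rearrangements around one edge of a rooted tree and lists the triplets that differ between the original and a rearranged tree. The nested-tuple representation and all function names are assumptions for the example; the highlighted triplets of the slide figure are not preserved, so this shows the general principle rather than the slide's exact case.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """List the leaf sets of all internal nodes (root included)."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found += clades(child)
    return found


def triplets(tree):
    """Return the triplets of a tree; (x, y, z) with x < y stands for xy|z."""
    all_leaves = leaves(tree)
    return {(x, y, z)
            for clade in clades(tree)
            for x, y in combinations(sorted(clade), 2)
            for z in all_leaves - clade}


def nni_neighbours(sibling, left, right):
    """Two NNIs around the edge above (left, right) in (sibling, (left, right))."""
    return [(left, (sibling, right)), (right, (left, sibling))]


# Hypothetical tree ((A,(B,C)),D): apply the first NNI on the edge above (B,C).
original = (("A", ("B", "C")), "D")
first, second = nni_neighbours("A", "B", "C")
rearranged = (first, "D")
print(sorted(triplets(original) ^ triplets(rearranged)))
```

Only triplets with one taxon in each of the three subtrees around the edge change; every triplet involving D is left untouched, which is why most evaluations can be reused after a move.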

  15. Supertriplets: edge supports
      • Local support
        • ∑ N+(·) / [∑ N+(·) + ∑ N-(·)], summed over the triplets highlighted in the slide figure
        • If < 0.5, collapsing the edge improves d3(T, F)
      • Global support
        • Also takes into account further triplets (highlighted in the slide figure)
        • N+(·) and N-(·) impact two edges
      • Final edge support: min(local, global)
      [Figure: tree T]
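
A sketch of the local support ratio and the 0.5 collapse rule. The transcript does not preserve which triplets enter the sums, so this example assumes they are the triplets that the edge alone resolves: ab|s with a and b in two different child subtrees below the edge and s in a sibling subtree under the same parent. That choice, the function names and the handling of edges with no informative triplet are assumptions.

```python
def key(x, y, z):
    """Canonical key for xy|z: the two grouped taxa are stored in sorted order."""
    a, b = sorted((x, y))
    return (a, b, z)


def local_support(n_plus, child1, child2, sibling):
    """Ratio sum(N+) / (sum(N+) + sum(N-)) over the triplets assumed tied to the edge."""
    plus = minus = 0
    for a in child1:
        for b in child2:
            for s in sibling:
                plus += n_plus.get(key(a, b, s), 0)
                # N-(ab|s) = N+(as|b) + N+(bs|a)
                minus += n_plus.get(key(a, s, b), 0) + n_plus.get(key(b, s, a), 0)
    total = plus + minus
    # Assumption: with no informative triplet the support is reported as None.
    return plus / total if total else None


def should_collapse(support):
    """Collapse rule from the slide: a support below 0.5 means collapsing helps."""
    return support is not None and support < 0.5


# Hypothetical counts: AB|C seen once, AC|B seen three times.
n_plus = {("A", "B", "C"): 1, ("A", "C", "B"): 3}
support = local_support(n_plus, {"A"}, {"B"}, {"C"})
print(support, should_collapse(support))
```

The threshold follows from the agglomeration formula: a ratio below 0.5 means the summed N- exceeds the summed N+, so removing those triplets from the tree lowers d3(T, F).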

  16. Supertriplets: simulation protocol
      • Protocol of [Eulenstein et al. 2004] and [Criscuolo et al. 2006]
      • Are they similar? Triplet/split measure

  17. Supertriplets: simulation results
      [Figure: results under the triplet and split measures, with the annotations "contain errors", "less resolved", "very few errors" and "perfect lack of resolution"]

  18. Supertriplets: phylogenomic case study
      • Supertree of 33 mammals
        • Species: complete genomes (EnsEMBL v54)
        • Sequences: orthologous CDS (orthoMaM v5)
        • Gene trees: 13 000 ML trees (inferred using PAUP)
      • Output supertree
        • Computed in 30 s
        • Congruent with [Prasad et al. 2008]

  19. Conclusion & prospects
      • (Asymmetric) median supertree
        • Easy to understand
        • Makes tree weighting natural
      • MRP, triplets and median supertree
        • Understanding the criterion optimized by MRP
        • Designing a dedicated algorithm to optimize it
        • http://www.supertriplets.univ-montp2.fr/
      • Supertrees & supermatrix are complementary
        • 1 000 vertebrate genome project
        • Divide and conquer approach
          i) trees based on multiple CDSs (supermatrix)
          ii) assembling those trees (supertree)

  20. Supertriplets: http://www.supertriplets.univ-montp2.fr/
      [Figure: recap of the general overview from slide 11 (triplet decomposition, NJ-like first sketch, NNI local search, branch support and collapse), with the simulation summary "less resolved, very few errors"]

  21. Supertree overview: asymmetric median tree
      [Figure: two forests, F1 and F2, of trees over taxa A to E, each compared with a reference tree (REF). Annotations for F1: d(F1, ·) = d(· + ·) and d(F1, ·) = 3 * d(· + ·); the same pair of annotations is given for F2. The tree diagrams inside the formulas are not preserved in this transcript]