Efficient Phylogenetic Tree Clustering Using Clique Partitioning

Clustering of Phylogenetic Trees by Clique Partitioning • Walker Pett and Divya Mistry

Motivation • Problems with classical methods • They produce a single consensus tree based on some scoring metric • The result is often poorly resolved or hard to interpret • Problem with current methods (e.g. K-means) • They produce multiple trees, but still rely on the classical single-tree approach

Our Work • Graph-theory approach (Clique Partitioning) • Avoid creating single-tree consensus • Reduce set of trees to their non-trivial bipartitions • Use bipartitions to create compatibility graph • Find all the cliques to identify possible consensus trees • Provide the minimal set of these consensus trees that represent the whole input • Clique Partitioning is NP-Hard

Notations & Definitions • Phylogenetic tree • Leaf-labeled tree in which all internal nodes have degree ≥ 3 • Bipartitions of a tree T • Tree T on taxa S = {s1, s2, …, sn} • Removing an edge gives two subtrees • The leaf sets of these subtrees form a bipartition A|B • A tree is uniquely defined by its set of bipartitions S(T) • Example: for a tree on leaves a, b, c, d, A = {a, b} and B = {c, d}
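The edge-removal definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the adjacency-dict tree encoding and the names `leaves_on_side`, `bipartitions`, and `non_trivial` are assumptions made for the example. It walks one side of every edge, so it costs O(n) per edge rather than O(n) per tree.

```python
def leaves_on_side(adj, start, blocked):
    """Collect leaves reachable from `start` without crossing the edge to `blocked`."""
    seen = {start, blocked}
    stack = [start]
    leaves = set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:  # degree-1 nodes are the labeled leaves
            leaves.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(leaves)


def bipartitions(adj):
    """Return S(T): one bipartition {A, B} for each edge of the tree."""
    all_leaves = frozenset(n for n in adj if len(adj[n]) == 1)
    result = set()
    for u in adj:
        for v in adj[u]:
            side = leaves_on_side(adj, u, v)
            # A set of two sides, so both directions of an edge collapse to one entry.
            result.add(frozenset({side, all_leaves - side}))
    return result


def non_trivial(bips):
    """Keep only bipartitions whose smaller side has more than one leaf."""
    return {bp for bp in bips if min(len(side) for side in bp) > 1}


# Hypothetical four-leaf tree: a and b hang off x, c and d hang off y.
quartet = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "y"], "y": ["c", "d", "x"],
}
print(non_trivial(bipartitions(quartet)))
```

For the four-leaf tree, only the internal edge x-y yields a non-trivial bipartition, matching the A = {a, b}, B = {c, d} example.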

Compatible Bipartitions • A set of bipartitions is compatible if all of them are in S(T) for some tree T • A|B and C|D are compatible iff at least one of A∩C, A∩D, B∩C, B∩D is empty (Hamel and Steel, 1996) • Trivial Bipartition • For a bipartition A|B, |A| = 1 or |B| = 1 • A trivial bipartition is compatible with every other bipartition • Non-trivial Bipartition • Any bipartition that is not trivial
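The pairwise test is short enough to write out directly. The sketch below assumes each bipartition is a pair of `frozenset`s over the same taxa; the function names are illustrative only.

```python
def compatible(bp1, bp2):
    """A|B and C|D are compatible iff one of the four intersections is empty."""
    (a, b), (c, d) = bp1, bp2
    return any(not (x & y) for x in (a, b) for y in (c, d))


def is_trivial(bp):
    """A bipartition is trivial when one side holds a single taxon."""
    a, b = bp
    return len(a) == 1 or len(b) == 1


# Hypothetical bipartitions on taxa a..e.
p = (frozenset("ab"), frozenset("cde"))
q = (frozenset("abc"), frozenset("de"))
r = (frozenset("ac"), frozenset("bde"))
print(compatible(p, q))  # ab and de share no taxon
print(compatible(p, r))  # all four intersections are non-empty
```

Here `p` and `q` can sit in the same tree, while `p` and `r` cannot.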

Set of non-trivial bipartitions of trees • Given a set of trees R = {T1, T2, …, Tm} • BP(R) = the set of unique non-trivial bipartitions of R • Compatibility graph • Graph with BP(R) as vertex set • An edge joins each pair of compatible vertices • Incompatibility graph • Complement of the compatibility graph
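As a rough sketch of the two graphs, the code below compares every pair of bipartitions once and files the pair under one graph or the other. The list-of-pairs input and the name `build_graphs` are assumptions for illustration; vertices are indices into the input list.

```python
from itertools import combinations


def compatible(bp1, bp2):
    """A|B and C|D are compatible iff one of the four intersections is empty."""
    (a, b), (c, d) = bp1, bp2
    return any(not (x & y) for x in (a, b) for y in (c, d))


def build_graphs(bps):
    """Return (compatibility, incompatibility) adjacency dicts over indices of `bps`."""
    comp = {i: set() for i in range(len(bps))}
    incomp = {i: set() for i in range(len(bps))}
    for i, j in combinations(range(len(bps)), 2):
        graph = comp if compatible(bps[i], bps[j]) else incomp
        graph[i].add(j)
        graph[j].add(i)
    return comp, incomp


# Hypothetical BP(R) with three non-trivial bipartitions on taxa a..e.
bp_r = [
    (frozenset("ab"), frozenset("cde")),
    (frozenset("abc"), frozenset("de")),
    (frozenset("ac"), frozenset("bde")),
]
comp, incomp = build_graphs(bp_r)
print(comp)
print(incomp)
```

Every pair of vertices lands in exactly one of the two graphs, which is what makes the incompatibility graph the complement of the compatibility graph.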

Clique • Clique Partitioning Problem (CPP) • Partition the compatibility graph into the minimum number of cliques • Minimal Graph Coloring Problem (Minimal GCP) • Using the fewest colors, assign a color to each vertex s.t. no two adjacent vertices have the same color • We transform CPP into minimal coloring of the incompatibility graph: each color class there is a clique in the compatibility graph
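The reason the transformation works can be checked on a toy graph: vertices that share a color are pairwise non-adjacent in the incompatibility graph, so they are pairwise adjacent in its complement. The helper names below are hypothetical, and the coloring is supplied by hand rather than computed.

```python
from itertools import combinations


def complement(graph):
    """Complement of an undirected graph given as {vertex: set_of_neighbors}."""
    nodes = set(graph)
    return {v: nodes - graph[v] - {v} for v in graph}


def color_classes(coloring):
    """Group vertices by the color assigned to them."""
    classes = {}
    for vertex, color in coloring.items():
        classes.setdefault(color, set()).add(vertex)
    return list(classes.values())


def is_clique(graph, nodes):
    """True when every pair of `nodes` is joined by an edge in `graph`."""
    return all(v in graph[u] for u, v in combinations(nodes, 2))


# Hypothetical compatibility graph: a path 0 - 1 - 2.
compat = {0: {1}, 1: {0, 2}, 2: {1}}
incompat = complement(compat)          # only the edge 0 - 2 remains
coloring = {0: 0, 1: 0, 2: 1}          # proper coloring of `incompat`
for cls in color_classes(coloring):
    print(sorted(cls), is_clique(compat, cls))
```

Two colors suffice here, giving the clique partition {0, 1}, {2} of the compatibility graph.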

[Figure: compatibility graph, and the incompatibility graph with a minimum coloring]

Goal • Solve CPP by minimal coloring of the incompatibility graph. • Use this transformation to reduce the set R to a minimal set of consensus trees C such that BP(R) = BP(C)

Algorithm • Establish an order on the vertices to solve graph coloring • Greedy heuristics • Largest Degree Ordering (LDO) • Order by the number of vertices adjacent to a vertex • Saturation Degree Ordering (SDO) • Order by the number of differently colored adjacent vertices • Combine LDO/SDO for best results (Al-Omari and Sabri, 2006)
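One plausible way to combine the two orderings is to pick the uncolored vertex with the highest saturation degree and break ties by plain degree, then give it the smallest color not used by its neighbors. The sketch below follows that assumption; it is not necessarily the exact combination used in the talk or in Al-Omari and Sabri (2006), and `greedy_coloring` is an illustrative name.

```python
def greedy_coloring(graph):
    """Greedy coloring: highest saturation first (SDO), ties broken by degree (LDO)."""
    coloring = {}
    uncolored = set(graph)

    def priority(v):
        # Saturation = number of distinct colors already used by neighbors of v.
        saturation = len({coloring[u] for u in graph[v] if u in coloring})
        return (saturation, len(graph[v]))

    while uncolored:
        v = max(uncolored, key=priority)
        used = {coloring[u] for u in graph[v] if u in coloring}
        color = 0
        while color in used:  # smallest color not taken by a neighbor
            color += 1
        coloring[v] = color
        uncolored.remove(v)
    return coloring


# Hypothetical incompatibility graph: a cycle on five vertices.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = greedy_coloring(cycle5)
print(len(set(coloring.values())), "colors")
```

Being a heuristic, this always returns a proper coloring but is not guaranteed to use the minimum number of colors on every graph.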

Find all non-trivial bipartitions of tree set R • Can be done in O(n) for one tree. • Traversal of at most 2n-1 nodes (leaves and internal nodes). Each time, save the bipartition set. • For R = T1,…,Tm, finding BP(R) takes O(mn) • Construct incompatibility graph • Compare each bipartition with every other • O(|BP(R)|²), so O(m²n²) • Coloring of incompatibility graph • O(|BP(R)|³), so O(m³n³) (Bhaskar and Samad, 2006)

Results • We compare the quality of the results by the percentage of bipartitions accounted for by the best consensus tree from each method • MR = majority rule single-tree consensus method. MR produces a tree from all bipartitions that are present in more than 50% of all trees. • Camp, Caesal, PEVCCA1, PEVCCA2 have been used by Weeks et al. (2001), Moret et al. (2001), Cosner et al. (2000), and Van de Peer et al. (1999) to evaluate tree clustering methods.
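The majority-rule baseline and the coverage measure can be illustrated as follows. This sketch uses a strict majority (more than half of the trees), the usual convention, which keeps the selected bipartitions pairwise compatible. Bipartitions are treated as opaque hashable labels, and the tree data and function names are made up for the example; they are not results from the talk.

```python
from collections import Counter


def majority_rule(trees):
    """Bipartitions that occur in more than half of the input trees."""
    counts = Counter(bp for tree in trees for bp in set(tree))
    return {bp for bp, k in counts.items() if k > len(trees) / 2}


def coverage(consensus, trees):
    """Fraction of the unique input bipartitions present in `consensus`."""
    all_bps = set().union(*trees)
    return len(consensus & all_bps) / len(all_bps)


# Hypothetical input: each tree is the set of its non-trivial bipartitions.
trees = [
    {"ab|cde", "abc|de"},
    {"ab|cde", "ac|bde"},
    {"ab|cde", "abc|de"},
]
mr = majority_rule(trees)
print(sorted(mr), coverage(mr, trees))
```

In this toy input the bipartition `ac|bde` appears in only one of three trees, so the majority-rule tree accounts for two of the three unique bipartitions.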

Discussion • For all four datasets, our method produced a consensus tree that is more resolved than the popular Majority Rule consensus • The tree produced by our method is identical to the one produced by Extended Majority Rule • Extended Majority Rule = Majority Rule consensus plus additional bipartitions compatible with the MR consensus tree • Additionally, our method produced a tree containing information that is present in the original input but absent from the single-tree consensus

Trees produced by our method are at least as informative as the single-tree consensus, and usually more informative given the previous point • Our method is similar to the Multipolar Consensus method proposed by Bonnard et al. (2006); however, we improve on it by choosing a more sophisticated coloring heuristic.

Conclusion • Efficient method for computing a minimal consensus of a set of trees • Compatibility graph built from unique non-trivial bipartitions • Minimal partition of this graph into cliques • CPP solved by transforming it to GCP and then employing a polynomial-time greedy heuristic • Our method produces a set with the fewest trees that retains all of the information in the original input set • Better than single-tree consensus • Of interest to biologists: • Visualize the result of a phylogenetic analysis in the simplest and most informative way.

Acknowledgement • Dr. Oliver Eulenstein for support and guidance in introducing the problem and directing to some earlier work in CPP.

Done!
