
Tree Pattern Matching in Phylogenetic Trees

Tree Pattern Matching in Phylogenetic Trees: Automatic Search for Orthologs or Paralogs in Homologous Gene Sequence Databases. By: Jean-François Dufayard, Laurent Duret, Simon Penel, Manolo Gouy, François Rechenmann, and Guy Perrière. Presented by: Jean Yeh.


Presentation Transcript


  1. Tree Pattern Matching in Phylogenetic Trees Automatic Search for Orthologs or Paralogs in Homologous Gene Sequence Databases By: Jean-François Dufayard, Laurent Duret, Simon Penel, Manolo Gouy, François Rechenmann, and Guy Perrière Presented by: Jean Yeh

  2. Background Information • The authors have created three databases that group genes into homologous families • HOVERGEN – vertebrates • HOBACGEN – prokaryotes • HOGENOM – organisms with completely sequenced genomes • Among homologous genes, orthologs must be distinguished from paralogs

  3. Homologous Sequences • Homologs: Two genes related by descent from a common ancestral DNA sequence • Orthologs: Two genes in different species; evolved from a single ancestral gene by speciation • Paralogs: Two genes related by duplication within a genome
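The definitions above can be turned into a small check: two genes are orthologs when their last common ancestor in the gene tree is a speciation, and paralogs when it is a duplication. The sketch below assumes a gene tree whose internal nodes are already labeled with their event; the tuple layout `(event, left, right)` and the gene names are illustrative assumptions, not a format taken from the presentation.

```python
def genes(node):
    """Set of gene names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return genes(left) | genes(right)


def relation(node, a, b):
    """Classify two distinct genes by the event at their last common ancestor."""
    if a == b or not {a, b} <= genes(node):
        raise ValueError("need two distinct genes that are both in the tree")
    event, left, right = node
    for child in (left, right):
        if {a, b} <= genes(child):
            return relation(child, a, b)  # both genes lie deeper in this child
    # The genes split at this node, so this node is their last common ancestor.
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical family: one ancient duplication, then a human/mouse speciation
# in each of the two resulting copies.
family = ("duplication",
          ("speciation", "human_A", "mouse_A"),
          ("speciation", "human_B", "mouse_B"))

print(relation(family, "human_A", "mouse_A"))  # orthologs
print(relation(family, "human_A", "human_B"))  # paralogs
print(relation(family, "human_A", "mouse_B"))  # paralogs
```

Note that `human_A` and `mouse_B` come from different species yet are paralogs, because the duplication predates the speciation.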

  4. Orthologs and Paralogs http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif

  5. Gene Function • Gene function tends to change after gene duplication • Orthologs are more reliable predictors of gene function than paralogs • Evolutionary distance also plays a role • Closely related paralogs are probably more similar in function than distantly related orthologs

  6. Goal • Create algorithms that automatically search the authors' databases for orthologs or paralogs • One algorithm for tree reconciliation • One algorithm for tree pattern matching • Implement both within the architecture already used to query the databases

  7. Tree Reconciliation • Infers speciation and duplication events • Compares gene tree G with species tree S to give a reconciled tree R • Algorithm: • R = S • Step through G and R simultaneously • If nodes are incongruent, insert duplication node in R and annotate gene losses
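A minimal sketch of how duplication and speciation events can be inferred by comparing G with S. It uses the common "last common ancestor mapping" formulation rather than the slide's step of building R explicitly, so it should be read as an illustration of the idea and not as the authors' procedure; it also does not annotate gene losses. Trees are assumed to be nested tuples whose leaves are species names, and all function names are hypothetical.

```python
def species_under(tree):
    """Set of species names found at the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(species_under(child) for child in tree))


def lca(species_tree, wanted):
    """Smallest clade of the species tree that contains every species in `wanted`."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if wanted <= species_under(child):
                return lca(child, wanted)
    return species_tree


def label_events(gene_tree, species_tree):
    """Return (gene_subtree, event) pairs for every internal node of the gene tree."""
    events = []

    def visit(node):
        if isinstance(node, str):
            return
        here = lca(species_tree, species_under(node))
        kind = "speciation"
        for child in node:
            # A child that maps to the same species clade as its parent
            # means the two nodes are incongruent with a speciation.
            if lca(species_tree, species_under(child)) == here:
                kind = "duplication"
            visit(child)
        events.append((node, kind))

    visit(gene_tree)
    return events


S = (("human", "mouse"), "chicken")
# Two mammalian gene copies, each present in human and mouse, plus one chicken gene.
G = ((("human", "mouse"), ("human", "mouse")), "chicken")

for subtree, kind in label_events(G, S):
    print(kind, subtree)
```

In this example only the node joining the two `("human", "mouse")` copies is labeled as a duplication; the other internal nodes are speciations.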

  8. Tree Reconciliation

  9. Tree Pattern Matching • A tree pattern is a particular tree structure with taxonomic and evolutionary parameters stored in its nodes and leaves • Can be considered a subtree • The goal is to match it against a target tree • E.g. pattern (X, Y, Z) matches ((X, Y), Z), (X, (Y, Z)), and ((X, Z), Y)
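The (X, Y, Z) example can be checked with a few lines of code. This sketch covers only that simplest case, an unresolved pattern that constrains the leaves and not the branching order; the nested-tuple tree representation and the function names are assumptions made for illustration.

```python
def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def matches_flat_pattern(pattern, target):
    """True if the target has exactly the pattern's leaves, whatever its topology."""
    return sorted(leaves(target)) == sorted(pattern)


pattern = ("X", "Y", "Z")
targets = [(("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")]
print([matches_flat_pattern(pattern, t) for t in targets])  # [True, True, True]
```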

  10. Tree Pattern Matching • Uses a recursive algorithm that takes into account different taxonomic levels as well as the specific branch constraints • Cuts down on run time by comparing the number of leaves in the pattern and in the target tree • Allows users to search for orthologs/paralogs
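A rough sketch of a recursive matcher in this spirit. It is a simplification built on stated assumptions, not the published algorithm: each pattern leaf names a taxon at any level and must match exactly one target leaf belonging to that taxon, branch constraints are ignored, and child order is irrelevant. Under those assumptions a match requires equal leaf counts, which gives a cheap test before any recursion. The `PARENT` taxonomy table and all names are hypothetical.

```python
from itertools import permutations

# Hypothetical taxonomy: each taxon maps to its parent taxon.
PARENT = {"human": "Mammalia", "mouse": "Mammalia", "chicken": "Aves",
          "Mammalia": "Vertebrata", "Aves": "Vertebrata"}


def belongs_to(species, taxon):
    """True if `species` is `taxon` itself or one of its descendants."""
    while species is not None:
        if species == taxon:
            return True
        species = PARENT.get(species)
    return False


def leaf_count(tree):
    return 1 if isinstance(tree, str) else sum(leaf_count(c) for c in tree)


def matches(pattern, target):
    """True if `target` has the pattern's shape, ignoring child order."""
    if leaf_count(pattern) != leaf_count(target):
        return False  # cheap pruning before any recursion
    if isinstance(pattern, str) or isinstance(target, str):
        return (isinstance(pattern, str) and isinstance(target, str)
                and belongs_to(target, pattern))
    if len(pattern) != len(target):
        return False
    # Try every pairing of pattern children with target children.
    return any(all(matches(p, t) for p, t in zip(pattern, perm))
               for perm in permutations(target))


def find_matches(pattern, tree):
    """Collect every subtree of `tree` that matches `pattern`."""
    found = [tree] if matches(pattern, tree) else []
    if not isinstance(tree, str):
        for child in tree:
            found.extend(find_matches(pattern, child))
    return found


pattern = (("Mammalia", "Mammalia"), "Aves")
print(find_matches(pattern, ("chicken", ("mouse", "human"))))
print(find_matches(pattern, (("human", "chicken"), "mouse")))  # []
```

Recomputing `leaf_count` at every call keeps the sketch short; a real implementation would presumably cache the counts on the nodes.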

  11. FamFetch Interface • User interface to access the databases • Incorporates both algorithms • Pattern editor has two frames: tool and pattern • Pattern frame – interactive editor to construct, load, save, and match patterns with a tree database • Tool frame – tools used in pattern frame

  12. FamFetch

  13. Tree Rooting • For tree reconciliation, the trees must be rooted • The authors use their reconciliation algorithm to choose the most parsimonious rooting: the one that requires the fewest gene duplications • The reconciliation algorithm is relatively fast
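The rooting criterion can be sketched as "score every candidate rooting by its number of inferred duplications and keep the minimum". The code below assumes the candidate rootings have already been enumerated and are passed in as nested tuples with species names at the leaves; duplications are counted with a last-common-ancestor mapping, which is a standard technique swapped in here for illustration and not necessarily what the authors implemented. All names are hypothetical.

```python
def species_under(tree):
    """Set of species names at the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(species_under(child) for child in tree))


def lca(species_tree, wanted):
    """Smallest clade of the species tree containing all species in `wanted`."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if wanted <= species_under(child):
                return lca(child, wanted)
    return species_tree


def count_duplications(gene_tree, species_tree):
    """Number of internal gene-tree nodes inferred to be duplications."""
    if isinstance(gene_tree, str):
        return 0
    here = lca(species_tree, species_under(gene_tree))
    is_dup = any(lca(species_tree, species_under(c)) == here for c in gene_tree)
    return int(is_dup) + sum(count_duplications(c, species_tree) for c in gene_tree)


def best_rooting(candidates, species_tree):
    """Pick the candidate rooting that needs the fewest duplications."""
    return min(candidates, key=lambda g: count_duplications(g, species_tree))


S = (("human", "mouse"), "chicken")
# The three possible rootings of an unrooted three-gene tree.
rootings = [(("human", "chicken"), "mouse"),
            (("mouse", "chicken"), "human"),
            (("human", "mouse"), "chicken")]

print([count_duplications(g, S) for g in rootings])  # [1, 1, 0]
print(best_rooting(rootings, S))
```

When several rootings tie, `min` returns the first one in the list; how ties should really be broken is left open here.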

  14. Tree Pattern Search • By framing the problem as a tree pattern search, the authors broadened the range of queries available to users • Can search for gene duplication or speciation events, not just orthologs and paralogs • The algorithm is also relatively fast, though it lacks the flexibility of manual (human) pattern matching

  15. Automatic Search for Orthologs • Previously done with pairwise BLAST searches and reciprocal best hits • This requires the complete set of genes, and if the genes are wrong, the results may be wrong • Classifying genes into clusters of orthologs depends on the evolutionary distance between species
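For comparison, the reciprocal-hit approach mentioned above can be sketched in a few lines: gene a and gene b are paired when b is a's best hit and a is b's best hit. The score dictionaries below stand in for parsed pairwise search results; the gene names and scores are made up for illustration and are not real BLAST output.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented scores for two human genes searched against two mouse genes.
human_vs_mouse = {("hA1", "mB1"): 200, ("hA1", "mB2"): 150,
                  ("hA2", "mB2"): 180, ("hA2", "mB1"): 90}
mouse_vs_human = {("mB1", "hA1"): 190, ("mB2", "hA1"): 160,
                  ("mB2", "hA2"): 140}

print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('hA1', 'mB1')]
```

Here `hA2` finds `mB2` as its best hit, but `mB2` prefers `hA1`, so the pair is not reciprocal and is dropped. A gene missing from either set can never be paired, which illustrates why the approach depends on having all genes.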

  16. Possible Improvement • Have the program estimate the reliability of each reconciliation • While the tool makes comparative sequence analysis easier, it was designed solely for the databases the authors had already created • It might be improved by generalizing it to other databases
