
PhyloPat phylogenetic pattern analysis of eukaryotic genes

PhyloPat phylogenetic pattern analysis of eukaryotic genes. Tim Hulsen 2006-10-17 BeNeLux BioInformatics Conference 2006. Introduction (1). Phylogenetic patterns show presence/absence of genes over a certain set of species: e.g. for 10 species: 0011101011


Presentation Transcript


  1. PhyloPat: phylogenetic pattern analysis of eukaryotic genes. Tim Hulsen, 2006-10-17, BeNeLux BioInformatics Conference 2006

  2. Introduction (1) • Phylogenetic patterns show presence/absence of genes over a certain set of species: e.g. for 10 species: 0011101011 • Very useful for all kinds of evolutionary analyses: • Origin of certain genes • Deletion of certain genes • Clustering of genes with similar patterns: likely to have similar function / be in same pathway
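A minimal sketch of how such a presence/absence string could be built. The species names are placeholders and `presence_pattern` is a hypothetical helper, not part of PhyloPat; the input is chosen so that it reproduces the ten-species example from the slide.

```python
def presence_pattern(species_order, species_with_gene):
    """Return a 0/1 string with one character per species, in the given order."""
    present = set(species_with_gene)
    return "".join("1" if sp in present else "0" for sp in species_order)


# Hypothetical ordering of ten species (placeholders, not the PhyloPat species set).
species_order = [f"sp{i}" for i in range(1, 11)]

# Species in which the gene was found, chosen to match the slide's example.
found_in = ["sp3", "sp4", "sp5", "sp7", "sp9", "sp10"]

pattern = presence_pattern(species_order, found_in)
print(pattern)
```

Because the string is positional, the species order has to be fixed once and reused for every gene, otherwise patterns cannot be compared.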

  3. Introduction (2) • Earlier phylogenetic pattern initiatives: • Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000) • Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003) • Incorporated into OrthoMCL-DB (Chen et al., 2006) • All applied to proteins, not to genes! → PhyloPat: phylogenetic pattern analysis of eukaryotic genes

  4. Method • Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant • Basis: Ensembl (EnsMart) database: 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), from S. cer. to H. sap. • Makes use of the accurate Ensembl orthology pipeline (a combination of BLAST, SW, MUSCLE and PHYML) • Single-linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
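The single-linkage step can be sketched with a union-find structure: any two genes joined by a chain of orthology pairs end up in the same group, and genes without orthologs form singleton groups. This is only an illustration under that assumption, not the PhyloPat implementation; the function name and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def single_linkage_groups(genes, orthologies):
    """Group genes so that every orthology pair falls within one group."""
    # Every gene starts as its own group; genes seen only in pairs are added too.
    parent = {g: g for g in genes}
    for a, b in orthologies:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(g):
        # Follow parent links to the group representative, shortening the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Merge the two groups of every orthologous pair.
    for a, b in orthologies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for g in parent:
        groups[find(g)].append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical gene identifiers and orthology pairs.
genes = ["hsaA", "mmuA", "rnoA", "hsaB", "sceC"]
orthologies = [("hsaA", "mmuA"), ("mmuA", "rnoA")]
groups = single_linkage_groups(genes, orthologies)
print(groups)
```

Single linkage is transitive: hsaA and rnoA share a group here although no direct pair links them, which is why every gene can be assigned to exactly one group.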

  5. Results • 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species • Species ordered from ‘low’ (S. cer.) to ‘high’ (H. sap.), i.e. by approximate distance to human • Can be queried in several ways • Output in HTML, Excel or plain text format

  6. Web interface http://www.cmbi.ru.nl/phylopat

  7. Pattern/ID Search • Binary string: 0=absent, 1=present, *=absent/present, e.g. ‘00000********11111111’ → must be absent in non-chordata, must be present in all mammals • MySQL regular expression: e.g. ‘^0*1{10}0*$’ → gives all genes that occur only in ten consecutive species • Input list of Ensembl/EMBL IDs (PhyloPat contains an EMBL to Ensembl mapping)
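Both query styles can be sketched in a few lines. The example below uses Python's `re` module rather than MySQL, so it only illustrates the matching logic; the two expressions from the slide use syntax that both engines share. The helper names and the four stored patterns are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a 0/1/* pattern into an anchored regular expression."""
    if not set(pattern) <= set("01*"):
        raise ValueError("pattern may contain only 0, 1 and *")
    # '*' means 'absent or present', i.e. exactly one character that is 0 or 1.
    return "^" + pattern.replace("*", "[01]") + "$"


def search(patterns, regex):
    """Return the stored patterns that match the regular expression."""
    compiled = re.compile(regex)
    return [p for p in patterns if compiled.search(p)]


# Hypothetical stored patterns for 21 species.
patterns = [
    "000000000000011111111",
    "000001111111111111111",
    "000111111111100000000",
    "111111111111111111111",
]

wildcard_hits = search(patterns, wildcard_to_regex("00000********11111111"))
regex_hits = search(patterns, r"^0*1{10}0*$")
print(wildcard_hits)
print(regex_hits)
```

Anchoring with `^` and `$` matters: without the anchors a wildcard pattern would also match as a substring of a longer string.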

  8. Output

  9. Phylogenetic Tree

  10. Oligo-/Polypresent Genes
      • Oligopresent: present in only one or two species (oligo=few), e.g. ‘000000010000000000100’
      • These two species should be closely related (species pair, count, divergence time):
          C. sav / C. int   1737   div. 100 Mya (Boffelli et al., 2004)
          T. nig / T. rub   1572   div. 85 Mya (Yamanoue et al., 2006)
          A. gam / A. aeg   1058   div. 140 Mya (Service, 1993)
          P. tro / H. sap    887   div. 6 Mya (Glazko & Nei, 2003)
          R. nor / M. mus    713   div. 20 Mya (Springer et al., 2003)
      • Polypresent: present in all species except one or two (poly=many), e.g. ‘111110111110111111111’
      • These two species should be related too; a similar analysis is possible
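The two categories reduce to counting ones and zeros in a pattern. The sketch below uses hypothetical function names and also covers the all-present case that the next slide calls omnipresent; it says nothing about how PhyloPat computes these internally.

```python
def classify(pattern):
    """Label a 0/1 pattern by how many species contain the gene."""
    present = pattern.count("1")
    absent = pattern.count("0")
    if absent == 0:
        return "omnipresent"   # present in every species
    if present in (1, 2):
        return "oligopresent"  # present in only one or two species
    if absent in (1, 2):
        return "polypresent"   # absent from only one or two species
    return "other"


def positions(pattern, symbol):
    """1-based positions in the species ordering where `symbol` occurs."""
    return [i for i, c in enumerate(pattern, start=1) if c == symbol]


oligo = "000000010000000000100"
poly = "111110111110111111111"
print(classify(oligo), positions(oligo, "1"))
print(classify(poly), positions(poly, "0"))
```

The positions returned for an oligopresent pattern identify the species pair, which is what makes the pair counts on this slide possible.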

  11. Omnipresent genes • Omnipresent: present in all 21 species (omni=all): ‘111111111111111111111’ • Currently 1001 omnipresent groups • Tend to have very general/important functions, mostly involved in transcription/translation

  12. FatiGO analysis • FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004) • Analysis of all human genes in the output with just one mouse click • e.g. omnipresent genes:

  13. Other possibilities • Anti-correlating patterns: e.g. ‘001111100011000000000’ and ‘110000011100111111111’ → could be completely different, or very similar (analogous)! • Easy homology-inferred functional annotation (using information from other genes in the same lineage)
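Two patterns anti-correlate perfectly when one is the bitwise complement of the other. A small sketch, with hypothetical helper names, checked against the pair of patterns from the slide:

```python
def complement(pattern):
    """Swap 0 and 1 at every position of the pattern."""
    return pattern.translate(str.maketrans("01", "10"))


def are_anticorrelated(a, b):
    """True if the patterns have equal length and disagree at every position."""
    return len(a) == len(b) and complement(a) == b


first = "001111100011000000000"
second = "110000011100111111111"
print(complement(first))
print(are_anticorrelated(first, second))
```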

  14. Case study: Hox genes (1) • Hox genes determine where limbs and other body segments will grow in a developing embryo • Should exist mostly in vertebrates • Expansion in teleost fish species (positions 8-11 in the species order); seven Hox clusters instead of the mammalian four • Search Ensembl database for human genes with the term ‘hox’ in their annotation • 44 genes found -> enter in PhyloPat -> 32 groups found (PP######)

  15. Case study: Hox genes (2)

      PPID      # genes per species    phylogenetic pattern   gene name(s)
      PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
      PP024984  001000011111001111111  001000011111001111111  HOXC4
      PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
      PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
      PP053824  000000011120010101011  000000011110010101011  HOXD11
      PP053827  000000022211111111111  000000011111111111111  HOXA10
      PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
      PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
      PP053830  000000011110010111111  000000011110010111111  HOXB4
      PP053832  000000021111011111111  000000011111011111111  HOXA5
      PP053833  000000021110111111011  000000011110111111011  HOXB2
      PP053834  000000031101011111111  000000011101011111111  HOXD3
      PP053835  000000021110111111101  000000011110111111101  HOXA9
      PP053836  000000021111111111111  000000011111111111111  HOXA3
      PP053838  000000021110101111111  000000011110101111111  HOXC12
      PP053839  000000011111111110111  000000011111111110111  HOXD4
      PP053840  000000021111201011101  000000011111101011101  HOXC11
      PP053842  000000043221111111111  000000011111111111111  HOXA13
      PP053844  000000032231011111111  000000011111011111111  HOXB5
      PP053845  000000021111111111011  000000011111111111011  HOXB3
      PP053846  000000021121111111111  000000011111111111111  HOXD10
      PP053847  000000022211111111111  000000011111111111111  HOXA2
      PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
      PP053853  000000011101111111011  000000011101111111011  HOXA4
      PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
      PP053858  000000011120011111111  000000011110011111111  HOXA11
      PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
      PP075622  000000000010001111111  000000000010001111111  HOXC5
      PP084287  000000000001101111111  000000000001101111111  HOXC10
      PP085049  000000000001011011111  000000000001011011111  HOXD1
      PP087941  000000000000111011111  000000000000111011111  HOXD12
      PP089685  000000000000111111111  000000000000111111111  HOXB13
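The ‘# genes per species’ column relates to the ‘phylogenetic pattern’ column in a simple way: any non-zero count becomes a 1. The sketch below assumes that each species count is a single digit, as in the table, and uses hypothetical helper names; it also pulls out the counts at positions 8-11, where the expansion in fish was reported.

```python
def counts_to_pattern(counts):
    """Collapse per-species gene counts (one digit each) into a 0/1 pattern."""
    return "".join("0" if c == "0" else "1" for c in counts)


def counts_in_range(counts, first, last):
    """Gene counts for the 1-based, inclusive species positions first..last."""
    return [int(c) for c in counts[first - 1:last]]


# Row PP022041 (MSX1, MSX2) from the table.
msx_counts = "011111136562233233222"

msx_pattern = counts_to_pattern(msx_counts)
fish_counts = counts_in_range(msx_counts, 8, 11)
print(msx_pattern)
print(fish_counts)
```

Keeping the counts next to the binary pattern is what makes lineage-specific expansions visible: the pattern alone would hide the higher numbers at positions 8-11.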

  16. Case study: Hox genes (3)

      PPID(s)                        name   cl.A    cl.B    cl.C    cl.D    first sp.   position
      PP053829,085049                HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior
      PP053847,053833                HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior
      PP053836,053845,053834         HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3
      PP053832,053844,075622         HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central
      PP053849                       HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central
      PP053835,053854                HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior
      PP053827,084287,053846         HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior
      PP053858,053840,053824         HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior
      PP053838,087941                HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior
      PP053842,089685,053828         HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior
      PP053853,053830,024984,053839  HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central
      PP027791                       TLX    TLX1    TLX2    TLX3    -       A. gamb.    -
      PP070659                       HOX7   HOXA7   HOXB7   -       -       G. acul.    central
      PP049478                       HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central
      PP022041                       MSX    MSX1    MSX2    -       -       C. eleg.    -

      Labels on the ‘first sp.’ column: T. nigrov. = ‘first’ vertebrate; A. gamb. = non-vertebrate; G. acul. = vertebrate; C. intest. = non-vertebrate; C. eleg. = non-vertebrate

  17. Conclusions • PhyloPat: quick and easy tool for phylogenetic pattern search on the complete Ensembl database • Also usable for the study of lineage-specific expansions of genes • Just updated to Ensembl v41 (released last Thursday); 5 new species: D. nov, E. tel, L. afr, O. cun, O. lat

  18. Acknowledgements • Supervision: Peter Groenen (supervisor), Jacob de Vlieg (head of group) • Fruitful discussions and suggestions: Wilco Fleuren, Erik Franck, Nanning de Jong, Arnold Kuzniar

  19. Where to find • Web interface: http://www.cmbi.ru.nl/phylopat (accessible through www.cmbi.ru.nl and www.nbic.nl) • Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7: 398 http://www.biomedcentral.com/1471-2105/7/398 • Powered by Ensembl: http://www.ensembl.org/info/about/ensembl_powered.html • Poster P-20
