
* University of Alabama in Huntsville † Centre INRA de Nancy

TwinScan Annotation of the Laccaria Sequences & Annotation of Genes in the Signaling Pathways. Michael Muratet*, Sébastien Duplessis†, Gopi Podila*. 2nd Laccaria Genome Meeting, Gent, Belgium, October 14, 2005.


Presentation Transcript


  1. TwinScan Annotation of the Laccaria Sequences & Annotation of Genes in the Signaling Pathways
     Michael Muratet*, Sébastien Duplessis†, Gopi Podila*
     2nd Laccaria Genome Meeting, Gent, Belgium, October 14, 2005
     * University of Alabama in Huntsville † Centre INRA de Nancy

  2. Overview
     • TwinScan Annotation
       • TwinScan Theory & Application
       • Annotation Process
       • Summary of Results
     • Annotation of Genes in the Signaling Pathways
       • Target List
       • Annotation Processes
       • Summary of Results

  3. TwinScan Theory & Application
     • Combines the HMM probability model of Genscan with a probability model for a ‘conservation sequence’
     • Training files for C. elegans, A. thaliana, C. neoformans, H. sapiens
       • L. bicolor similar to C. neoformans
     • Conservation sequence is created using BLAST alignments of related species (‘informants’)
     • Accuracy is relatively insensitive to BLAST parameters (require 30 bp @ 66% identity)
     • Does not attempt global alignments (as compared to Rosetta or CEM)
     • Approximately 60% more sensitive than Genscan, but still only 25% correct
     • Requirements
       • ~1 GByte memory per 1 Mbase of sequence
       • Perl
     Korf, I., Flicek, P., Duan, D., Brent, M.R. (2001). “Integrating genomic homology into gene structure prediction”, Bioinformatics, 17 (Suppl 1): S140-S148.
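The idea of a conservation sequence can be illustrated with a small sketch. It assumes the three-symbol scheme described in the cited Korf et al. paper ('|' for an aligned match, ':' for an aligned mismatch or informant gap, '.' for an unaligned target base) and that higher-scoring alignments take precedence. The function name and the HSP tuple layout are hypothetical, chosen only for illustration; this is not the conseq.pl implementation.

```python
def conservation_sequence(target_len, hsps):
    """Build a simplified TwinScan-style conservation sequence.

    hsps: iterable of (score, target_start, target_aln, informant_aln),
    where target_start is 0-based and the two alignment strings are
    gapped ('-') and of equal length. This tuple layout is an assumption.
    """
    conseq = ["."] * target_len  # '.' = target base not covered by any HSP
    # Higher-scoring HSPs claim positions first; later ones only fill what is left.
    for score, start, t_aln, i_aln in sorted(hsps, key=lambda h: -h[0]):
        pos = start
        for t, i in zip(t_aln, i_aln):
            if t == "-":
                continue  # gap in the target consumes no target base
            if conseq[pos] == ".":
                # An informant gap compares unequal, so it is scored as a mismatch.
                conseq[pos] = "|" if t.upper() == i.upper() else ":"
            pos += 1
    return "".join(conseq)


# Toy alignments (made up for illustration, not Laccaria data).
EXAMPLE_HSPS = [
    (50, 2, "ACG-TA", "ACGGTC"),
    (20, 5, "TAGG", "TA-G"),
]
EXAMPLE_CONSEQ = conservation_sequence(12, EXAMPLE_HSPS)
```

The result has exactly one symbol per target base, which is what lets the gene model treat it as a second observation stream alongside the DNA sequence.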

  4. TwinScan Annotation Process (flowchart)
     • C. neoformans genome (www.sequence.stanford.edu/Group/c.neoformans/download.html) → xdformat → “Informant” database
     • Laccaria scaffolds + “Informant” database → (wu)blastn → BLAST results
       • Parameters: M=1 N=-1 -nogap Q=5 R=1 S=35 S2=35 W=10 X=30 B=1000 Y=Z=300000000
     • BLAST results → conseq.pl → ‘conservation’ sequences
     • ‘Conservation’ sequences → iscan → annotation results
     • Annotation results → process_zoe → GenBank/FASTA/EMBL
     • Note: No ESTs used!

  5. TwinScan Output
     # iscan
     # Date: Sun Jul 24 23:58:02 2005
     # Twinscan version 2.02 build 20041011CW
     # Genome Parameters: TwinScan/parameters/crypto_iscan-1208-genes-09-15-2003.zhmm
     # Conservation Parameters: TwinScan/parameters/crypto_iscan-1208-genes-09-15-2003.zhmm
     # Target Sequence: >scaffold_9 1418118
     # Target Sequence Read... 1418118bp C+G = 47.8490%
     # Conservation Sequence: >Informant database(s): -
     # This is the 1-th best path.
     # Score: 122959
     scaffold_9.fa iscan stop_codon 1033 1035 . - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
     scaffold_9.fa iscan CDS 1036 1317 159 - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
     scaffold_9.fa iscan CDS 1358 1444 87 - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
     scaffold_9.fa iscan start_codon 1442 1444 . - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
     scaffold_9.fa iscan start_codon 8013 8015 . + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
     scaffold_9.fa iscan CDS 8013 8040 110 + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
     scaffold_9.fa iscan CDS 8091 8146 93 + 2 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
     scaffold_9.fa iscan stop_codon 8147 8149 . + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
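Output in this GTF-like layout is easy to post-process. The sketch below sums the CDS length per gene_id, using the four CDS records from the slide as input. The function name is hypothetical, and the parser assumes whitespace-separated columns with the attributes in the ninth field.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def coding_lengths(gtf_lines):
    """Return {gene_id: summed CDS length in nt} for GTF-style records."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and '#' header comments
        # Eight leading columns, then the free-text attribute column.
        fields = line.split(None, 8)
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        match = GENE_ID.search(fields[8])
        if feature == "CDS" and match:
            lengths[match.group(1)] += end - start + 1  # coordinates are inclusive
    return dict(lengths)


# CDS records copied from the slide; other feature types are omitted for brevity.
SAMPLE = '''# iscan
scaffold_9.fa iscan CDS 1036 1317 159 - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
scaffold_9.fa iscan CDS 1358 1444 87 - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
scaffold_9.fa iscan CDS 8013 8040 110 + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
scaffold_9.fa iscan CDS 8091 8146 93 + 2 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
'''.splitlines()

CDS_LENGTHS = coding_lengths(SAMPLE)
```

For the two genes shown, this gives 369 nt and 84 nt of coding sequence respectively.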

  6. Summary of TwinScan Results
     • 18,429 genes predicted
     • Max length 12,878 nt; min length 63 nt; avg length 986.7 ± 945.4 nt
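Summary figures of this kind can be computed from a list of per-gene lengths. The sketch below assumes the second figure after the average is a sample standard deviation, which the slide does not state. The input values are toy numbers, not the 18,429 real predictions.

```python
import statistics


def length_summary(lengths):
    """Count, extremes, mean and sample standard deviation of gene lengths (nt)."""
    return {
        "count": len(lengths),
        "max": max(lengths),
        "min": min(lengths),
        "mean": statistics.mean(lengths),
        "stdev": statistics.stdev(lengths),  # sample (n - 1) standard deviation
    }


# Toy input for illustration only.
TOY_LENGTHS = [63, 369, 84]
TOY_SUMMARY = length_summary(TOY_LENGTHS)
```

A standard deviation nearly as large as the mean, as on the slide, suggests a strongly right-skewed length distribution, so a median would be a useful companion figure.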

  7. Matches to ESTs • ~ 1500 GENO & INRA ESTs have no matches in TwinScan predictions

  8. Annotation of Genes in Signaling Pathways

  9. Signaling Protein Search List
     • GTP-binding proteins and related enzymes
       • G protein coupled receptors (GPCR)
       • Heterotrimeric G-protein, α (GPα), β, γ subunits
       • Monomeric G-proteins of the Ras small GTPases superfamily
     • Ras small GTPases
       • Ras type (& Sos/Grb2 systems)
       • Rho type, Rab type and Arf and Kir/Rem/Rad subfamilies & nuclear GTPase Ran
     • 14-3-3 proteins
     • Secondary messengers (generation of phosphoinositides, PIP2/IP3; diacylglycerol, DAG; Ca2+; cAMP; …)
       • Adenylate / guanylate cyclases (AC)
       • Phospholipases (phospholipase C, PLC; PL A2 and PL D)
       • Phosphodiesterases (PDE)
       • Calmodulin (CaM)
     • Kinases
       • Histidine kinase (HK) and response regulator (RR)
       • PDPK (proline-directed protein kinases, Ser-Pro & Thr-Pro)
         • MAPKs (mitogen-activated protein kinases: MAPKKK, MAPKK, MAPK)
         • SAPKs (stress-activated protein kinases)
         • DYRKs (dual-specificity tyrosine-phosphorylated and regulated kinases)
         • CDKs (cyclin-dependent kinases)
       • Non-PDPK
         • PKA (cAMP-PK)
         • PKC (Ca2+/CaM-PK) CaMKII
     • Ser/Thr phosphatases
       • PP2A
       • PP2B (calcineurin), and others?
       • PP2C
       • Other PPases…?
     • & Others…
       • Ca2+ channels and transporters

  10. UAH Signaling Gene Annotation Process (flowchart)
      • Laccaria scaffolds → xdformat → BLAST database
      • Source protein sequences → tblastn against the BLAST database → BLAST hit SQL database
        • GPCR Database http://www.gpcr.org/7tm/
        • Protein Kinase Resource http://www.kinasenet.org/pkr/
        • NCBI
      • TwinScan annotation → SQL database
      • Find overlapping HSPs
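The final step, finding overlapping HSPs, can be sketched as an interval-overlap test. The slide does not say what the HSPs are compared against. The sketch assumes tblastn HSP coordinates are matched to TwinScan gene coordinates on the same scaffold, and it uses plain Python in place of the SQL databases shown on the slide. All names are hypothetical, and the HSP coordinates are made up.

```python
from collections import namedtuple

# Closed (inclusive) interval on a named scaffold.
Interval = namedtuple("Interval", "scaffold start end name")


def overlaps(a, b):
    """True if two inclusive intervals on the same scaffold share at least one base."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def genes_hit_by_hsps(genes, hsps):
    """Map each HSP name to the predicted genes it overlaps (HSPs with no overlap are dropped)."""
    hits = {}
    for hsp in hsps:
        matched = [g.name for g in genes if overlaps(g, hsp)]
        if matched:
            hits[hsp.name] = matched
    return hits


# Gene spans taken from the TwinScan output slide (stop codon to start codon).
GENES = [
    Interval("scaffold_9", 1033, 1444, "scaffold_9.fa.001"),
    Interval("scaffold_9", 8013, 8149, "scaffold_9.fa.002"),
]
# Hypothetical HSPs for illustration.
HSPS = [
    Interval("scaffold_9", 1400, 1500, "hsp_a"),  # overlaps gene 001
    Interval("scaffold_9", 5000, 5100, "hsp_b"),  # intergenic
    Interval("scaffold_3", 8013, 8100, "hsp_c"),  # same coordinates, different scaffold
]
HSP_HITS = genes_hit_by_hsps(GENES, HSPS)
```

The nested loop is quadratic. At the scale of a whole genome, a sorted sweep or an indexed SQL range query would be the more practical choice.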

  11. INRA Annotation Process
      • Genes were selected by BlastP against L. bicolor eugene v00.2, using signalling protein sequences from:
        • Ustilago maydis
        • Magnaporthe grisea
        • Phanerochaete chrysosporium
      • and in some cases:
        • Pisolithus microcarpus
        • Suillus bovinus
        • Candida albicans
        • Tuber borchii
        • Botrytis cinerea
        • Neurospora crassa
        • Aspergillus fumigatus
      • Homologs were selected on score and e-value, requiring >50% identities and >65% positives (‘homologies’) over a sufficient portion of the alignment relative to the size of the initial protein
      • For a given function, the 1st hit listed below usually corresponded to an e-value of 0.0 or lower than e-70, with >85% identities
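The selection criteria above can be expressed as a simple filter. The >50% identity and >65% positives thresholds come from the slide. The 50% coverage cut-off standing in for "a sufficient portion length" is an assumption, as are the record layout, the function names and the example hits.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str     # name of the matched gene model
    identities: int  # identical residues in the alignment
    positives: int   # identical or similar residues in the alignment
    align_len: int   # alignment length, gaps included
    query_len: int   # length of the query (source) protein
    evalue: float


def passes(hit, min_ident=50.0, min_pos=65.0, min_cov=0.5):
    """Apply the identity / positives thresholds plus an assumed coverage cut-off."""
    pct_ident = 100.0 * hit.identities / hit.align_len
    pct_pos = 100.0 * hit.positives / hit.align_len
    coverage = hit.align_len / hit.query_len  # approximate: gaps inflate align_len
    return pct_ident > min_ident and pct_pos > min_pos and coverage >= min_cov


def select_homologs(hits, **thresholds):
    """Keep hits that pass the filter, best (lowest) e-value first."""
    kept = [h for h in hits if passes(h, **thresholds)]
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical hits for illustration.
HITS = [
    Hit("model_d", 120, 140, 200, 250, 1e-40),  # 60% id, 70% pos, 80% coverage
    Hit("model_a", 180, 190, 200, 210, 0.0),    # 90% id, 95% pos, ~95% coverage
    Hit("model_b", 90, 150, 200, 210, 1e-20),   # fails identity (45%)
    Hit("model_c", 40, 45, 50, 400, 1e-5),      # fails coverage (12.5%)
]
SELECTED = [h.subject for h in select_homologs(HITS)]
```

Keeping the thresholds as keyword arguments makes it easy to relax them for divergent families, such as the Gp-γ hits, where e-values were poor but percent identity was high.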

  12. Signaling Genes versus Eugene v00.2
      • Adenylate / guanylate cyclases
        • AC => 1 gene scaffold_5_scaff.724 and 2 AC-like = scaffold_5_scaff.707 + scaffold_11_scaff.274
      • Phospholipases
        • PI-specific PLC => 1 gene scaffold_6_scaff.206 + scaffold_6_scaff.209 & scaffold_6_scaff.209
        • PLD => scaffold_40_scaff.98 + scaffold_40_scaff.81
        • PLD, Phox-like => scaffold_3_scaff.869
      • 14-3-3 proteins
        • 14-3-3 => scaffold_3_scaff.430
      • Small G-protein Ras
        • Ras (P. microcarpus ras and Ras1p S. bovinus) => scaffold_11_scaff.210 & scaffold_11_scaff.195 ; scaffold_11_scaff.196 & scaffold_11_scaff.185 ; scaffold_11_scaff.186
        • Ras (Ras2p S. bovinus) => scaffold_47_scaff.86 + scaffold_96_scaff.10 + scaffold_1_scaff.1164 + scaffold_47_scaff.137
      • Heterotrimeric GTP-binding proteins
        • Gp-α (Gpa1 U. maydis) => scaffold_60_scaff_87 + scaffold_87_scaff_24 + scaffold_31_scaff_121 + scaffold_31_scaff_112 + scaffold_31_scaff_166 + scaffold_31_scaff_149 + scaffold_31_scaff_179 + scaffold_31_scaff_155
        • Gpa2 U. maydis => scaffold_57_scaff_31
        • Gpa3 U. maydis => scaffold_38_scaff_18 & scaffold_38_scaff_19 + scaffold_47_scaff_101
        • Gpa4 U. maydis => very low hits
        • Gp-β => scaffold_1_scaff_681 ; scaffold_10_scaff_255
        • Gp-γ => scaffold_2_scaff_833 & scaffold_2_scaff_834 (e-value and scores very bad, but % id. and % pos. were really high and the anchoring site to the β subunit was present in the sequence)

  13. Signaling Genes versus Eugene v00.2 (cont'd)
      • Phosphatases
        • PP2A => scaffold_8_scaff_112
        • PP2B / calcineurin (Ca-dependent Ser/Thr PPase) => scaffold_25_scaff_97
        • PP2C => very low hits (e-10)
      • Kinases
        • Protein kinase A (PKA) / cAMP-dependent PK => scaffold_4_scaff_881
        • Protein kinase C (PKC) => scaffold_3_scaff_687
        • 2-components - histidine kinase => scaffold_34_scaff_64
      • MAP kinases
        • MAPK (Pmk1 M. grisea & Kpp6, Ubc3, Kpp2 U. maydis) => scaffold_12_scaff_76 + scaffold_12_scaff_321 + scaffold_40_scaff_65 + scaffold_5_scaff_402
        • MAPK (Kpp4 U. maydis) => scaffold_2_scaff_982
        • MAPK (Ubc1 & Ubc2 U. maydis) => very low hits
      • MAP kinase kinases
        • MAPKK (Ste7/Ste11 & Fuz7 U. maydis) => scaffold_3_scaff_317 + scaffold_36_scaff_81

  14. Signaling Genes versus TwinScan Summary [table; categories: G protein, GPCR, kinase]

  15. GBA1_COPCO Phylogeny

  16. Summary and Conclusions
      • A TwinScan prediction of genes for the Laccaria bicolor scaffolds, based on Cryptococcus neoformans, has been completed
        • Number of genes midway between Eugene v00.1 and v00.2
        • Does not include all of the EST data (i.e., there are some missing genes)
        • No further work is planned for the Laccaria genome project
      • A list of candidate signaling gene families has been prepared and the annotation is progressing
        • Results will be collated and merged into EMBL records

  17. Acknowledgements • Sébastien Duplessis • Jan Wuyts • Francis Martin • Pierre Rouze • Gopi Podila • NSF US Western Europe Cooperative Research Grant

  18. Backup Material

  19. G-Protein GBA1_COPCO Alignment

  20. GPCR Alignments

  21. GBA3_USTHO/MA
