The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease

Mlp Summer Workshop, INRA Nancy, August 20-21, 2008. The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease: gene content in the Mlp genome (automated annotation). Duplessis Sébastien (INRA Nancy).

Presentation Transcript


  1. Mlp Summer Workshop, INRA Nancy, August 20-21, 2008. The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease. Gene content in the Mlp genome (automated annotation). Duplessis Sébastien (INRA Nancy), Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM

  2. Annotation of the Mlp genome: gene prediction 2006-2007 • Intrinsic approaches: coding potential search, repeats, SpliceMachine, NetStart • Extrinsic approaches: BLASTN, BLASTX and TBLASTX against Puccinia, Sporobolomyces, basidiomycetes, Mlp ESTs and Swiss-Prot • Gene predictors: EuGene, FGENESH, GeneWise • => Predicted genes (manual curation)
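The "Repeats" step matters because open reading frames inside transposable elements would otherwise be called as genes. The slide does not say which masking tool or convention was used; as a minimal sketch, assuming repeat coordinates are already available as 0-based, half-open intervals, soft-masking lowercases the repeat bases while keeping the sequence intact. The function names are hypothetical.

```python
def soft_mask(seq: str, repeats: list[tuple[int, int]]) -> str:
    """Lowercase every base covered by a repeat interval (0-based, half-open)."""
    chars = list(seq.upper())
    for start, end in repeats:
        # Clamp to the sequence bounds so out-of-range intervals cannot raise IndexError.
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = chars[i].lower()
    return "".join(chars)


def masked_fraction(masked: str) -> float:
    """Fraction of bases that are soft-masked (lowercase)."""
    if not masked:
        return 0.0
    return sum(c.islower() for c in masked) / len(masked)


if __name__ == "__main__":
    toy = "ATGCGTACGTTAGCATGCAA"  # hypothetical toy sequence
    masked = soft_mask(toy, [(4, 10), (15, 18)])
    print(masked)  # ATGCgtacgtTAGCAtgcAA
    print(masked_fraction(masked))  # 0.45
```

Soft-masking (rather than replacing repeats with N) keeps the original bases available, so a later step can still decide how much weight to give evidence that falls inside repeats.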

  3. Mlp Genome Project, summer 2007 • Pre-release of the Mlp genome assembly (16.4% gaps; assembled with JAZZ) • Main genome scaffold total: 2,682 • ESTs from 50/50 spores and germ tubes of Mlp 98AG31: INRA Nancy => ~4,000 (2004); JGI => ~60,000 (2007); => ~52,000 ESTs • ESTs from spores and germlings of Melampsora spp. [Mlp, Mmd, Mmt, Mo], CFS Laval => ~3,000 Mlp / ~4,200 Mmd / ~3,000 Mo / ~3,000 Mmt • In planta ESTs from Mlp haustoria => ~1,700 Mlp; H3B => ~15,000 ESTs
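The 16.4% gap figure is the share of assembly positions that are unresolved. Assuming gaps are encoded as runs of N in the scaffold FASTA (the usual convention; the slide does not state how the figure was computed), it can be measured with a small FASTA reader. Function names and the toy records are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_fraction(records) -> float:
    """Fraction of assembly positions that are gap characters (N or n)."""
    total = gaps = 0
    for _, seq in records:
        total += len(seq)
        gaps += seq.upper().count("N")
    return gaps / total if total else 0.0


if __name__ == "__main__":
    toy = [">scaffold_1", "ACGTNNNNAC", ">scaffold_2 some description", "NNACGTACGT"]
    print(f"{gap_fraction(read_fasta(toy)):.1%}")  # 30.0%
```

Because `read_fasta` accepts any iterable of lines, an open file handle can be passed directly, e.g. `gap_fraction(read_fasta(open(path)))` for a hypothetical scaffold file `path`.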

  4. Melampsora IAM website => summer 2007 (B. Hilselberger), updated in 2008 (E. Tisserant) • BLAST against Mlp scaffolds • BLAST against Mlp ESTs • BLAST against available basidiomycete genomes

  5. Melampsora IAM website => summer 2007 (B. Hilselberger), updated in 2008 (E. Tisserant) • Files to help with annotation using Artemis: • FASTA of genome scaffolds • GFF files of EST clusters • GFF files of BLASTN hits vs. Puccinia, Sporobolomyces & Ustilago gene models
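Slide 5 lists GFF files of BLASTN hits for loading into Artemis. A minimal sketch of that conversion, assuming default BLAST+ tabular output (`-outfmt 6`, 12 columns) in which the subject is an Mlp scaffold and the query a gene model from another fungus. The source and feature-type labels (`blastn`, `match`), the attribute layout and the example IDs are illustrative choices, not taken from the IAM website files.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blast6_to_gff3(line: str, source: str = "blastn", ftype: str = "match") -> str:
    """Convert one BLAST -outfmt 6 row into a GFF3 feature on the subject sequence.

    Assumes the subject (sseqid) is an Mlp scaffold, so the feature is placed
    on scaffold coordinates.
    """
    rec = dict(zip(BLAST_COLS, line.rstrip("\n").split("\t")))
    s, e = int(rec["sstart"]), int(rec["send"])
    # BLAST reports minus-strand subject hits with sstart > send.
    strand = "+" if s <= e else "-"
    start, end = min(s, e), max(s, e)
    attrs = f"ID={rec['qseqid']}_{start}_{end};Name={rec['qseqid']};pident={rec['pident']}"
    return "\t".join([rec["sseqid"], source, ftype, str(start), str(end),
                      rec["bitscore"], strand, ".", attrs])


if __name__ == "__main__":
    # Hypothetical hit of a Puccinia gene model on a hypothetical Mlp scaffold.
    row = "Pgt_gene1\tscaffold_7\t87.50\t400\t50\t0\t1\t400\t10500\t10101\t1e-120\t500"
    print("##gff-version 3")
    print(blast6_to_gff3(row))
```

The bit score goes into the GFF3 score column so that hits can be ranked or colored in a viewer; the e-value could equally be used.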

  6. Annotation of full-length (FL) sequences = TRAINING SET for gene predictors (EuGene, FGENESH, ...) • Gene model annotation based on complete EST support & homology • Coding for known ubiquitous functions (metabolism, cytoskeleton elements…) • Coding for hypothetical proteins and new genes? • Coding for proteins of various sizes • Manual curation performed with Artemis (Nancy & Québec) => 348 gene models (GM) curated • Editing of annotation cards => Melampsora Genome Consortium website

  7. TRAINING SET for gene prediction (EuGene, FGENESH, ...) => 348 GM curated => 52,269 ESTs from Mlp 98AG31 => raw transposable element (TE) prediction based on the Mlp genome pre-release

  8. JGI gene prediction (Andrea Aerts, Jan-Mar 2008) • 39 scaffolds (43.9 Mbp) • 409 repetitive elements provided by a collaborator, 87 generated in the pipeline • nr: N. crassa, M. grisea, F. graminearum • ESTs: • 3,941 uniseqs described in 2003 paper • 6,318 uniseqs described in 2008 paper • 8,799 JGI cluster consensi (includes external ESTs) • 5 C. parasitica CDSs from NCBI

  9. Outputs

  10. What do the genes look like?

  11. How were the genes predicted?

  12. How good are the genes?

  13. KOG assignments

  14. KEGG assignments

  15. Prediction of gene models using EuGene (VIB, Ghent) • Annotation performed with the Mlp genome pre-release: • M-P Oudot-Le Secq: EuGene annotation using Laccaria bicolor annotation parameters => ~17,000 Mlp gene models (<1,500 TEs) => Mlp GM v0.0 • Yao-Cheng Lin: EuGene annotation using parameters specifically defined for M. larici-populina => ~9,000 Mlp gene models (> 200 aa) • Annotation performed with the Mlp genome assembly release of Jan 2008: • Yao-Cheng Lin: EuGene annotation using specific training for M. larici-populina => 12,386 Mlp gene models; 4,308 hits vs. yeast; 4,899 hits against UniProt (7,487 with no hits; 1/3 vs. 2/3); 4,708 supported by ESTs • Yao-Cheng Lin: last EuGene annotation (summer 2008), including 454 data (~5,000 contigs) and adjusted parameters for small secreted protein prediction => 17,167 Mlp gene models (6,989 < 300 aa)

  16. JGI gene prediction (Andrea Aerts, 03/28/2008) • Genewise: 9,193 models • Fgenesh_pm: 3,147 models • estExt_fpm: 2,438 models • + EuGene prediction • Reconciliation and release in April 2008
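The slide does not describe how JGI reconciled the Genewise, Fgenesh_pm, estExt_fpm and EuGene sets. Purely as an illustration of the general idea, and not JGI's actual procedure, a greedy priority scheme keeps one model per locus: rank the sources, then accept models in rank order unless they overlap an already accepted model on the same scaffold and strand. The `PRIORITY` ranking, the `Model` type and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    scaffold: str
    strand: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    source: str


# Hypothetical ranking: a lower number wins when two models overlap.
PRIORITY = {"estExt_fpm": 0, "Fgenesh_pm": 1, "Genewise": 2, "EuGene": 3}


def overlaps(a: Model, b: Model) -> bool:
    """True if two models share at least one base on the same scaffold and strand."""
    return (a.scaffold == b.scaffold and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def reconcile(models: list[Model]) -> list[Model]:
    """Greedy reconciliation: accept models in priority order, skipping any that
    overlap an already accepted model; unknown sources rank last."""
    ranked = sorted(models, key=lambda m: (PRIORITY.get(m.source, len(PRIORITY)), m.start))
    kept: list[Model] = []
    for m in ranked:
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: (m.scaffold, m.start))


if __name__ == "__main__":
    models = [
        Model("g1", "sc1", "+", 100, 900, "EuGene"),
        Model("g2", "sc1", "+", 150, 1000, "estExt_fpm"),
        Model("g3", "sc1", "-", 200, 800, "Genewise"),
        Model("g4", "sc1", "+", 2000, 2600, "EuGene"),
    ]
    print([m.name for m in reconcile(models)])  # ['g2', 'g3', 'g4']
```

A production pipeline would likely weigh EST and homology support locus by locus rather than use a fixed source ranking; the fixed ranking here only keeps the sketch short.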

  17. JGI gene model prediction: 16,694 gene models predicted by JGI (& EuGene) • Prediction method: • Ab initio: 51% • EuGene: 27% • Homology-based: 14% • EST-based: 8% • Gene model validation: • Complete (5' M, 3' *): 94% • Alignment with nr: 43% • Alignment with Pfam: 25% • EST support: 27% • 16,694 gene models: 4,465 EuGene models (27%); 4,810 fgenesh1 (29%) + 5,422 fgenesh2 (32%) => 61.3% fgenesh models; 1,997 Genewise/GenewisePlus models (12%) • 21% of fgenesh/Genewise models were consolidated by EST extension
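Slide 17 counts a model as complete when its protein starts with Met (5' M) and ends with a stop (3' *). At the CDS level, and assuming the standard genetic code with ATG as the only start codon, that check can be sketched as follows; the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def is_complete_cds(cds: str) -> bool:
    """True if a CDS runs from an ATG start to a terminal stop with no internal stop."""
    cds = cds.upper()
    # Need at least a start and a stop codon, and a whole number of codons.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOPS:
        return False
    # An in-frame stop before the last codon would truncate the protein.
    return not any(c in STOPS for c in codons[:-1])


def percent_complete(cdss: list[str]) -> float:
    """Percentage of CDSs that pass the completeness check."""
    return 100.0 * sum(map(is_complete_cds, cdss)) / len(cdss) if cdss else 0.0


if __name__ == "__main__":
    toy = ["ATGAAATTTTAA",   # complete
           "ATGTAAGGGTGA",   # internal stop
           "AAATTTTAA",      # no ATG start
           "ATGAAATTT"]      # no terminal stop
    print(percent_complete(toy))  # 25.0
```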

  18. JGI gene model prediction: 16,694 gene models predicted by JGI (& EuGene) • Mean gene length: 1,685 bp (Laccaria: 1.5 kb) • Mean transcript length: 1,224 bp (Laccaria: 1.1 kb) • Exons per gene: 4.90 (Laccaria: 5.4) • Mean exon size: 250 bp (Laccaria: 210 bp) • Mean intron size: 120 bp (Laccaria: 93 bp) • Mean protein size: 378 aa (Laccaria: 367 aa) • Proteins < 300 aa: Laccaria 52%, Coprinus 40%, Melampsora 49%, Puccinia 54%
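The averages on slide 18 can be computed from exon coordinates alone. A minimal sketch, assuming each gene model is given as a list of 1-based inclusive exon intervals and that gene length is the span from first to last exon (JGI's definitions may differ, for example in how UTRs are handled). The function name and toy genes are hypothetical.

```python
from statistics import mean


def gene_structure_stats(genes: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Summarise gene models given as exon intervals (1-based, inclusive).

    Exon and intron means are taken over all exons/introns, not per gene.
    """
    gene_lens, tx_lens, exon_counts, exon_sizes, intron_sizes = [], [], [], [], []
    for exons in genes.values():
        exons = sorted(exons)
        sizes = [e - s + 1 for s, e in exons]
        gene_lens.append(exons[-1][1] - exons[0][0] + 1)
        tx_lens.append(sum(sizes))
        exon_counts.append(len(exons))
        exon_sizes.extend(sizes)
        # Introns are the gaps between consecutive exons.
        intron_sizes.extend(b[0] - a[1] - 1 for a, b in zip(exons, exons[1:]))
    return {
        "mean_gene_length": mean(gene_lens),
        "mean_transcript_length": mean(tx_lens),
        "exons_per_gene": mean(exon_counts),
        "mean_exon_size": mean(exon_sizes),
        "mean_intron_size": mean(intron_sizes) if intron_sizes else 0.0,
    }


if __name__ == "__main__":
    toy = {"g1": [(1, 300), (421, 620)], "g2": [(1001, 1250)]}
    for key, value in gene_structure_stats(toy).items():
        print(key, value)
```

With means taken this way, two identities hold exactly: mean transcript length = exons per gene × mean exon size, and mean gene length = mean transcript length + (exons per gene - 1) × mean intron size. For slide 18, 4.90 × 250 bp ≈ 1,225 bp (reported: 1,224 bp) and 1,224 + 3.90 × 120 ≈ 1,692 bp (reported: 1,685 bp), which is roughly consistent given the rounding of the reported means.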

  19. JGI gene model prediction: intron donors and acceptors
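The content of slide 19 is not in the transcript. A common way to summarise intron donors and acceptors is to tabulate the terminal dinucleotides of each intron. This sketch assumes intron sequences are already extracted in transcript orientation (reverse-complemented for minus-strand genes) and only distinguishes GT-AG, GC-AG and AT-AC; the function names and toy introns are hypothetical.

```python
from collections import Counter

# GT-AG is the canonical splice-site pair; GC-AG and AT-AC are the
# best-known non-canonical ones.
KNOWN_PAIRS = {"GT-AG", "GC-AG", "AT-AC"}


def splice_class(intron: str) -> str:
    """Classify an intron by its 5' (donor) and 3' (acceptor) dinucleotides."""
    intron = intron.upper()
    if len(intron) < 4:
        return "too_short"
    pair = f"{intron[:2]}-{intron[-2:]}"
    return pair if pair in KNOWN_PAIRS else "other"


def splice_summary(introns: list[str]) -> Counter:
    """Count introns per splice-site class."""
    return Counter(splice_class(i) for i in introns)


if __name__ == "__main__":
    toy = ["GTAAGTTTTTCAG", "GCAAGTTTCAG", "ATATCCTTAC", "GGTTTTAG"]
    print(splice_summary(toy))
```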

  20. Gene model density on the 20 largest scaffolds • Mean gene density of 2.04 genes/10 kb => 1 gene/4.9 kb (Laccaria: 1 gene/3.1 kb)
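The two figures on slide 20 are reciprocals: a density of 2.04 genes per 10 kb corresponds to 10,000 / 2.04 ≈ 4,902 bp, or about 4.9 kb per gene. The sketch below shows that conversion plus a simple per-window count of the kind that could underlie a density plot along a scaffold. Counting genes by start position in non-overlapping 10 kb windows is an assumption, since the slide does not give its method; the function names are hypothetical.

```python
def density_per_10kb(n_genes: int, span_bp: int) -> float:
    """Genes per 10 kb over a region of span_bp bases."""
    return n_genes * 10_000 / span_bp


def bp_per_gene(density_10kb: float) -> float:
    """Average spacing (bp per gene) implied by a density in genes per 10 kb."""
    return 10_000 / density_10kb


def windowed_density(gene_starts: list[int], scaffold_len: int, window: int = 10_000) -> list[int]:
    """Count genes by start position in non-overlapping windows.

    Assumes 1-based starts with 1 <= start <= scaffold_len; the last window may be shorter.
    """
    counts = [0] * ((scaffold_len + window - 1) // window)
    for start in gene_starts:
        counts[(start - 1) // window] += 1
    return counts


if __name__ == "__main__":
    print(round(bp_per_gene(2.04)))  # 4902 bp, i.e. ~4.9 kb per gene
    print(round(density_per_10kb(1, 3_100), 2))  # Laccaria, 1 gene / 3.1 kb: 3.23 genes/10 kb
    print(windowed_density([1, 5000, 12000, 25000, 29999], 30_000))  # [2, 1, 2]
```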

  21. JGI gene model prediction: the Mlp gene space • 28% of the genome is coding sequence • 16,694 putative proteins (gene models) = JGI prediction + extra putative proteins identified with EuGene • 15,725 proteins > 100 aa (for comparison: Laccaria >17,000; Phanerochaete 10,048; Coprinopsis 8,759; Ustilago 6,522) • 7,830 with homologs in nr (47%), including 3,893 hypothetical proteins (Puccinia, Laccaria, mostly basidiomycetes) • 5,461 with homologs in Swiss-Prot (33%) • 6,820 with homologs in Laccaria (41%) • 4,507 supported by Mlp ESTs (27%) • A large proportion (30%) of Mlp genes have no homologs in other fungal genomes, including P. graminis (Pucciniales) and Sporobolomyces roseus

  22. BLAST vs. other fungal deduced proteomes • 33% of gene models are specific to Melampsora larici-populina (~5,500 models with no homologs, ~300 of them with Pfam/IPR hits) • 10,344 with homologs in P. graminis (62%) • ~25% orthologs with P. graminis
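Slide 22 counts models with no detectable homolog in the other proteomes searched. Assuming BLAST results have already been reduced, per proteome, to the set of Mlp model IDs with at least one hit (the e-value cutoff is not stated on the slide), the lineage-specific models are a set difference, and the subset with domain hits is an intersection with a set of Pfam/InterPro-annotated IDs. All names and toy IDs below are hypothetical.

```python
def lineage_specific(all_models: set[str], hits_by_proteome: dict[str, set[str]]) -> set[str]:
    """Models with no BLAST hit in any of the other proteomes searched."""
    with_hit = set().union(*hits_by_proteome.values())
    return all_models - with_hit


def summarise(all_models: set[str], hits_by_proteome: dict[str, set[str]],
              domain_hits: set[str]) -> dict[str, float]:
    specific = lineage_specific(all_models, hits_by_proteome)
    return {
        "specific": len(specific),
        "percent_specific": 100.0 * len(specific) / len(all_models) if all_models else 0.0,
        # Specific models that still carry a recognisable protein domain.
        "specific_with_domain": len(specific & domain_hits),
    }


if __name__ == "__main__":
    all_models = {f"m{i}" for i in range(1, 7)}
    hits = {"P_graminis": {"m1", "m2", "m3"}, "S_roseus": {"m2", "m4"}}
    print(summarise(all_models, hits, domain_hits={"m1", "m6"}))
```

Applied to the slide's own counts, ~5,500 of 16,694 models is about 33%, matching the reported proportion.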

  23. Mlp gene models functional classification

  24. GO classification: 27.8%

  25. KEGG pathways: 2,758 gene models (16.5%)

  26. JGI summary – A complete table to help in annotating Mlp gene models

  27. Mlp 98AG31: the 'bad guy' genomic team at INRA UMR 1136 IAM • Duplessis Sébastien & Francis Martin • Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp bioinformatics • Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families • Marie-Pierre Oudot-Le Secq (INRA Nancy): early EuGene gene prediction