
Comparing the EuGene annotation of the JAZZ & ARACHNE assemblies

Presentation Transcript


  1. Comparing the EuGene annotation of the JAZZ & ARACHNE assemblies. Jan WUYTS, jan.wuyts@vib.be

  2. [Diagram: EuGene annotation pipeline] • Input: Genome Sequence • Output: Gene Models, e.g. join(9265..9395,9749..99342) and complement(join(10164..10295,10349..10420,10467..10514,10566..10626,10681..10770,10823..10949,11001)) • Core: EuGene DAG • Extrinsic modules: TBlastx, Blastn, Blastx, RepeatMasker • Intrinsic modules: SpliceMachine (Splice Sites), Start ATG (Translation Start Site prediction), Poplar IMM (content potential for coding, intron and intergenic) • Evidence sources: Poplar proteins, other At proteins, other plant proteins, SwissProt, PIR, Arabidopsis FLcDNA-supported proteins, Poplar RepBase, Poplar cDNA & EST, Arabidopsis genome

  3. EuGene annotation • same parameters as the last version of the previous assembly • TE masking: 84 TEs in the previous version, 290 now (thanks, Hadi Quesneville!) • annotated genes: 20614 (Jazz), 18578 (Arachne)

  4. [Plot: fraction of nucleotides annotated as coding, per window of 25000 nt. Y axis: 0% to 100%; x axis: 0 to 6.8M]
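
The quantity plotted on this slide can be sketched as below. This is a minimal illustration, not the script behind the plot: the function name, the 1-based inclusive `(start, end)` interval format, and the choice of consecutive non-overlapping windows are all assumptions, since the slide only states the window size.

```python
def coding_fraction_per_window(cds_intervals, seq_length, window=25_000):
    """Fraction of coding nucleotides in each consecutive window.

    cds_intervals: iterable of (start, end) pairs, 1-based and inclusive
    (assumed format), all lying within 1..seq_length.
    """
    n_windows = (seq_length + window - 1) // window
    coding = [0] * n_windows

    # Merge overlapping or adjacent intervals so no nucleotide is counted twice.
    merged = []
    for start, end in sorted(cds_intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Distribute each merged interval over the windows it spans.
    for start, end in merged:
        pos = start
        while pos <= end:
            w = (pos - 1) // window
            w_end = min((w + 1) * window, end)
            coding[w] += w_end - pos + 1
            pos = w_end + 1

    # The last window may be shorter than the others.
    fractions = []
    for w in range(n_windows):
        w_len = min(window, seq_length - w * window)
        fractions.append(coding[w] / w_len)
    return fractions


# Toy example with a small window so the numbers are easy to check by hand.
print(coding_fraction_per_window([(1, 10), (5, 20), (95, 105)], 250, window=100))
```

Running the same function on the CDS coordinates of each assembly would give two profiles that can be plotted against window position.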

  5. size distribution • compare the Jazz and Arachne assemblies • fraction confirmed by ESTs: CDS covered by at least 200 (100) nt with %id >= 95% • fraction confirmed by BLASTp to UniProt: protein covered for >= 75% of its length by a blastp hit (e <= 1e-5)
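
The two confirmation criteria on this slide can be written as simple predicates. This is a sketch under assumptions: the function and parameter names are hypothetical, the inputs are taken to be already summarised per gene, and "protein covered" is read here as coverage of the predicted protein (the query), which the slide does not state explicitly.

```python
def est_confirmed(aligned_cds_nt, percent_identity, min_nt=200, min_identity=95.0):
    """True if ESTs cover at least min_nt nucleotides of the CDS
    at percent_identity >= min_identity (thresholds taken from the slide)."""
    return aligned_cds_nt >= min_nt and percent_identity >= min_identity


def protein_confirmed(query_length, hit_start, hit_end, evalue,
                      min_cov=0.75, max_evalue=1e-5):
    """True if a blastp hit with e-value <= max_evalue covers at least
    min_cov of the predicted protein. Coordinates are 1-based, inclusive,
    and on the query (an assumption)."""
    covered = hit_end - hit_start + 1
    return evalue <= max_evalue and covered / query_length >= min_cov


# The slide gives 100 nt as an alternative EST threshold.
print(est_confirmed(150, 97.5))              # 200 nt threshold
print(est_confirmed(150, 97.5, min_nt=100))  # relaxed 100 nt threshold
print(protein_confirmed(200, 11, 170, 1e-20))
```

Applying both predicates to every gene in a size bin and dividing by the bin count would give the confirmed fractions the slide refers to.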

  6. Small peptides • ~50% have corresponding ESTs but no homolog in UniProt • what are they? • split genes? • pseudogenes? • mis-assembled regions (artificial duplications)? • TE-derived sequences? • non-coding RNAs (antisense regulation)? (would these be sequenced as ESTs? poly-A?) • other?

  7. best reciprocal BLAST hits [Chart, total: 5323. Categories: Homo sapiens, Arabidopsis thaliana, Dictyostelium discoideum, Schizosaccharomyces pombe, Coprinus cinereus, Cryptococcus neoformans, Yarrowia lipolytica, Magnaporthe grisea, Neurospora crassa, Emericella nidulans, Aspergillus fumigatus, Gibberella zeae, Ustilago maydis, other]
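
For readers unfamiliar with the term, best reciprocal hits can be computed roughly as follows. This is a generic sketch, not the procedure used for the slide: the `(query, subject, bitscore)` tuple format and the use of bit score to pick the best hit are assumptions, and the identifiers in the example are made up.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples (assumed format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: only a1/b1 pick each other in both directions.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 80.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Counting the resulting pairs per subject species would give a breakdown like the one charted on the slide.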

  8. Collinearity of the 2 assemblies • !! next slides have no biological relevance !! • how do both assemblies compare? • jazz → (pasting) → arachne • jazz → (cutting & pasting) → arachne • identify collinear regions using ADHoRe • don’t take orientation into account

  9. pasting [Figure: Jazz vs. Arachne; block length ~ number of annotated genes]

  10. cutting & pasting [Figure: Jazz vs. Arachne; block length ~ number of annotated genes]

  11. cutting & pasting [Figure: Arachne vs. Jazz and Jazz vs. Arachne; block length ~ number of annotated genes]

  12. Acknowledgements • Stephane Rombauts • Pierre Rouzé • Yves Van de Peer • everybody in the bioinformatics group in Gent • Francis Martin • everybody from the research group in Nancy • all manual annotators
