Annotation of the Laccaria genome

Presentation Transcript


  1. Annotation of the Laccaria genome • Jan WUYTS • INRA Tree-Microbe Interactions Unit, INRA, 54280 Champenoux, France • VIB Department of Plant Systems Biology, Ghent University, Technologiepark 927, 9052 Gent, Belgium • jan.wuyts@psb.ugent.be

  2. Presentation overview • Assembly • EuGène • Latest annotation • Clustering of genes • Ks distribution • Duplicated segments • Gbrowse database

  3. Assembly 20050315 • map reads to the assembly with BLASTn (E ≤ 1e-50, ≥ 97% sequence identity) • calculate coverage for the top 15 scaffolds • find reads that map to more than one location • discard them
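The read-mapping steps above can be sketched as follows. This is a minimal illustration, not the pipeline used for assembly 20050315: it assumes BLASTn hits were already parsed into tuples, and the names filter_hits, drop_multihit_reads and coverage are hypothetical.

```python
from collections import defaultdict

MAX_EVALUE = 1e-50   # slide: E <= 1e-50
MIN_IDENTITY = 97.0  # slide: >= 97% sequence identity

# Each hit is a hypothetical tuple parsed from BLASTn tabular output:
# (read_id, scaffold, start, end, evalue, percent_identity), 1-based coordinates.

def filter_hits(hits):
    """Keep only hits that pass the E-value and identity thresholds."""
    return [h for h in hits if h[4] <= MAX_EVALUE and h[5] >= MIN_IDENTITY]

def drop_multihit_reads(hits):
    """Discard every read that maps to more than one location."""
    locations = defaultdict(set)
    for read, scaffold, start, end, _, _ in hits:
        locations[read].add((scaffold, min(start, end), max(start, end)))
    return [h for h in hits if len(locations[h[0]]) == 1]

def coverage(hits, scaffold_lengths):
    """Per-base read depth for the scaffolds in scaffold_lengths (e.g. the top 15)."""
    depth = {name: [0] * length for name, length in scaffold_lengths.items()}
    for _, scaffold, start, end, _, _ in hits:
        if scaffold not in depth:
            continue  # hit on a scaffold we are not reporting
        lo, hi = min(start, end), max(start, end)  # minus-strand hits have start > end
        for pos in range(lo - 1, hi):
            depth[scaffold][pos] += 1
    return depth
```

Running coverage on filter_hits(hits) gives the plain coverage profile; running it on drop_multihit_reads(filter_hits(hits)) gives the "multihit removed" profile, in which reads that also match elsewhere (for example in repeats) no longer inflate the depth.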

  4. assembly coverage

  5. assembly coverage (multi-hit reads removed)

  6. GC content (%)

  7. EuGène • developed by INRA (Toulouse, France) in cooperation with our group • intrinsic approaches: coding potential search, SplicePredictor, NetGene2, NetStart • extrinsic approaches: BLASTn, BLASTx and tBLASTx searches against Swiss-Prot, Cryptococcus, Coprinus, and cDNA & EST • output: predicted genes (structural annotation)

  8. EuGene

  9. Graphical output of EuGene

  10. Collecting Data: Interpolated Markov Model • actual genes (full or partial) • introns • intergenic regions where possible • example training sets: >gene1 atggctaggatagctctcgatagtcgat... >gene2 atggtccgcttcgctatgctagatcggat... >gene3 cgattagctgagctcttttctcgatcgtagct... >intron1 gtagctcgctgctcgag >intron2 gtagctcgataaaatcgctggggctcgctgag >intron3 gtagctgttttttcgctagctgatcgtttag >intergenic1 acgctgctgctcgggctcgctcgatcgatcccaaaatatcgctagatctagatcta... >intergenic2 gctcgatgagagatcgcgctcgctatataaatatcgcgatcgat...
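A minimal interpolated Markov model trained on sequence sets like these might look as follows. It is a simplified sketch, not EuGène's implementation: the count-based interpolation weight λ = min(1, n / threshold), the Laplace smoothing and the class name SimpleIMM are assumptions made for illustration (Glimmer-style IMMs derive their weights from a chi-square test instead).

```python
import math
from collections import defaultdict

ALPHABET = "acgt"

class SimpleIMM:
    """Simplified interpolated Markov model over DNA, orders 0..max_order.

    Assumption: weights are count-based, lambda = min(1, n / threshold).
    """

    def __init__(self, max_order=5, threshold=400):
        self.max_order = max_order
        self.threshold = threshold
        # counts[context][base]: how often `base` followed `context` in training data
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            seq = seq.lower()
            for i, base in enumerate(seq):
                if base not in ALPHABET:
                    continue
                for k in range(min(i, self.max_order) + 1):
                    context = seq[i - k:i]
                    if any(c not in ALPHABET for c in context):
                        break  # longer contexts would contain the same ambiguous base
                    self.counts[context][base] += 1

    def prob(self, context, base):
        """P(base | context), blending estimates from order 0 up to len(context)."""
        p = 1.0 / len(ALPHABET)  # uniform prior below order 0
        for k in range(min(len(context), self.max_order) + 1):
            table = self.counts.get(context[len(context) - k:])
            total = sum(table.values()) if table else 0
            if total == 0:
                break  # this context (and every longer one) was never seen
            ml = (table.get(base, 0) + 1) / (total + len(ALPHABET))  # Laplace-smoothed
            lam = min(1.0, total / self.threshold)
            p = lam * ml + (1.0 - lam) * p
        return p

    def log_likelihood(self, seq):
        """Sum of log P(base | preceding context) over the unambiguous bases."""
        seq = seq.lower()
        ll = 0.0
        for i, base in enumerate(seq):
            if base in ALPHABET:
                ll += math.log(self.prob(seq[max(0, i - self.max_order):i], base))
        return ll
```

Training one model on the gene set and another on the intergenic set, then taking the difference of their log-likelihoods over a window, gives a log-odds score for whether that window looks coding.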

  11. Collecting Data: SpliceMachine • actual GT donors: ...accgtgtGTgctttgt..., ...cggtcgtGTccgaat..., ... • actual acceptors: ...acttgtatAGgctgggt..., ...cggtcgtAGaggaatc..., ... • actual GC donors: ...actggatGCgcgtgca..., ...ttgtcgtttGCaggaatc..., ... • pseudo GT donors: ...tttcgtgtGTgctttgt..., ...cgaacgtGTccaat..., ... • pseudo acceptors: ...aattgtatAGgcccggt..., ...aatacgtAGaggaatc..., ... • pseudo GC donors: ...acccgatGCaacgtca..., ...atgtcgggGCagggatc..., ...
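Building the actual and pseudo example sets can be sketched as below: every occurrence of the dinucleotide is a candidate, and only those at annotated intron boundaries count as actual sites. The coordinate convention (0-based, end-exclusive introns), the flank length and the function name splice_examples are assumptions; SpliceMachine's own feature extraction may differ from a raw window.

```python
def splice_examples(genome, introns, dinucleotide, site, flank=8):
    """Return (actual, pseudo) context windows around each occurrence of `dinucleotide`.

    genome:  DNA sequence
    introns: list of (start, end) intron coordinates, 0-based, end exclusive (assumption)
    site:    "donor" (dinucleotide opens the intron) or "acceptor" (it closes the intron)
    """
    genome = genome.lower()
    dinucleotide = dinucleotide.lower()
    if site == "donor":
        true_pos = {s for s, e in introns if genome[s:s + 2] == dinucleotide}
    else:
        true_pos = {e - 2 for s, e in introns if genome[e - 2:e] == dinucleotide}
    actual, pseudo = [], []
    # Only positions with `flank` bases available on both sides are used.
    for i in range(flank, len(genome) - 2 - flank + 1):
        if genome[i:i + 2] != dinucleotide:
            continue
        # Upper-case the dinucleotide, as in the slide's examples.
        window = genome[i - flank:i] + genome[i:i + 2].upper() + genome[i + 2:i + 2 + flank]
        (actual if i in true_pos else pseudo).append(window)
    return actual, pseudo
```

Calling it once per signal type (GT donors, GC donors, acceptors) yields the six example sets listed on the slide.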

  12. Predicting Genes • each candidate signal on the sequence (GT donors, GC donors, acceptors) is scored with the corresponding SVM model • example: ...acgcgcgatagctgatggtcttttctcgcgagatctagagaggacacacatacatgatctagatcttaaa... with candidate-site scores 0.1, 0.254, 0.36, 0.9, 0.11, ...
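Scoring each candidate site with a trained linear SVM can be sketched as a dot product between a one-hot encoded window and the model's weight vector. The weights and bias here are placeholders supplied by the caller, the one-hot encoding and the function names are assumptions, and the raw decision values are not rescaled to the 0 to 1 range shown on the slide.

```python
ALPHABET = "acgt"

def one_hot(window):
    """Encode a DNA window as a flat 0/1 vector (4 features per position)."""
    vec = []
    for base in window.lower():
        vec.extend(1.0 if base == a else 0.0 for a in ALPHABET)
    return vec

def svm_score(window, weights, bias):
    """Decision value w·x + b of a linear SVM whose weights are given by the caller."""
    x = one_hot(window)
    if len(x) != len(weights):
        raise ValueError("window length does not match the model")
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def score_candidates(seq, dinucleotide, flank, weights, bias):
    """Score every occurrence of `dinucleotide` that has `flank` bases on both sides."""
    seq = seq.lower()
    scores = {}
    for i in range(flank, len(seq) - 2 - flank + 1):
        if seq[i:i + 2] == dinucleotide:
            scores[i] = svm_score(seq[i - flank:i + 2 + flank], weights, bias)
    return scores
```

A separate weight vector per signal type (GT donor, GC donor, acceptor) would be scanned over the sequence, and the resulting per-site scores passed on to the gene model.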

  13. Latest EuGène annotation • 23164 genes (18678 complete) • 9956 covered by ESTs over at least 100 bp • 8929 with a match in Swiss-Prot, 11772 in UniProt • 9232 with a match in Cryptococcus (4502 reciprocal best hits) • 12932 with a match in Phanerochaete (5515 reciprocal best hits)
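The reciprocal best hit counts come from BLAST searches run in both directions. A minimal sketch, assuming hits are (query, subject, bit score) tuples and that the best hit is the one with the highest bit score; the thresholds behind the slide's counts are not given, and ties here simply keep the first hit seen.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject. hits: (query, subject, bitscore)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

A one-directional match (counted as "match with Cryptococcus") only needs an entry in best_hits; the reciprocal count is the stricter subset in which the best hit points back.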

  14. 23000 genes ?!? • 1176 match Class 1 transposable elements (TEs), 1000 match Class 2 TEs • ~1500 tandem repeats • genes split by gaps in the assembly (?) • genes split by annotation mistakes (?) • false positives • most manually annotated genes look _very_ similar to the EuGene annotation

  15. peptide length

  16. coding density

  17. Predicted introns: 73014 and 3622 (76636 in total)

  18. Clustering predicted genes • blastclust: 1357 clusters, largest cluster 21 genes, 19831 single genes; tight clusters, too strict, mostly very small proteins • Li-Rost single-linkage clustering: 2410 (2347) clusters, largest cluster 224 genes, 12194 (12008) single genes; top clusters too relaxed
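Single-linkage clustering amounts to finding connected components of a graph whose edges are gene pairs passing the similarity cutoff. The sketch below uses union-find and assumes the cutoff (Li-Rost or otherwise) has already been applied to produce the pairs; the function name single_linkage is hypothetical. Because one link is enough to merge two clusters, chains of moderate similarities can produce large, relaxed top clusters.

```python
def single_linkage(genes, pairs):
    """Connected components of the homology graph.

    genes: all gene ids; pairs: (gene_a, gene_b) that pass the similarity cutoff.
    Returns (clusters with more than one gene, largest first; sorted single genes).
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # a single link merges the two clusters

    clusters = {}
    for g in parent:
        clusters.setdefault(find(g), []).append(g)
    groups = [sorted(c) for c in clusters.values()]
    multi = sorted((c for c in groups if len(c) > 1), key=len, reverse=True)
    singles = sorted(c[0] for c in groups if len(c) == 1)
    return multi, singles
```

Here g1 and g3 end up together although they share no direct link, which is exactly the chaining behaviour that makes single linkage relaxed.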

  19. Li & Rost top clusters (number of genes in parentheses) • (224) Kinesin light chain (KLC) • (156) ?? • (126) ?? • (124) ?? • (119) Myosin heavy chain related • (115) Putative AC transposase

  20. Ks distribution • Ks: synonymous substitutions per synonymous site • synonymous sites are “free” to mutate, so Ks should follow the molecular clock hypothesis • protein alignment → codon alignment • Ks indicates the age of divergence
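The protein alignment → codon alignment step threads each gene's coding sequence onto its aligned protein so that codons line up for Ks estimation. A minimal sketch, assuming the CDS starts in frame and encodes the ungapped protein; codon_align is a hypothetical name, it does not verify the translation, and Ks itself would be estimated afterwards by a separate method (for example Nei-Gojobori or a maximum-likelihood estimator) not shown here.

```python
def codon_align(aligned_protein, cds):
    """Replace each residue by its codon and each gap by '---'.

    Trailing codons not used by the protein (e.g. a stop codon) are ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            if k >= len(codons):
                raise ValueError("protein is longer than the CDS")
            out.append(codons[k])
            k += 1
    return "".join(out)
```

Applying it to both members of an aligned gene pair gives two codon strings of equal length, which is the input a Ks estimator expects.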

  21. Ks distribution Laccaria

  22. Ks distribution Phanerochaete

  23. Duplicated regions • i-ADHoRe (automatic detection of homologous regions) by Cedric, Klaas, Yvan • reduce chromosomes (scaffolds) to strings of genes (no tandem duplicates) • map homologous genes (anchor points) • find statistically significant regions of colinearity
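The first two i-ADHoRe steps can be sketched on toy data: collapse tandem duplicates into one representative, then list anchor points as index pairs of homologous genes between two gene lists. This is an illustration only: gene families come from a hypothetical gene → family dictionary, tandem runs are taken to be strictly adjacent genes of one family, and the statistical test for significant colinearity (Simillion et al. 2004) is not implemented.

```python
def collapse_tandems(genes, family):
    """Keep the first gene of each run of adjacent genes from the same family.

    genes: ordered gene ids on one scaffold; family: gene id -> family id.
    Genes without a family are always kept.
    """
    collapsed = []
    for g in genes:
        f = family.get(g)
        if collapsed and f is not None and f == family.get(collapsed[-1]):
            continue  # tandem duplicate of the previous kept gene
        collapsed.append(g)
    return collapsed

def anchor_points(list_x, list_y, family):
    """Index pairs (i, j) where list_x[i] and list_y[j] belong to the same family."""
    positions = {}
    for j, g in enumerate(list_y):
        f = family.get(g)
        if f is not None:
            positions.setdefault(f, []).append(j)
    return [(i, j) for i, g in enumerate(list_x) for j in positions.get(family.get(g), [])]
```

When a scaffold is compared with itself, every gene also pairs with its own position (i, i), so those trivial anchors would have to be filtered out before looking for colinear diagonals.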

  24. Simillion et al. 2004 Genome Research

  25. i-ADHoRe results • 52 multiplicons with 5 to 11 anchor points • 4.6% of the Laccaria genome duplicated • no colinearity with the Cryptococcus genome
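The slides do not say how the duplicated fraction was computed. One plausible way, sketched here purely as an assumption, is to take the scaffold intervals spanned by the multiplicon segments, merge overlapping ones so shared stretches are not counted twice, and divide by the assembly length; covered_fraction is a hypothetical name.

```python
def covered_fraction(blocks, assembly_length):
    """Fraction of the assembly lying inside at least one block.

    blocks: (scaffold, start, end) intervals, 1-based inclusive.
    """
    by_scaffold = {}
    for scaffold, start, end in blocks:
        by_scaffold.setdefault(scaffold, []).append((min(start, end), max(start, end)))
    covered = 0
    for intervals in by_scaffold.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for s, e in intervals[1:]:
            if s <= cur_end + 1:  # overlapping or adjacent: extend the current interval
                cur_end = max(cur_end, e)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = s, e
        covered += cur_end - cur_start + 1
    return covered / assembly_length
```

Merging matters because a segment taking part in several multiplicons would otherwise be counted once per multiplicon.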

  26. Age of duplicated blocks

  27. Median Ks per pair of duplicated segments • scaffold_13 / scaffold_39: 1.53 • scaffold_1 / scaffold_1: 0.44 • scaffold_11 / scaffold_30: 0.39 • scaffold_31 / scaffold_6: 0.33 • scaffold_115 / scaffold_39: 0.02
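Under the molecular clock assumption from the Ks distribution slide, a median Ks can be read as an approximate age. The usual relation is sketched below, where λ is the synonymous substitution rate per synonymous site per year and the factor 2 reflects that both copies accumulate substitutions independently after the duplication. The slides give no calibrated λ for Laccaria, so these values only rank blocks by relative age: the block with median Ks 0.02 is much younger than the one with 1.53.

```latex
T \approx \frac{K_s}{2\lambda}
```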

  28. Gbrowse database http://bioinformatics.psb.ugent.be/genomes/browse/gbrowse/laccaria

  29. Thiamin pyrophosphate riboswitch • mRNA feature in the 5’-UTR • the tertiary structure of the mRNA binds a ligand (thiamine pyrophosphate) • ligand binding induces a conformational change => regulation of translation • reported during the previous workshop; the corresponding gene is now annotated

  30. Lbscf0025g00410 Lbscf0025g00400
