Annotation of the Laccaria genome
Jan WUYTS
INRA Tree-Microbe Interactions Unit, INRA, 54280 Champenoux, France
VIB Department of Plant Systems Biology, Ghent University, Technologiepark 927, 9052 Gent, Belgium
jan.wuyts@psb.ugent.be
Presentation overview
• Assembly
• EuGène
• Latest annotation
• Clustering of genes
• Ks distribution
• Duplicated segments
• Gbrowse database
Assembly 20050315
• map reads to assembly (BLASTn, E <= 1e-50, >= 97% sequence identity)
• calculate coverage for the top 15 scaffolds
• reads that map to more than 1 location: discard
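The read filter on this slide can be sketched in a few lines. This is only an illustration: the tuple layout of a hit and the function name are assumptions, not the pipeline that was actually run; the thresholds are the ones quoted above.

```python
from collections import defaultdict

def uniquely_mapped_reads(hits, max_evalue=1e-50, min_identity=97.0):
    """Return {read_id: scaffold} for reads with exactly one passing location.

    `hits` is an iterable of (read_id, scaffold, start, percent_identity, evalue)
    tuples. This layout is an assumption made for the example.
    """
    locations = defaultdict(set)
    for read_id, scaffold, start, identity, evalue in hits:
        # Keep only hits that satisfy both thresholds from the slide.
        if evalue <= max_evalue and identity >= min_identity:
            locations[read_id].add((scaffold, start))
    # Reads with more than one passing location are discarded.
    return {
        read_id: next(iter(locs))[0]
        for read_id, locs in locations.items()
        if len(locs) == 1
    }
```

Counting the surviving reads per scaffold would then give a rough coverage estimate for the largest scaffolds.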
[Figure: GC% plot; axis marks at 50% and 100%]
EuGène
• developed by INRA (Toulouse, France) in cooperation with our group
• Intrinsic approaches: coding potential search, SplicePredictor, NetGene2, NetStart
• Extrinsic approaches: BLASTn, BLASTx, tBLASTx; Swissprot, Cryptococcus, Coprinus, cDNA & EST
• Output: predicted genes (structural annotation)
Collecting Data: Interpolated Markov Model
• Actual genes (full or partial)
• Introns
• Intergenic regions where possible
>gene1
atggctaggatagctctcgatagtcgat...
>gene2
atggtccgcttcgctatgctagatcggat...
>gene3
cgattagctgagctcttttctcgatcgtagct...
>intron1
gtagctcgctgctcgag
>intron2
gtagctcgataaaatcgctggggctcgctgag
>intron3
gtagctgttttttcgctagctgatcgtttag
>intergenic1
acgctgctgctcgggctcgctcgatcgatcccaaaatatcgctagatctagatcta...
>intergenic2
gctcgatgagagatcgcgctcgctatataaatatcgcgatcgat...
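To show why separate training sets for genes, introns and intergenic regions are collected, here is a deliberately simplified sketch. It trains a fixed-order Markov chain per sequence class and compares log-likelihoods. It is not an interpolated Markov model (which blends several orders) and not EuGène's implementation; the function names, the order and the add-one smoothing are all assumptions.

```python
import math
from collections import Counter

def train_markov(seqs, order=2, alphabet="acgt"):
    """Train a fixed-order Markov chain; return a log_prob(context, base) function."""
    context_counts, kmer_counts = Counter(), Counter()
    for seq in seqs:
        seq = seq.lower()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous characters such as 'n'.
            if base in alphabet and all(c in alphabet for c in context):
                kmer_counts[context + base] += 1
                context_counts[context] += 1

    def log_prob(context, base):
        # Add-one smoothing so unseen k-mers do not give log(0).
        numerator = kmer_counts[context + base] + 1
        denominator = context_counts[context] + len(alphabet)
        return math.log(numerator / denominator)

    return log_prob

def log_odds(seq, model_a, model_b, order=2):
    """Sum of log P_a - log P_b over the sequence; > 0 favours model_a."""
    seq = seq.lower()
    return sum(
        model_a(seq[i - order:i], seq[i]) - model_b(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )
```

A positive log-odds of a window under the coding model versus the intergenic model is the kind of coding-potential evidence a gene finder can combine with signal scores.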
Collecting Data: Splice Machine
actual GT donors: ...accgtgtGTgctttgt... ...cggtcgtGTccgaat... ...
actual acceptors: ...acttgtatAGgctgggt... ...cggtcgtAGaggaatc... ...
actual GC donors: ...actggatGCgcgtgca... ...ttgtcgtttGCaggaatc... ...
pseudo GT donors: ...tttcgtgtGTgctttgt... ...cgaacgtGTccaat... ...
pseudo acceptors: ...aattgtatAGgcccggt... ...aatacgtAGaggaatc... ...
pseudo GC donors: ...acccgatGCaacgtca... ...atgtcgggGCagggatc... ...
Predicting Genes
Each signal on the sequence is scored using the SVM models.
Signal       forward strand   reverse strand
GT donors    gt               ac
GC donors    gc               gc
Acceptors    ag               ct
...acgcgcgatagctgatggtcttttctcgcgagatctagagaggacacacatacatgatctagatcttaaa...
Scores: 0.1 0.254 0.36 0.9 0.11 ...
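Before any scoring can happen, the candidate signals have to be located on both strands. The sketch below only enumerates candidates; the SVM scoring itself is not reproduced. The function names and the returned tuple layout are assumptions for illustration.

```python
# Forward-strand dinucleotides for each signal type named on the slide.
FORWARD_SIGNALS = {"GT donor": "gt", "GC donor": "gc", "acceptor": "ag"}

def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]

def candidate_signals(seq):
    """Return sorted (position, strand, signal_name) for every candidate site.

    On the reverse strand a signal appears as its reverse complement
    (gt -> ac, gc -> gc, ag -> ct), so both patterns are searched.
    """
    seq = seq.lower()
    found = []
    for name, motif in FORWARD_SIGNALS.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = seq.find(pattern)
            while start != -1:
                found.append((start, strand, name))
                start = seq.find(pattern, start + 1)  # allow overlapping sites
    return sorted(found)
```

Each returned position would then be passed, with its flanking sequence, to the trained classifier to obtain scores like those shown on the slide.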
Latest EuGène annotation
• 23164 genes (18678 complete)
• 9956 covered by ESTs for at least 100 bp
• 8929 with a match in Swissprot, 11772 in UniProt
• 9232 with a match in Cryptococcus (4502 reciprocal best hits)
• 12932 with a match in Phanerochaete (5515 reciprocal best hits)
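The "reciprocal best hit" counts above follow a simple rule: gene A's best hit in the other proteome is B, and B's best hit back is A. A minimal sketch, assuming hits are available as (query, subject, score) tuples; the names and the use of a single score for ranking are assumptions.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

A gene can have a match without being a reciprocal best hit, which is why the reciprocal counts are much lower than the match counts.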
23000 ?!?
• 1176 match Class 1 TEs, 1000 Class 2
• ~1500 tandem repeats
• genes split by gaps in assembly (?)
• genes split by annotation mistakes (?)
• false positives
• most manually annotated genes look _very_ similar to the EuGène annotation
Predicted introns
73014 + 3622 = 76636
Clustering predicted genes
blastclust
• 1357 clusters
• max cluster: 21 genes
• 19831 single genes
• tight clusters, too strict, mostly very small proteins
Li-Rost single linkage clustering
• 2410 (2347) clusters
• max cluster: 224 genes
• 12194 (12008) single genes
• top clusters too relaxed
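Single linkage merges two clusters as soon as any one pair of members is similar, which is why the largest clusters can become too relaxed. A minimal union-find sketch follows; the input format and function name are assumptions, and the similarity criterion used to produce the pairs is left out.

```python
def single_linkage_clusters(genes, similar_pairs):
    """Group genes so that any chain of similar pairs ends up in one cluster."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the cluster representative, halving the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # one similar pair is enough to merge

    clusters = {}
    for gene in genes:
        clusters.setdefault(find(gene), []).append(gene)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))
```

In the usage below, g1 and g3 share a cluster although only g1-g2 and g2-g3 are similar:

```python
single_linkage_clusters(["g1", "g2", "g3", "g4"], [("g1", "g2"), ("g2", "g3")])
```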
Li & Rost top clusters
• (224) Kinesin light chain (KLC)
• (156) ??
• (126) ??
• (124) ??
• (119) Myosin heavy chain related
• (115) Putative AC transposase
Ks distribution
• Synonymous substitutions per synonymous site
• “free” to mutate => follow molecular clock hypothesis
• protein alignment -> codon alignment
• indication for age of divergence
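The "protein alignment -> codon alignment" step threads each coding sequence back onto its aligned protein, so that gaps fall between codons. A minimal sketch, assuming the CDS has no stop codon and matches the protein residue for residue; the function name is hypothetical and the Ks estimation itself is not shown.

```python
def codon_alignment(aligned_protein, cds):
    """Convert one row of a protein alignment into the matching codon row.

    `aligned_protein` uses '-' for gaps; `cds` is the ungapped coding
    sequence (assumed here to exclude the stop codon).
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    remaining = iter(codons)
    # Each residue consumes one codon; each gap becomes three gap characters.
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)
```

Applying this to both rows of a pairwise alignment gives the codon alignment from which a Ks value can be estimated.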
Duplicated regions
• i-ADHoRe (automatic detection of homologous regions) by Cedric, Klaas, Yvan
• reduce chromosomes (scaffolds) to strings of genes (no tandem duplicates)
• map homologous genes (anchor points)
• find statistically significant regions of colinearity
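The second step, reducing a scaffold to a string of genes without tandem duplicates, can be illustrated as follows. This is a simplification that only collapses directly adjacent genes of the same family; it is not i-ADHoRe's own procedure, and the function name and the gene-to-family mapping are assumptions.

```python
def collapse_tandem_duplicates(gene_order, family_of):
    """Return the gene list with runs of same-family neighbours collapsed.

    `gene_order` lists gene ids in scaffold order; `family_of` maps a gene
    id to its family id (genes missing from the map form their own family).
    """
    collapsed = []
    previous_family = None
    for gene in gene_order:
        family = family_of.get(gene, gene)
        if collapsed and family == previous_family:
            continue  # tandem copy of the previous gene: keep only the first
        collapsed.append(gene)
        previous_family = family
    return collapsed
```

Anchor points are then pairs of homologous genes on the collapsed gene lists, and colinear runs of anchor points are what get tested for significance.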
Results i-ADHoRe
• 52 multiplicons of 5 to 11 anchor points
• 4.6% of Laccaria genome duplicated
• no colinearity with Cryptococcus genome
Scaffold        median Ks   Scaffold
scaffold_13     1.53        scaffold_39
scaffold_1      0.44        scaffold_1
scaffold_11     0.39        scaffold_30
scaffold_31     0.33        scaffold_6
scaffold_115    0.02        scaffold_39
Gbrowse database http://bioinformatics.psb.ugent.be/genomes/browse/gbrowse/laccaria
Thiamin pyrophosphate riboswitch
• mRNA feature in 5’-UTR of mRNA
• tertiary structure of mRNA has affinity for ligand (thiamine)
• binding induces conformational change => regulation of translation
• reported during previous workshop, now corresponding gene annotated
Lbscf0025g00410 Lbscf0025g00400