Genome Annotation

What we are going to discuss:
Finding RNA-only genes
Gene prediction
Prokaryotes vs. eukaryotes
Introns and exons
Transcription signals
ESTs
Functional annotation
Biochemical pathways and subsystems
Metabolic reconstruction of whole organisms

Genome Overview
Sn = TP / (TP + FN)
Sp = TP / (TP + FP)
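To make the two measures concrete, here is a minimal sketch that scores a predicted annotation against a known one at the nucleotide level. The label strings are invented for illustration ('E' for exon, 'I' for intron), and the function names are hypothetical. Note that Sp as defined here, TP / (TP + FP), is the quantity that other fields usually call precision.

```python
def count_outcomes(actual, predicted, positive="E"):
    """Count TP, FP and FN per nucleotide; 'E' (exon) is the positive class."""
    tp = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # exon base correctly predicted as exon
        elif a != positive and p == positive:
            fp += 1          # non-exon base wrongly predicted as exon
        elif a == positive and p != positive:
            fn += 1          # exon base that the prediction missed
    return tp, fp, fn


def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): fraction of real exon bases that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp, fp):
    """Sp = TP / (TP + FP): fraction of predicted exon bases that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy example (made-up labels, not real annotation data)
actual    = "EEEEIIIIEE"
predicted = "EEEIIIEEEE"
tp, fp, fn = count_outcomes(actual, predicted)
print(f"TP={tp} FP={fp} FN={fn}")
print(f"Sn={sensitivity(tp, fn):.3f} Sp={specificity(tp, fp):.3f}")
```

A predictor that calls everything an exon gets Sn = 1 but a poor Sp, which is why the two numbers are reported together.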
Human on left, Arabidopsis on right
Very simple HMM: each base is either in an intron or an exon, and gets emitted with different frequencies depending on which state it is in.
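The two-state model can be sketched in a few lines, with the Viterbi algorithm used to find the most likely exon/intron labelling of a sequence. All probabilities below are invented for illustration (GC-rich exons, AT-rich introns, states that tend to persist); they are not taken from any real gene finder.

```python
import math

STATES = ("exon", "intron")

# Hypothetical parameters, chosen only to illustrate the idea.
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon":   {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon":   {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state for each base of seq (log-space Viterbi)."""
    if not seq:
        return []
    # Best log-probability of any path ending in each state at the first base
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to arrive in state s
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][base]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


labels = viterbi("GCGCGCGCGCATATATATAT")
print("".join("E" if s == "exon" else "I" for s in labels))
```

With these toy numbers the GC-rich half is labelled exon and the AT-rich half intron, because the gain in emission probability outweighs the cost of a single state change.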
GeneMark scoring of the likelihood that each nucleotide is in an intron, based on an HMM.
A more realistic model from SNAP
At: Arabidopsis thaliana; Ce: Caenorhabditis elegans; Dm: Drosophila melanogaster; Os: Oryza sativa
Sequence logos around (b) the intron splice donor site (usually GT) and (c) the ATG translation start codon, in four well-studied eukaryotes.
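The height of each column in a sequence logo is its information content, which for DNA is 2 bits minus the Shannon entropy of the base frequencies at that position. The sketch below computes this for a handful of aligned sites. The sequences are made up for illustration, and the small-sample correction that logo programs normally apply is left out.

```python
import math
from collections import Counter


def column_information(column):
    """Information content in bits of one alignment column (DNA, maximum 2 bits)."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


def logo_heights(aligned_sites):
    """Total stack height (bits) for each position in a set of aligned sites."""
    return [column_information(col) for col in zip(*aligned_sites)]


# Hypothetical aligned sites with an invariant GT at positions 3 and 4
sites = ["AGGTA", "CGGTG", "TGGTA", "GAGTA"]
for position, bits in enumerate(logo_heights(sites), start=1):
    print(f"position {position}: {bits:.2f} bits")
```

An invariant position such as the GT of the donor site reaches the full 2 bits, while a position where all four bases are equally frequent scores 0 and all but disappears from the logo.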
Several factors used to score promoter sequences. This is part of a neural network model, but the factors are common to many programs.