MAKER Annotation Process

Example of Glossina

Karyn Mégy

Dan Hughes

VectorBase

http://www.vectorbase.org


Annotation: aims and means

  • Aims

    • Preliminary

    • Locus rather than exact position

  • Means

    • Automatic annotation

      • By similarity

      • Ab initio

    • Manual annotation

      • By regions

      • By gene families


Annotation: similarity vs. ab initio

  • Similarity

    • Similarity to known sequences

      -> only known genes

      -> based on available data (quantity, quality)

  • Ab initio

    • Follow a gene “recipe”

      -> potentially identify new genes

      -> over-predictions


Ensembl annotation

[Pipeline diagram: evidence sources, numbered in the order shown on the slide]

  • 1: Community annotation

  • 2: Protein, species specific

  • 3: Transcriptome, species specific

  • 4: Protein, ‘close’ species

  • 5: Ab initio

  • Genome: raw genome sequence -> masked genome sequence

    • Masking: RepeatModeler repeats + known repeats/transposons

  • Then: add UTRs (ESTs), functional annotation, ncRNA prediction, pseudogene prediction, GenBank submission

Ensembl annotation

  • Similarity-focused

  • Data rich organisms

  • Fiddly, time consuming

  • Rhodnius prolixus experience

  • In the meantime:

    Heliconius annotation using MAKER


MAKER

  • Aim:

    • Generate gene sets

    • Combine into final gene set

  • Iterative process

[Diagram: raw genome -> annotated genome, with data fed in at successive steps]

  • http://www.yandell-lab.org/software/maker.html

  • Cantarel et al. Genome Res. 2008. PMID 18025269



Intermediate gene sets

  • ESTs

    • from GenBank

    • cleaned and clustered/assembled with CAP3

    • 71,700 contigs

  • Insecta/metazoa proteins

    • from UniProt

    • align to the genome with BLAST

    • 690,000 sequences (insecta)

    • 2,200,000 sequences (metazoa)

[Diagram labels: raw data; raw genome -> masked genome]

Masking: RepeatModeler repeats + known repeats/transposons


Intermediate gene sets

  • RNAseq Illumina Yale

    • cleaned

    • aligned to the genome using Tophat/Bowtie

    • ‘transfrags’ built with Cufflinks

    • 78,000 ‘transfrags’ (on 4 sets -> overlaps)

  • Augustus

    • generated by Martin Swain

    • trained with SOLiD data

    • 16,963 models (high quality)

[Diagram labels: raw data; gene models; raw genome -> masked genome]

Masking: RepeatModeler repeats + known repeats/transposons
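
The RNA-seq steps on this slide (align reads with Tophat/Bowtie, then build ‘transfrags’ with Cufflinks) can be sketched as two command lines. This is a minimal sketch, not the commands actually run for Glossina: the index name, read file, and directory names are hypothetical, the reads are assumed to be single-end, and only the basic `-o` output-directory option of each tool is used.

```python
def tophat_command(index_prefix, reads, out_dir):
    """Build a TopHat call: tophat -o <out_dir> <bowtie_index> <reads>."""
    return ["tophat", "-o", out_dir, index_prefix, reads]


def cufflinks_command(alignments, out_dir):
    """Build a Cufflinks call: cufflinks -o <out_dir> <alignments>."""
    return ["cufflinks", "-o", out_dir, alignments]


# Hypothetical file and directory names, for illustration only.
align = tophat_command("glossina_index", "sample1.fastq", "tophat_sample1")

# Assumption: TopHat wrote its alignments as accepted_hits.bam in its output directory.
assemble = cufflinks_command("tophat_sample1/accepted_hits.bam", "cufflinks_sample1")

# Print the commands rather than running them, so the sketch needs no installed tools.
for command in (align, assemble):
    print(" ".join(command))
```

Repeating this pair of commands once per read set would give one transfrag set per run, which would be consistent with the overlaps noted across the 4 sets.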


Intermediate gene sets

Raw data

  • ESTs: aligned to the genome

    • from GenBank, clustered with CAP3

    • 71,700 clusters

  • Insecta/metazoa proteins (UniProt)

    • 690,000 sequences (insecta)

    • 2,200,000 sequences (metazoa)

Gene models

  • RNAseq Illumina Yale: using Tophat/Cufflinks

    • 78,000 ‘transfrags’ (on 4 sets -> overlaps)

  • Augustus: trained with SOLiD data

    • 16,963 models (high quality)

Ab initio

  • SNAP: trained for Glossina (MAKER)

  • Augustus: trained for Glossina (Martin Swain)

  • GenScan

Masking (raw genome -> masked genome): RepeatModeler repeats + known repeats/transposons



MAKER

  • Provided as input

    • Raw data: ESTs, proteins

    • Gene models

  • Run software within MAKER

    • Ab initio

  • Genome: raw genome -> masked genome

    • Masking: RepeatModeler repeats + known repeats/transposons


MAKER – iterative process

  • Round-1:

    • Align ESTs and Insecta proteins to the genome

    • Train SNAP (1): starting HMM = Drosophila HMM

      Evidence: ESTs and protein alignments,

      RNA-seq Illumina Yale, Augustus (SOLiD)

  • Round-2:

    • Re-train SNAP (2) – same as above but HMM = output of SNAP-1

  • Round-3:

    • Re-train SNAP (3) – same as above but HMM = output of SNAP-2

    • Align Metazoa proteins to the genome

    • Combine final gene set
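
The three rounds above chain one SNAP model into the next. The sketch below only models that bookkeeping: which HMM each round starts from and which evidence it uses. It does not call MAKER or SNAP, and the `Round` class, `plan_rounds` function, and label strings are hypothetical names introduced for illustration. It also assumes that "same as above" means rounds 2 and 3 reuse the round-1 evidence.

```python
from dataclasses import dataclass, field


@dataclass
class Round:
    number: int
    snap_hmm: str                      # HMM that SNAP starts from in this round
    evidence: list = field(default_factory=list)
    combine_final_set: bool = False    # only the last round combines the final gene set


def plan_rounds(n_rounds=3, initial_hmm="Drosophila HMM"):
    """Return the round-by-round plan: each round's HMM is the previous round's SNAP output."""
    base_evidence = [
        "EST alignments",
        "Insecta protein alignments",
        "RNA-seq Illumina Yale",
        "Augustus (SOLiD)",
    ]
    rounds = []
    hmm = initial_hmm
    for number in range(1, n_rounds + 1):
        evidence = list(base_evidence)
        last = number == n_rounds
        if last:
            # Metazoa proteins are only aligned in the final round.
            evidence.append("Metazoa protein alignments")
        rounds.append(Round(number, hmm, evidence, combine_final_set=last))
        hmm = f"output of SNAP-{number}"
    return rounds


for r in plan_rounds():
    print(f"Round-{r.number}: SNAP HMM = {r.snap_hmm}; final set = {r.combine_final_set}")
```

Keeping the evidence fixed and changing only the HMM between rounds makes the re-training the single variable that differs from one round to the next.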


Using MAKER for…

Heliconius

Tsetse fly

Salmon louse

Centipede


Annex…


Augustus (SOLiD)

  • Glossina trained:

    • ESTs only: 14,739 predictions, 9.8% with similarity to Glossina proteins (1,455 sequences, 95% sequence identity)

    • ESTs + SOLiD: 14,739 predictions, 9.9% with similarity to Glossina proteins (1,465 sequences, 95% sequence identity)

    • Glossina GenBank proteins: 2,754 protein sequences, 53% matching Augustus models

  • Glossina un-trained:

    • 8,581 predictions, 15% with similarity to Glossina proteins (1,299 sequences, exact matches)

Martin Swain’s stats, July 22nd, 2011
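
The percentages on this slide can be recomputed from the counts given. The short sketch below does only that arithmetic, and the `percent` helper and row labels are illustrative names. The first value comes out slightly above the 9.8% on the slide, so that figure may have been truncated rather than rounded.

```python
def percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


# (label, predictions similar to Glossina proteins, total predictions), taken from the slide
rows = [
    ("Trained, ESTs only", 1455, 14739),
    ("Trained, ESTs + SOLiD", 1465, 14739),
    ("Un-trained", 1299, 8581),
]

for label, hits, total in rows:
    print(f"{label}: {hits}/{total} = {percent(hits, total):.2f}%")
```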


ESTs

  • Total: 79,292 ESTs


  • [1] Adult midgut expressed sequence tags from the tsetse fly Glossina morsitans morsitans and expression analysis of putative immune response genes. Genome Biol. 2003. Lehane et al.

  • [2] Differential expression of fat body genes in Glossina morsitans morsitans following infection with Trypanosoma brucei brucei. Int. J. Parasitol. 2008. Lehane et al.

  • [3] Analysis of fat body transcriptome from the adult tsetse fly, Glossina morsitans morsitans. Insect Mol. Biol. 2006 Attardo et al.

  • [4] Functional Characterisations of odorant binding proteins and chemosensory proteins in tsetse fly Glossina morsitans morsitans. Unpublished 2009. …., Lehane,M., Hertz-Fowler,C., Berriman,M., …

  • [5] Comprehensive analysis of the transcriptome of the Tsetse fly Glossina morsitans morsitans. Unpublished. 2009. Hertz-Fowler,C., Aslett,M.A. and Berriman,M. EST submitted under: GenomeProject:9563


MAKER – final gene set

  • Genes:

    • Final genes: 12,220

    • Raw data:

      • EST-based genes: 23,469

      • Protein-based genes: 416,959 (about 417,000; redundancy)

    • Gene sets:

      • Illumina-Yale: 70,915 (redundancy)

      • Augustus (SOLiD): 16,155

    • Ab initio:

      • SNAP: 48,464

      • Augustus (MAKER): 14,413

