The GO Reference Genome Annotation Project

Cross-species Review

Comparing annotations across species helps to:

  • Show terms that can be added via experimental data in orthologs.
  • Ensure annotation consistency, e.g. by spotting outliers that may reflect curation errors.
  • Reveal significant biological differences between species.
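The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling used by the project: the function name `review_ortholog_set`, the `min_support` threshold and the example annotations are all assumptions made for this sketch.

```python
from collections import Counter


def review_ortholog_set(annotations, min_support=2):
    """Compare GO terms across one set of predicted orthologs.

    annotations maps a species name to the set of GO term IDs that are
    experimentally annotated to that species' gene (hypothetical input).
    Returns (candidates, outliers), each a dict keyed by species.
    """
    # Count how many species annotate each term.
    support = Counter(term for terms in annotations.values() for term in terms)
    candidates, outliers = {}, {}
    for species, terms in annotations.items():
        # Terms seen in at least min_support other species but missing here:
        # possible additions supported by ortholog data.
        candidates[species] = sorted(
            t for t, n in support.items() if n >= min_support and t not in terms
        )
        # Terms that only this species carries: either a real biological
        # difference or a curation error worth a second look.
        outliers[species] = sorted(t for t in terms if support[t] == 1)
    return candidates, outliers


# Invented annotations, for illustration only.
example = {
    "H. sapiens": {"GO:0006281", "GO:0006298", "GO:0006915"},
    "M. musculus": {"GO:0006281", "GO:0006298"},
    "S. cerevisiae": {"GO:0006298"},
}
candidates, outliers = review_ortholog_set(example)
print(candidates["S. cerevisiae"])
print(outliers["H. sapiens"])
```

With this invented input, the term shared by human and mouse is suggested for the yeast gene, and the term carried only by the human gene is flagged for review. A curator still has to decide which of the two explanations applies.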

[Figure: Detail from graph used for annotation summary and comparison purposes showing some of the GO biological process terms annotated to the human gene MSH2 and predicted orthologs. Graphs for all genes curated in this project are available at the GO website: www.geneontology.org/images/RefGenomeGraphs]

[Figure: bar charts with one bar per GO aspect (P, F, C) for each of S. pombe, S. cerevisiae, D. discoideum, D. melanogaster, C. elegans, A. thaliana, M. musculus, R. rattus, H. sapiens and D. rerio. Y axis: Number of Genes. X axis: Organism, total gene number and GO aspect. Total gene numbers printed on the axis: 5.3K, 6.1K, 12K, 13.9K, 22.9K, 27.3K, 27.9K, 28K, 28K, 28K.]

The GO Reference Genome Annotation Project

Susan Tweedie, Rex Chisholm, Karen Christie, Emily Dimmer, Mary E. Dolan, Pascale Gaudet, David P. Hill, Doug Howe, Jim Hu, Donghui Li, Ruth Lovering, Fiona McCarthy, Sohel Merchant, Victoria Petri, Kimberley Van Auken, Valerie Wood, Suzanna Lewis, Michael Ashburner, J. Michael Cherry, Judy A. Blake, and The Gene Ontology Consortium.

Summary

The GO Reference Genome Annotation Project is a collaboration between model organism databases (MODs) representing 12 diverse species. Our aim is to provide comprehensive, high-quality GO annotation for every gene in each species. This will serve as a valuable reference set for annotating other genomes.

Our strategy is to work together, curating the same genes simultaneously from an agreed list. This poster illustrates the process we follow and highlights some of the curation issues that we have faced.

[Image labels: ZFIN, E. coli]

What genes to curate first?

For the first year of the project we chose orthologs of human disease genes (taken from the OMIM collection) as our priority targets for curation. We have now expanded our priority targets to four areas:

  • Orthologs of human disease genes
  • Genes involved in metabolic pathways
  • Topical or ‘hot’ genes
  • Genes that currently lack GO annotation but are conserved from yeast to human

We try to curate related genes in batches to promote curation efficiency. MODs will now take turns to choose genes for curation.

Overview

Get list of 20 genes to curate/month

[Image label: Chicken]

What are the related genes in my species?

Currently each MOD has its own method of ortholog identification. These include YOGY, InParanoid, OrthoMCL, TreeFam, HomoloGene and in-house sequence analysis. Unfortunately, none of these covers all 12 reference genome species, and comparing methods is difficult because of identifier variation and different update frequencies.

We are now working with Kara Dolinski at Princeton to establish a consistent system for representing homologs / orthologs across the reference genome species set.

Identify orthologs

Record ortholog details

Triage papers for GO

Making a database

Target genes, orthologs and curation status are currently recorded in shared Google spreadsheets. This has proved inconvenient, so we are developing a dedicated database to store this information. A prototype is shown below.
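To make the idea of a dedicated tracking database concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the project's prototype: the table names, column names, status values and the placeholder gene identifiers are all assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical schema: one row per target gene, one row per ortholog,
# with the curation status stored on the ortholog.
SCHEMA = """
CREATE TABLE target_gene (
    id     INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL,
    batch  TEXT
);
CREATE TABLE ortholog (
    id        INTEGER PRIMARY KEY,
    target_id INTEGER NOT NULL REFERENCES target_gene(id),
    species   TEXT NOT NULL,
    gene_id   TEXT NOT NULL,
    method    TEXT,
    status    TEXT NOT NULL DEFAULT 'pending'
);
"""


def open_tracker():
    """Create an in-memory tracking database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def pending_by_species(conn):
    """Count orthologs whose curation is not yet complete, per species."""
    rows = conn.execute(
        "SELECT species, COUNT(*) FROM ortholog "
        "WHERE status != 'complete' GROUP BY species ORDER BY species"
    )
    return dict(rows.fetchall())


conn = open_tracker()
conn.execute(
    "INSERT INTO target_gene (id, symbol, batch) VALUES (1, 'MSH2', 'example batch')"
)
# Placeholder identifiers and statuses, not real database records.
conn.executemany(
    "INSERT INTO ortholog (target_id, species, gene_id, method, status) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "M. musculus", "placeholder-1", "in-house", "complete"),
        (1, "S. cerevisiae", "placeholder-2", "in-house", "pending"),
        (1, "D. rerio", "placeholder-3", "in-house", "in progress"),
    ],
)
print(pending_by_species(conn))
```

Recording the ortholog identification method next to each ortholog keeps the differences between methods visible, and a single status column makes it easy to report outstanding work per MOD.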

Curate selected papers for new GO annotations

What papers to curate?

Identifying the relevant papers and prioritizing them for GO curation can be a rate-limiting step for MODs that do not have a literature triage system in place, particularly when there are many papers about a gene.

In some cases, working back to the primary literature from recent reviews is an effective approach. Another strategy is the use of text mining tools such as Textpresso.

Discuss annotations with other curators

Create new GO terms as required

Clean up existing GO annotations

Ontology Development

Working on the same genes together encourages the development of new GO terms. Over the last year 450 new GO terms have been added by the reference genome annotation group.

Review annotations by other ref genome MODs

Review annotation quality

Annotations should conform to the agreed standards of the reference genome annotation group:

  • Experimental evidence (evidence codes IDA, IPI, IGI, IMP, IEP) is preferred.
  • Terms assigned by TAS (traceable author statement) should be traced to the primary literature; these don’t always turn out to apply to the correct species! TAS is discouraged for use in this project.
  • Terms should only be assigned by sequence similarity (ISS) where the terms are supported by experimental evidence for the similar sequence.
  • Non-traceable author statements (NAS) should be avoided.
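The standards above lend themselves to a simple automated first pass before manual review. The sketch below is one possible encoding, not a tool used by the project: the function name, its parameters and the wording of the verdicts are assumptions. NAS is the GO evidence code for a non-traceable author statement.

```python
# Experimental evidence codes named in the standards above.
EXPERIMENTAL = frozenset({"IDA", "IPI", "IGI", "IMP", "IEP"})


def check_annotation(evidence_code, source_evidence_code=None):
    """Return a short verdict for one annotation.

    evidence_code is the code on the annotation being reviewed.
    source_evidence_code (hypothetical parameter) is the code on the
    annotation of the similar sequence that an ISS annotation points to.
    """
    if evidence_code in EXPERIMENTAL:
        return "preferred"
    if evidence_code == "TAS":
        return "discouraged: trace to the primary literature"
    if evidence_code == "NAS":
        return "avoid"
    if evidence_code == "ISS":
        # ISS is only acceptable when the similar sequence itself has
        # experimental support for the term.
        if source_evidence_code in EXPERIMENTAL:
            return "acceptable"
        return "check: similar sequence lacks experimental support"
    return "not covered by these standards"


for code, source in [("IMP", None), ("ISS", "IDA"), ("ISS", "TAS"), ("NAS", None)]:
    print(code, source, "->", check_annotation(code, source))
```

A check like this can only flag annotations for attention. Whether a TAS annotation really applies to the right species still requires a curator to read the cited paper.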

Release annotation set

Data availability

The annotations from this project are submitted to GO as part of the standard gene association file available from the GO web site or via AmiGO. Efforts to highlight this data set in AmiGO and via dedicated web pages are in progress.

Progress summary

Over 200 human disease genes have been examined by the group in the last year.

A comparison of the categories of evidence codes used to assign GO terms to genes indicates generally higher proportions of experimental evidence (shown in pale blue) in the reference genome target set (left graph) versus GO annotation across all genes (right graph).

[Figure axis labels. Y axis: Number of Reference Genome Target Genes. X axis: Organism and GO aspect (P = Biological Process, F = Molecular Function, C = Cellular Component).]
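The comparison summarized in the graphs comes down to the share of experimental evidence codes per organism and GO aspect. A minimal sketch of that calculation follows. The input layout (organism, aspect, evidence code) and the example rows are assumptions made for illustration and are not project data.

```python
from collections import defaultdict

# Experimental evidence codes as listed in the annotation standards.
EXPERIMENTAL = frozenset({"IDA", "IPI", "IGI", "IMP", "IEP"})


def experimental_fraction(annotations):
    """Fraction of annotations with experimental evidence.

    annotations is an iterable of (organism, aspect, evidence_code)
    tuples, where aspect is "P", "F" or "C".
    Returns a dict keyed by (organism, aspect).
    """
    totals = defaultdict(int)
    experimental = defaultdict(int)
    for organism, aspect, code in annotations:
        key = (organism, aspect)
        totals[key] += 1
        if code in EXPERIMENTAL:
            experimental[key] += 1
    return {key: experimental[key] / totals[key] for key in totals}


# Invented rows, for illustration only.
rows = [
    ("S. pombe", "P", "IDA"),
    ("S. pombe", "P", "IMP"),
    ("S. pombe", "P", "ISS"),
    ("S. pombe", "P", "IGI"),
    ("S. pombe", "F", "IEA"),
    ("S. pombe", "F", "IPI"),
]
for key, fraction in sorted(experimental_fraction(rows).items()):
    print(key, fraction)
```

Running the same function once on the target gene set and once on all annotated genes gives the two sets of proportions that the left and right graphs compare.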