Data Curation in IMG-ER Natalia Ivanova MGM Workshop September 28, 2011 PowerPoint Presentation
Data Curation in IMG-ER

Natalia Ivanova

MGM Workshop

September 28, 2011

Tricky question
  • What do you need to do data curation in IMG?
    • iPhone
    • PhD in Computer Science
    • supernatural powers
  • Correct answer: you need an IMG account

http://img.jgi.doe.gov/er

What can be curated in IMG-ER?
1. Gene models:
  • Add a gene
  • Make a gene a pseudogene or “obsolete” (i.e., delete it)
2. Functional annotations:
  • Product names
  • EC numbers
  • Gene symbols

If you believe something else needs to be changed (genome name, taxonomy, etc.), please use the IMG Questions/Comments link.

What can’t be changed: automated assignments to protein families (Pfam, COGs, TIGRfam, InterPro, SEED assignments, KO assignments).
  • Product Name is free text (but see the GenBank requirements: http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit_annotation.html)
  • Prot Description is free text (goes to “note” in the GenBank submission)
  • EC number and PubMed ID: see explanation
  • Notes are free text (go to “note” in the GenBank submission)
  • Gene symbol is the “gene name”, a 4-letter abbreviation (e.g., recA); goes to “gene” in the GenBank submission
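The field-to-GenBank mapping above can be sketched in code. This is a minimal illustration, assuming the curated values are available as a Python dictionary; the keys (`gene_symbol`, `product_name`, etc.) are hypothetical names for the form fields, not IMG-ER export columns, and routing the EC number to the standard `EC_number` qualifier is an assumption. The output follows the GenBank five-column feature table layout (start, stop, feature key; then qualifier lines with three empty leading columns).

```python
# Minimal sketch: curated IMG-ER fields -> qualifier lines of a GenBank
# five-column feature table. Dictionary keys are hypothetical field names.
FIELD_TO_QUALIFIER = [
    ("gene_symbol", "gene"),        # Gene symbol -> "gene" (from the slide)
    ("product_name", "product"),    # Product Name -> standard "product" qualifier
    ("ec_number", "EC_number"),     # assumed mapping for the EC number
    ("prot_description", "note"),   # Prot Description -> "note" (from the slide)
    ("notes", "note"),              # Notes -> "note" (from the slide)
]


def cds_feature_lines(start, stop, curated):
    """Return a CDS feature line followed by one qualifier line per non-empty field."""
    lines = [f"{start}\t{stop}\tCDS"]
    for field_name, qualifier in FIELD_TO_QUALIFIER:
        value = curated.get(field_name, "").strip()
        if value:  # skip missing or blank fields
            # Qualifier lines: three empty columns, then qualifier key and value.
            lines.append(f"\t\t\t{qualifier}\t{value}")
    return lines
```

For example, `cds_feature_lines(1, 1059, {"gene_symbol": "recA", "product_name": "recombinase RecA", "notes": ""})` yields a CDS line plus `gene` and `product` qualifier lines; the blank `notes` field is dropped.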

How to find the genes that need curation?
Two possible scenarios:
  1. You have submitted a genome to IMG-ER and want it to have the best possible annotations (e.g., for GenBank submission)
  2. You’re an expert and know everything about a certain protein family (or families): “community service”
Curation of genome annotations
What to look for:
  • “Hypothetical protein”, but with some evidence
  • Non-hypothetical protein, but no evidence
Workflow:
  1. Find genome (Find Genomes: Genome Browser or Genome Search)
  2. Genome Statistics: refine the gene set (e.g., genes without enzymes but with candidate KO-based enzymes)
  3. Add to Gene Cart
  4. Compare Gene Annotations
  5. Review Gene Pages (protein families, homologs/orthologs, gene neighborhoods)
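The two patterns worth reviewing can be flagged automatically once genes are exported, for example from the Gene Cart. A minimal Python sketch, assuming each gene comes with its product name and a list of supporting assignments; the `Gene` fields and evidence labels are hypothetical, not an IMG export format.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    gene_id: str
    product: str
    # Hypothetical list of supporting assignments, e.g. ["Pfam", "COG"].
    evidence: list = field(default_factory=list)


def flag_for_review(genes):
    """Return (gene_id, reason) pairs for the two patterns worth curating."""
    flagged = []
    for g in genes:
        hypothetical = "hypothetical protein" in g.product.lower()
        if hypothetical and g.evidence:
            # Named "hypothetical" although some family assignment supports a function.
            flagged.append((g.gene_id, "hypothetical but has evidence"))
        elif not hypothetical and not g.evidence:
            # Has a specific name but nothing backing it up.
            flagged.append((g.gene_id, "named but no evidence"))
    return flagged
```

Genes that are hypothetical with no evidence, or named with supporting evidence, are left alone; only the two mismatched cases are returned.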
Why do you want to review annotations?
Most IMG pipelines are optimized for specificity: they generate few false positives but are more likely to have false negatives.
  • Compare Annotations
    • Product name is a consensus of multiple assignments: BLASTp, TIGRfam, COG, Pfam
    • Sources of false negatives are the cutoffs: TIGRfam trusted cutoffs are quite stringent; COG has no trusted cutoffs; BLASTp uses a 50% identity cutoff
  • Candidate genes with KO annotations
    • Sources of false negatives: cutoffs for % identity and alignment length
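To see why stringent cutoffs trade false negatives for specificity, here is a toy filter in Python. The 50% BLASTp identity cutoff comes from the slide; the alignment-length threshold (70% of the query) and the `Hit` fields are assumptions for illustration, not IMG's actual pipeline parameters.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    identity: float      # percent identity of the alignment
    aln_fraction: float  # aligned length / query length (hypothetical metric)


MIN_IDENTITY = 50.0      # BLASTp identity cutoff quoted on the slide
MIN_ALN_FRACTION = 0.7   # assumed alignment-length cutoff, for illustration only


def passes_cutoffs(hit):
    """True if the best hit is strong enough to transfer a product name."""
    return hit.identity >= MIN_IDENTITY and hit.aln_fraction >= MIN_ALN_FRACTION


def review_candidates(best_hits):
    """Genes whose best hit fails the cutoffs: possible false negatives to check by hand."""
    return sorted(gene for gene, hit in best_hits.items() if not passes_cutoffs(hit))
```

A genuine homolog at 45% identity, or one aligned over only part of its length, ends up in `review_candidates` instead of receiving a name: exactly the kind of gene manual curation can rescue.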
Curation of annotation in one genome (or a set of genomes)
  • Your favorite genes (experimental verification, etc.): use Find Genes, Gene Search, or BLAST
  • “Compare Annotations” on the Organism Details page
  • “Candidate genes with KO annotations” on the Organism Details page
  • PhyloProfiler
Example of a missed gene
  • Run PhyloProfiler with Deinococcus geothermalis as the query genome and Deinococcus hopiensis as the target, selecting genes with no homologs in the target
  • Select Dgeo_0119 to check whether a homolog of this gene was missed in Deinococcus hopiensis
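Conceptually, the PhyloProfiler query above is a set difference: keep the query-genome genes that lack a homolog in the target genome. A minimal sketch, assuming homolog hits have already been computed and stored per gene; the data layout and gene/genome names below are hypothetical.

```python
def missing_in_target(query_genes, homolog_genomes, target):
    """Query genes with no homolog recorded in `target`.

    homolog_genomes maps a query gene ID to the set of genome names in which
    a homolog was found (hypothetical layout). Genes absent from the map are
    treated as having no homologs anywhere.
    """
    return [g for g in query_genes if target not in homolog_genomes.get(g, set())]
```

For example, with `{"gene_1": {"genome_T", "genome_U"}, "gene_2": {"genome_U"}}`, `missing_in_target(["gene_1", "gene_2", "gene_3"], ..., "genome_T")` returns `["gene_2", "gene_3"]`. Each gene in the result is only a candidate: it points to a missed gene call if the target's DNA still contains a matching region.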
Adding missed genes (contd.)
  • Use the graphical viewer to check the translation
  • Adjust the start if another start codon with a better ribosome binding site (RBS) exists upstream
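The start-adjustment check can be approximated in code: walk upstream in frame from the annotated start, stop at the first in-frame stop codon, and compare a crude Shine-Dalgarno score for each alternative start. This is a simplified heuristic, not IMG's method; the start codon set (ATG, GTG, TTG), the AGGAGG core motif, and the 5 to 15 bp scoring window are assumptions, and only the forward strand is handled.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons (assumed set)
STOP_CODONS = {"TAA", "TAG", "TGA"}
SD_CORE = "AGGAGG"                     # Shine-Dalgarno core motif used for scoring


def rbs_score(seq, start, far=15, near=5):
    """Best count of positions matching SD_CORE among windows lying
    between `far` and `near` bp upstream of the start codon at index `start`."""
    best = 0
    first = max(0, start - far)
    last = start - near - len(SD_CORE)  # motif must end at least `near` bp upstream
    for i in range(first, last + 1):
        window = seq[i:i + len(SD_CORE)]
        best = max(best, sum(a == b for a, b in zip(window, SD_CORE)))
    return best


def upstream_starts(seq, start):
    """In-frame start codons upstream of `start`, up to the first in-frame stop."""
    found = []
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            found.append(pos)
        pos -= 3
    return found


def suggest_start(seq, start):
    """Keep the annotated start unless an upstream in-frame start scores strictly better."""
    seq = seq.upper()
    best_pos, best_score = start, rbs_score(seq, start)
    for pos in upstream_starts(seq, start):
        score = rbs_score(seq, pos)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


# Toy sequence (not real genome data): AGGAGG ends 7 bp before an in-frame ATG at
# index 17; the annotated start at index 35 has no comparable motif upstream.
EXAMPLE = "TTTT" + "AGGAGG" + "CCCCCCC" + "ATG" + "GCA" * 5 + "ATG" + "GCA" * 3 + "TAA"
print(suggest_start(EXAMPLE, 35))  # prints 17
```

Requiring a strictly better score keeps the annotated start on ties, which matches the advice to move the start only when a better RBS exists upstream.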
IMG curation exercises
Go to the link in the usual place:

http://genomebiology.jgi-psf.org/Content/MGM-10.Sep2011/agenda.html

The first 2 pages contain questions without answers; the rest is a cheat sheet.