Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Margie Romine, Pacific Northwest National Laboratory, Richland, Washington


Presentation Transcript


  1. Automated Annotation of Microbial Genomes, Opportunities and Pitfalls • Margie Romine • Pacific Northwest National Laboratory, Richland, Washington

  2. Shewanella oneidensis MR-1 • Breathes Mn, Fe, and other metals, thereby changing their solubility • Also reduces radionuclides and hence impacts their mobility at contaminated sites • Genome sequenced by The Institute for Genomic Research (TIGR) in 2002 (funded by DOE-OBER) • Can we now better determine how this organism interacts with metals and radionuclides?

  3. Shewanella spp. Inhabit Many Niches • 2 more were sequenced by DOE’s Joint Genome Institute and 14 more are under way! • Energy rich: fermentation is occurring and energy is continuously being deposited via sedimentation • Rapidly changing redox conditions and dominant electron acceptors • Microbial partners are present to remove the acetate produced via anaerobic respiration

  4. Bacterial Genome Sequencing Explodes • 341 completed genomes, 976 ongoing • Partial genome sequences released in just days now by JGI! • How do we use sequence information to understand how all these organisms function in the environment? • Annotation is the key, but is now largely automated and hence of lower quality

  5. What is Annotation? • AGCTTAACTGGGATACGACGACCAGTAGACAGGTRTACGATGAGATATATAT • Locate genes • Translate to proteins • MASDLKKIYTRPRPDSAWQECVAALFDGHSKDKLACNDDL • Gather evidence of function • Assign putative functions
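The "locate genes, translate to proteins" steps on this slide can be illustrated with a minimal sketch. It is not the pipeline used in the talk: it assumes ATG-only start codons, scans the forward strand only, and uses the standard genetic code, whereas real gene callers also handle alternative starts (GTG, TTG), the reverse strand, and statistical models of coding sequence. The function names and the toy sequence are made up for illustration.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon until the first stop codon.

    Codons containing ambiguity codes (for example R or N) become 'X'.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_orfs(dna, min_codons=3):
    """Return (start, end, protein) for ATG-to-stop ORFs on the forward strand.

    Assumption: ATG is the only start codon and only three frames are scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(dna[start:i])))
                start = None
    return orfs


# Toy sequence invented for this example, not taken from the MR-1 genome.
print(find_orfs("CCATGGCTAGCGATTAAGG"))
```

Gathering evidence and assigning functions, the last two steps on the slide, depend on external databases and are not covered by this sketch.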

  6. Annotation Drives Post-genomic Research • Function predictions • Methodologies • Data Interpretation • Gene predictions • DNA microarrays • mRNA expression • Metabolic modeling • ChIP-chip • DNA binding sites • Protein predictions • Proteomics • Protein expression • Hypothesis • Targeted gene knock-outs

  7. Annotation with Gnare/Puma2 • Developed at Argonne National Laboratory by Natalia Maltsev, Mark D’Souza, Elizabeth Glass, Dina Sulakhe, Mustafa Syed, Pavan Anumula • http://compbio.mcs.anl.gov/puma2/cgi-bin/index.cgi • Gnare – Private genome sequences • Puma2 – Public genome sequences

  8. Types of Functional Descriptors • Hypothetical protein • Conserved hypothetical protein • Conserved domain protein • Function associated protein • Class specific enzyme • Specific function predicted • Function validated

  9. Checking Functions Where No Domain Hit Occurs • Go to Puma page for homolog • type IV secretion outer membrane protein, PilW?

  10. Domain identified • Align proteins • Shewanella oneidensis MR-1 MKNCQKG

  11. Clues in InterPro Domain Descriptor • InterPro: This is a family of hypothetical proteins. A number of the sequence records state they are transmembrane proteins or putative permeases. It is not clear what source suggested that these proteins might be permeases and this information should be treated with caution. • autoinducer-2 transport protein, TqsA • TCDB 2.A.86, The Autoinducer-2 Exporter (AI-2E) Family: The AI-2E family (UPF0118) is a large family of prokaryotic proteins derived from a variety of bacteria and archaea. Those examined are about 350 residues in length, and the couple that have been examined exhibit 7 putative transmembrane α-helical spanners (TMSs). E. coli, B. subtilis and several other prokaryotes have multiple paralogues encoded within their genomes. Herzberg et al. (2006) have presented strong evidence for a role of an AI-2E family homologue, YdgG (renamed TqsA), as an exporter of the E. coli autoinducer-2 (AI-2) (Camilli and Bassler, 2006; Chen et al., 2002). AI-2 is a proposed signalling molecule for interspecies communication in bacteria. It is a furanosyl borate diester (Chen et al., 2002).

  12. Using Genome Context to Predict Function • No functional clues

  13. Missing enzyme • Clusters with N-acetylglucosamine catabolic enzymes • Hypothesis experimentally validated!
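The genome-context idea on slides 12 and 13 (a gene with no functional clues sits next to genes of a known pathway) can be sketched as a simple neighbourhood lookup. This is only an illustration under stated assumptions: the input is a hypothetical list of (locus ID, function) pairs in chromosomal order, the locus IDs and annotations below are invented, and a real analysis would also weigh operon structure, strand, and conservation of the cluster across genomes.

```python
def neighborhood_functions(genes, target, window=2):
    """List annotated neighbours of `target` within `window` genes on each side.

    genes: list of (locus_id, function_or_None) tuples in chromosomal order.
    Returns the (locus_id, function) pairs of neighbours that have a function.
    """
    ids = [locus_id for locus_id, _ in genes]
    idx = ids.index(target)  # raises ValueError if the target is absent
    lo = max(0, idx - window)
    neighbours = genes[lo:idx] + genes[idx + 1:idx + 1 + window]
    return [(locus_id, func) for locus_id, func in neighbours if func]


# Hypothetical locus IDs and annotations, for illustration only.
example = [
    ("locus_A", "transcriptional regulator"),
    ("locus_B", "N-acetylglucosamine catabolic enzyme"),
    ("locus_C", None),  # the unannotated gene of interest
    ("locus_D", "N-acetylglucosamine catabolic enzyme"),
    ("locus_E", None),
]
print(neighborhood_functions(example, "locus_C", window=1))
```

A lookup like this only produces a hypothesis; as the slide notes, the prediction still had to be validated experimentally.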

  14. General enzyme function • Precomputed text mining

  15. Relevant abstracts mentioning your query species (Shewanella oneidensis) • sulfite dehydrogenase catalytic molybdopterin subunit, SorA

  16. Domain hit does not match current annotation propagated in automated annotations! • Mistake in InterPro database found!

  17. More Automation in Evidence Collecting Needed

  18. Protein Location Linked to Function • Figure labels: extracellular, outer membrane, periplasm, peptidoglycan, inner membrane, cytoplasm

  19. Multiple Routes of Secretion • Figure labels: signal peptide motifs (K/RRXFXK, AXA, LXGC, GFE, GG) and processing peptidases (LepB, LspA, PilD, C39)

  20. Bioinformatics Tools for Localization Prediction • Incorrect start sites have strong impact on predictions! • Different tools have unique specialties • No one tool provides good predictions for all proteins • Figure labels: LepB, IM TM, PSORT, LipoP, PrediSi, Phobius, SignalP, TatP, SOSUI, TMHMM, Phobius, PSORT, HMMTOP, LspA, β-barrel, SubLoc, CELLO, PSORT, Secretome, LipoP, Lipo, PSORT, PROFtmb, BOMP, BBTM
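Because no single tool predicts well for all proteins, one common way to combine them is a simple vote across tools, with disagreements flagged for manual review. The sketch below is an assumption about how that could be done, not the procedure used in the talk. It does not call any of the tools named on the slide; the input is a hypothetical dictionary of already-parsed labels keyed by tool name, and an unweighted vote ignores the fact that different tools have different specialties.

```python
from collections import Counter


def consensus_location(predictions, min_agreement=2):
    """Combine per-tool localization labels by unweighted majority vote.

    predictions: dict mapping a tool name to a predicted label, or None if
    the tool made no call. Returns (label, confident). The label is None
    when nothing was predicted or the top vote is tied; `confident` is True
    only when the winning label has at least `min_agreement` votes.
    """
    votes = Counter(label for label in predictions.values() if label is not None)
    if not votes:
        return None, False
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, False  # tie between tools: leave for manual review
    return top_label, top_count >= min_agreement


# Hypothetical tool names and outputs, for illustration only.
calls = {"tool_a": "periplasm", "tool_b": "periplasm", "tool_c": "outer membrane"}
print(consensus_location(calls))
```

Since incorrect start sites strongly affect these predictions, any such vote is only as good as the gene models fed to the tools.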

  21. Example: c-type cytochromes • Contain CXXCH motif for binding heme …so do some other proteins that are not c-type cytochromes • All are secreted across the inner membrane and then assembled • 60 proteins in MR-1 have CXXCH • Only 43 have a leader peptide and are predicted to be c-type cytochromes
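The two-step filter described on this slide (CXXCH motif, then leader peptide) can be sketched as follows. This is a minimal illustration, assuming the leader-peptide calls come from a separate signal peptide predictor and are supplied as a plain dictionary; the function names, protein IDs, and sequences are invented, and the sketch is not expected to reproduce the 60 and 43 counts reported for MR-1.

```python
import re

# C-X-X-C-H heme-binding motif. The lookahead lets overlapping motifs be found.
HEME_MOTIF = re.compile(r"(?=(C..CH))")


def heme_motif_positions(protein):
    """Return 0-based start positions of every CXXCH motif in a protein."""
    return [m.start() for m in HEME_MOTIF.finditer(protein.upper())]


def candidate_c_type_cytochromes(proteins, has_leader_peptide):
    """Keep proteins that have a CXXCH motif and a predicted leader peptide.

    proteins: dict of protein ID to amino acid sequence.
    has_leader_peptide: dict of protein ID to bool, assumed to come from an
    external signal peptide predictor (missing IDs count as False).
    """
    hits = {}
    for pid, seq in proteins.items():
        positions = heme_motif_positions(seq)
        if positions and has_leader_peptide.get(pid, False):
            hits[pid] = positions
    return hits


# Invented sequences for illustration only.
proteins = {"p1": "MKKLLACAACHGG", "p2": "MTTCAACHKK", "p3": "MAAAGG"}
leader = {"p1": True, "p2": False}
print(candidate_c_type_cytochromes(proteins, leader))
```

Here p2 carries the motif but is dropped for lacking a leader peptide, which mirrors the slide's point that the motif alone overpredicts c-type cytochromes.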

  22. Future Needs in Annotation Automation • Current methods of automated annotation will lead to propagation of annotation errors and burying of useful evidence • But manual annotation cannot keep up with the rate at which sequences are produced • Additional automation is needed: • Protein localization • Specialty database mining (TCDB, MEROPS, etc.) • Experimental data mining (appropriate databases don’t exist)
