CACAO Biocurator Training

CACAO Fall 2011


CACAO

  • Syllabus

  • What is CACAO & why is it important?

  • Training

  • Examples


Mutualistic Relationship

  • We want you to get experience with:

    • CRITICALLY reading scientific papers

    • Bioinformatics resources

    • Collaborating with other biocurators

    • Synthesizing functional annotations

  • We want to produce high-quality functional annotations to contribute back to the GO Consortium and other biological databases


What is an annotation?

Hint: try looking for a definition on Wikipedia.


What is a functional annotation?

  • Process of attaching information from the scientific literature to proteins


Growing need for functional annotations

  • Advances in DNA sequencing mean lots of new genomes & metagenomes


Classic MODel

[Diagram: Literature, Database, Curators (rate limiting), Datasets]


Classic MODel is Expensive

YIKES!


Growing need for high quality functional annotations

  • High quality annotations allow us to infer the function of genes

  • Which in turn lets us understand the capabilities of genomes and their patterns of gene expression


Two problems meet

How can we incorporate more critical analysis into undergraduate education?

How can we get more curators with finite budgets?


What does a functional annotation have to do with this course?

  • Process of attaching information from the scientific literature to proteins

  • CACAO will teach you to become a biocurator

    • you will be adding functional annotations to the biological database GONUTS (http://gowiki.tamu.edu)


CACAO

Community Assessment of Community Annotation with Ontologies

- How well can you (with our coaching) assign gene functions using GO?


Can students become biocurators? YES!

1340 GO annotations in 2 & 1/2 semesters!


Functional annotation with Gene Ontology

  • Controlled vocabulary with

    • Term identifiers

      • GO:0000075

    • Name

      • cell cycle checkpoint

    • Definitions

      • "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]

    • Relationships

      • is_a GO:0000074 ! regulation of progression through cell cycle

  • Terms arranged in a Directed Acyclic Graph (DAG)
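The term record above maps naturally onto a small data structure. As a minimal sketch, assuming Python 3.9+ and using class and function names of our own (not part of any GO tool), the code below stores only the two terms named on this slide and follows `is_a` links upward to list a term's ancestors. The `seen` set matters because in a DAG a term can reach the same ancestor along more than one path. The sketch does not read real GO files.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """One node in the GO graph: identifier, name, definition and is_a parents."""
    term_id: str
    name: str
    definition: str = ""
    is_a: list[str] = field(default_factory=list)  # parent term IDs


def ancestors(term_id: str, terms: dict[str, GOTerm]) -> set[str]:
    """Return every term reachable by following is_a edges upward.

    A DAG term may have several parents; `seen` ensures a shared
    ancestor is visited only once.
    """
    seen: set[str] = set()
    stack = list(terms[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        # Parents missing from this small local dictionary are kept but not expanded.
        if parent in terms:
            stack.extend(terms[parent].is_a)
    return seen


terms = {
    "GO:0000075": GOTerm(
        "GO:0000075",
        "cell cycle checkpoint",
        "A point in the eukaryotic cell cycle where progress through the cycle "
        "can be halted until conditions are suitable for the cell to proceed "
        "to the next stage.",
        is_a=["GO:0000074"],
    ),
    "GO:0000074": GOTerm("GO:0000074", "regulation of progression through cell cycle"),
}

print(ancestors("GO:0000075", terms))  # {'GO:0000074'}
```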


Why use Ontologies?

  • Standardization

  • Facilitate comparison across systems

  • Facilitate computer-based reasoning systems

    • Good for data mining!

  • Leading functional annotation ontology = Gene Ontology (GO)


What is GO? Who is the GO Consortium (GOC)?

  • GO = ~30,000 terms for gene product attributes

    • Molecular Function (enzyme activity)

    • Biological Process (pathways)

    • Cellular Component (parts of the cell)

  • GO Consortium - set of biological databases that are involved in developing GO and contributing GO annotations


    Cellular Component

    • where a gene product acts


    Molecular Function

    • activities or “jobs” of a gene product

    glucose-6-phosphate isomerase activity

    Figure from GO Consortium presentations


    Biological Process

    • a commonly recognized series of events

    cell division

    Figure from Nature Reviews Microbiology 6, 28-40 (January 2008)


    Where can we find GO terms?

    GONUTS

    http://gowiki.tamu.edu


    Search for GO terms on GONUTS

    http://gowiki.tamu.edu


    Which subontology (MF, BP or CC) would the following terms fit in?

    GO:0003909 DNA ligase activity

    GO:0071705 Nitrogen compound transport

    GO:0007124 Pseudohyphal growth

    GO:0015123 Acetate transmembrane transporter activity

    GO:0071514 Genetic imprinting

    GO:0005773 Vacuole

    GO:0000312 Plastid small ribosomal subunit


    What do we know so far?

    Questions?

    1. You will be making functional (GO) annotations using GO terms.

    2. You can search for GO terms on GONUTS.


    Where are we adding GO annotations?

    GONUTS

    http://gowiki.tamu.edu


    Why are we using GONUTS?

    • Students can add functional annotations to proteins.

    • It has all the GO terms in it, too.

    • Some of the GO terms have usage notes.

    • It works a lot like Wikipedia, so it’s familiar.

    • It has the ability to keep track of each student’s and team’s annotations.

    • We run it.

    http://gowiki.tamu.edu


    REQUIRED parts of a GO annotation

    http://gowiki.tamu.edu/wiki/index.php/ECOLI:LPOB

    GO

    ** I will cover this again!!


    Parts of a GO annotation (cont)

    Evidence code


    Parts of a GO annotation (cont)

    Reference

    Notes (about evidence)


    What do we know so far?

    Questions?

    1. You will be making functional (GO) annotations using GO terms.

    2. You can search for GO terms on GONUTS.

    3. You will be adding your GO annotations to GONUTS.

    4. There are 4 required parts to a GO annotation.

    5. You have to base your annotation on an experiment published in a scientific paper.


    Next week

    • Review of GO & GO annotations

    • More biocurator training

      • lots of examples

      • lots of practice

    BICH 485 & 689 students - please stick around to talk about these courses!


    Plan for training

    • Synthesizing GO annotations

    • Refinements

    • Judging & Assessment

    • Individual & Team tracking


    Part 1: Synthesizing GO annotations


    What can you annotate?

    • Proteins.

      • Any protein with a record in UniProt (Universal Protein Resource - http://uniprot.org)

    • How can you find proteins to annotate?

      • Think of ways to identify a protein or paper to annotate


    Choosing a protein to annotate

    1. randomly

    2. topics of interest (e.g., efflux pump proteins, biofilms, marine biology)

    3. papers you have come across while doing other stuff

    4. methods you know or want to learn

    5. phenotypes and mutants you are interested in

    6. by author

    7. by pathway or regulon

    8. suggested by another

    - high ratio of IEA:manual annotations in GONUTS

    - mentioned in another class

    9. current paper mentions another gene product

    10. review papers (e.g., Annual Reviews are excellent sources)

    11. UniProt, GONUTS, WikiPathways, PubMed searches

    12. protein annotated by other teams

    13. ask a coach
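Idea 8's "high ratio of IEA:manual annotations" is easy to compute once you have the evidence codes listed on a page. In this hypothetical Python sketch, IEA (Inferred from Electronic Annotation) counts as automatic and every other code counts as manual, which is a simplification; the protein names and code lists are made up for illustration.

```python
from collections import Counter


def iea_to_manual_ratio(evidence_codes: list[str]) -> float:
    """Ratio of IEA (electronic) annotations to all other annotations on one page."""
    counts = Counter(code.strip().upper() for code in evidence_codes)
    iea = counts["IEA"]
    manual = sum(counts.values()) - iea
    if manual == 0:
        # No manual annotations: unbounded if there are IEAs, 0.0 if the page is empty.
        return float("inf") if iea else 0.0
    return iea / manual


# Hypothetical pages: protein name -> evidence codes listed on its GONUTS page.
candidates = {
    "protein_A": ["IEA", "IEA", "IEA", "IDA"],
    "protein_B": ["IEA", "IMP", "IDA", "IGI"],
}
# Highest ratio first: these pages have the most room for manual annotation.
ranked = sorted(candidates, key=lambda p: iea_to_manual_ratio(candidates[p]), reverse=True)
print(ranked)  # ['protein_A', 'protein_B']
```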


    Practice

    http://gowiki.tamu.edu

    1. What is the GO term for GO:0004713?

    2. What is the GO identifier for mitosis?

    3. How many results (ballpark) do you get when you search for cell division using the Go, Search or G buttons?

    4. How many child terms are there for plasma membrane? How many grandchildren?

    5. What term is the parent of GO:006825?
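Questions 4 and 5 are about walking the DAG one level at a time. The sketch below uses hypothetical placeholder terms rather than real GO data and collects direct children and grandchildren from (child, parent) `is_a` pairs. GONUTS may also count other relationship types (such as part_of), so its numbers need not match this is_a-only view; and in a DAG one term can show up as both a child and a grandchild.

```python
from collections import defaultdict


def children_and_grandchildren(edges: list[tuple[str, str]], root: str) -> tuple[set[str], set[str]]:
    """Split the descendants of `root` into direct children and grandchildren.

    `edges` holds (child, parent) pairs, one per is_a link.
    """
    kids: defaultdict[str, set[str]] = defaultdict(set)
    for child, parent in edges:
        kids[parent].add(child)
    children = set(kids[root])
    grandchildren: set[str] = set()
    for child in children:
        grandchildren |= kids[child]
    return children, grandchildren


# Hypothetical toy graph; term_4 has two parents, as DAG terms may.
edges = [
    ("term_1", "root_term"),
    ("term_2", "root_term"),
    ("term_3", "term_1"),
    ("term_4", "term_1"),
    ("term_4", "term_2"),
]
children, grandchildren = children_and_grandchildren(edges, "root_term")
print(len(children), len(grandchildren))  # 2 2
```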


    Finding a scientific paper on a certain protein

    • Has to be a scientific paper with experimental data in it.

      • Anything else is a valid reason to challenge!

    • PubMed, PubMed Central, Google Scholar…

    • No review articles

    • No books, textbooks, Wikipedia articles, class notes…

    • You will need the paper's PMID (PubMed identifier)


    Practice - searching PubMed

    http://pubmed.org

    • How many papers do you get when you search for “coli”?

    • How many of those papers are reviews?

    • What is the title of the oldest paper when you search for “coli AND RNA polymerase”?

    • How many results are there when you search for “GTPase activity and Gene Ontology”?

    • What is the PMID of the paper when you search for “Hu JC AND coli AND lysR AND 2010”?
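The same searches can also be run programmatically through NCBI's E-utilities. This minimal sketch assumes the `esearch` endpoint and its JSON output (`esearchresult`, `count`, `idlist`); it only builds the query URL and parses a response body, so fetching (for example with `urllib.request.urlopen`) needs network access and is left to you. Boolean queries and author names like `Hu JC` should behave as on the PubMed website, but compare the counts against the site.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build an esearch URL that runs `query` against PubMed and asks for JSON."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(body: str) -> tuple[int, list[str]]:
    """Return (total number of hits, PMIDs on this page of results)."""
    result = json.loads(body)["esearchresult"]
    return int(result["count"]), result["idlist"]


print(esearch_url("coli AND RNA polymerase"))
```

The total hit count answers the "how many papers" questions; `idlist` holds only the first `retmax` PMIDs.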


    Why do we annotate on GONUTS?

    • UniProt (Universal Protein Resource) will not let us annotate protein records on their site.

      • They are a professionally-curated & closed database.

    • GONUTS will.

      • GONUTS pulls the info from the UniProt record when it makes a page for you to edit.


    Making a protein page on GONUTS requires a UniProt accession

    • UniProt - http://www.uniprot.org

    • UniProt is not community edited, so we can’t add annotations directly to their database
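Before making a GONUTS page it can help to confirm that a string really is a UniProt accession and not, say, a GONUTS page name like ECOLI:LPOB. The pattern below is the accession format as published in the UniProt documentation, to the best of our knowledge; verify it against UniProt's current help pages. The function name is hypothetical, and it checks only the shape of the string, not whether the record exists.

```python
import re

# Accession format as given in the UniProt documentation (6- or 10-character IDs).
ACCESSION = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")


def looks_like_accession(text: str) -> bool:
    """True if `text` has the shape of a UniProt accession.

    This only checks the format; it cannot tell whether the record exists.
    """
    return ACCESSION.fullmatch(text.strip()) is not None


for candidate in ["P69905", "A0A022YWF9", "ECOLI:LPOB", "p53"]:
    print(candidate, looks_like_accession(candidate))
```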


    Practice - Searching UniProt

    Find the UniProt accessions for:

    • Mouse Lsr protein

    • Diphtheria toxin from Corynebacterium

    • mutS from E. coli K-12

    http://uniprot.org


    How do you make a new gene page in GONUTS?

    • Use a UniProt accession to make a page on GONUTS that you can add your own annotations to.

    • GoPageMaker will:

      - Check if the page exists in GONUTS & take you there if it does.

      - Make a page & pull all of the annotations from UniProt into a table that you can edit.
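GoPageMaker's two steps amount to a "get or create" pattern. The sketch below models that behavior with an in-memory dictionary; it is hypothetical and not how GONUTS is actually implemented. For simplicity it keys pages by accession, whereas real GONUTS pages carry names like ECOLI:LPOB, and `placeholder_fetch` stands in for a real UniProt lookup and returns made-up data.

```python
from typing import Callable

# Hypothetical stand-in for the wiki: page key -> editable annotation rows.
Wiki = dict[str, list[dict[str, str]]]


def open_or_create_page(
    accession: str,
    wiki: Wiki,
    fetch_uniprot_annotations: Callable[[str], list[dict[str, str]]],
) -> tuple[str, bool]:
    """Return (page key, created?) following the two GoPageMaker steps."""
    if accession in wiki:
        # Step 1: a page already exists, so just send the user there.
        return accession, False
    # Step 2: make a page and copy the UniProt annotations into an editable table.
    wiki[accession] = [dict(row) for row in fetch_uniprot_annotations(accession)]
    return accession, True


def placeholder_fetch(accession: str) -> list[dict[str, str]]:
    """Placeholder for a real UniProt lookup; returns one made-up row."""
    return [{"go_id": "GO:0004713", "evidence": "IEA"}]


wiki: Wiki = {}
print(open_or_create_page("P69905", wiki, placeholder_fetch))  # ('P69905', True)
print(open_or_create_page("P69905", wiki, placeholder_fetch))  # ('P69905', False)
```

Copying the rows (rather than storing the fetched list itself) keeps later edits on the page separate from the source data.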


    Practice

    http://gowiki.tamu.edu

    • How many annotations are on the page for the p53 protein from humans?

    • How many different evidence codes are there on the page for the Bub1a protein from mice?

    • Give one of the paper identifiers for an annotation for the LpxK protein from E. coli.


    What do we know so far?

    Questions?

    1. You will be making functional (GO) annotations using GO terms.

    2. You can search for GO terms on GONUTS.

    3. You will be adding your GO annotations to GONUTS.

    4. There are 4 required parts to a GO annotation.

    5. You have to base your annotation on an experiment published in a scientific paper.

    6. You can annotate any protein with a record in UniProt.

    7. You have to make a page in GONUTS for your protein using the UniProt accession.


    What are evidence codes?

    • Describe the type of work or analysis done by the authors

    • 5 general categories of evidence codes:

      • Experimental

      • Computational

      • Author Statement

      • Curator Assigned

      • Automatically assigned by GO


    • CACAO biocurators may only use certain experimental and computational evidence codes


    Experimental Evidence Codes

    • IDA: Inferred from Direct Assay

    • IMP: Inferred from Mutant Phenotype

    • IGI: Inferred from Genetic Interaction

    • IEP: Inferred from Expression Pattern

    • IPI: Inferred from Physical Interaction

    • EXP: Inferred from Experiment


    http://geneontology.org/GO.evidence.shtml


    Computational Evidence Codes

    • ISS: Inferred from Sequence or Structural Similarity

    • ISO: Inferred from Sequence Orthology

    • ISA: Inferred from Sequence Alignment

    • ISM: Inferred from Sequence Model

    • IGC: Inferred from Genomic Context

    • IBA: Inferred from Biological Aspect of Ancestor

    • IBD: Inferred from Biological Aspect of Descendant

    • IKR: Inferred from Key Residues

    • IRD: Inferred from Rapid Divergence

    • RCA: Inferred from Reviewed Computational Analysis

    http://geneontology.org/GO.evidence.shtml


    Summary of Evidence Codes for CACAO

    • IDA: Inferred from Direct Assay

    • IMP: Inferred from Mutant Phenotype

    • IGI: Inferred from Genetic Interaction

    • IEP: Inferred from Expression Pattern

    • ISO: Inferred from Sequence Orthology

    • ISA: Inferred from Sequence Alignment

    • ISM: Inferred from Sequence Model

    • IGC: Inferred from Genomic Context

    • If it’s not one of these 8, your annotation is incorrect!!!
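The eight-code rule is a simple membership test. Here is a minimal Python sketch, with hypothetical names, that accepts either a bare code or the "code: description" form used on these slides. EXP and IPI are legitimate GO evidence codes; they are rejected here only because they are not on the CACAO list above.

```python
# The eight evidence codes on the CACAO summary slide.
CACAO_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "ISO": "Inferred from Sequence Orthology",
    "ISA": "Inferred from Sequence Alignment",
    "ISM": "Inferred from Sequence Model",
    "IGC": "Inferred from Genomic Context",
}


def is_cacao_code(entry: str) -> bool:
    """Accept a bare code ("IDA") or a labeled one ("IDA: Inferred from Direct Assay")."""
    code = entry.split(":", 1)[0].strip().upper()
    return code in CACAO_CODES


for entry in ["IDA: Inferred from Direct Assay", "EXP: Inferred from Experiment", "IPI"]:
    print(entry, "->", "ok" if is_cacao_code(entry) else "not accepted by CACAO")
```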


    Required parts (for every annotation)

    GO:0004713

    PMID:1111

    IDA: Inferred from direct assay

    Figure 2a
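The four required parts lend themselves to a simple format check. Below is a minimal Python sketch (class, field and function names are hypothetical, not part of GONUTS) that flags a missing or malformed part. It assumes GO IDs are written GO: plus seven digits, as in the IDs on these slides, and references as PMID:<number>.

```python
import re
from dataclasses import dataclass

# The eight evidence codes accepted by CACAO.
CACAO_CODES = {"IDA", "IMP", "IGI", "IEP", "ISO", "ISA", "ISM", "IGC"}


@dataclass
class Annotation:
    go_id: str      # e.g. "GO:0004713"
    reference: str  # e.g. "PMID:1111"
    evidence: str   # e.g. "IDA" or "IDA: Inferred from direct assay"
    notes: str      # where in the paper the evidence is, e.g. "Figure 2a"


def problems(a: Annotation) -> list[str]:
    """List what is missing or malformed among the four required parts."""
    found = []
    if not re.fullmatch(r"GO:[0-9]{7}", a.go_id.strip()):
        found.append("GO ID should be GO: followed by 7 digits")
    if not re.fullmatch(r"PMID:[0-9]+", a.reference.strip()):
        found.append("reference should be written PMID:<number>")
    if a.evidence.split(":", 1)[0].strip().upper() not in CACAO_CODES:
        found.append("evidence code is not one of the 8 CACAO codes")
    if not a.notes.strip():
        found.append("notes should say where in the paper the evidence is")
    return found


print(problems(Annotation("GO:0004713", "PMID:1111", "IDA", "Figure 2a")))
print(problems(Annotation("GO:0004713", "1111", "EXP", "")))
```

A format check like this cannot judge whether the GO term has the right level of specificity or whether the paper actually supports it; that still takes a careful reading of the paper.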


    What you might also have to fill in

    http://geneontology.org/GO.evidence.shtml


    Practice - Identify the problem annotation(s) & why

    Columns: GO ID | Reference | Evidence Code | Notes

    1. GO:0003674 | PMID:20372022 | IDA: Inferred from Direct Assay | Table 2.

    2. GO:0016985 | PMID:20372022 | IMP: Inferred from Mutant Phenotype | Table 2.

    3. GO:0016985 | PMID:20372022 | IDA: Inferred from Direct Assay | (empty)

    4. GO:0016985 | PMID:20372022 | IDA: Inferred from Direct Assay | Table 2.

    5. GO:0003674 | PMID:20372022 | IDA: Inferred from Direct Assay | Table 2.

    6. GO:0016985 | PMID:20372002 | IGI: Inferred from Genetic Interaction | Table 2.

    7. GO:0016985 | 20372022 | IDA: Inferred from Direct Assay | Table 2.

    8. GO:0016985 | PMID:20372002 | EXP: Inferred from Experiment | Table 2.

    9. What is the UniProt accession of the protein described/annotated?


    How is CACAO scored?

    • Points for a complete annotation

      • GO term (right level of specificity)

      • Reference (paper)

      • Evidence code

      • Identify where in the paper the evidence is

    • Refinements are used to steal points for incorrect and/or incomplete annotations

      • Identify a problem

      • Suggest a correct alternative

    • Refinements can be entered by any team (including the original team)


    How can you get the annotations required by Rubric #2?

    • Synthesize complete & correct annotations.

    • Correctly refine (challenge & correct) someone else’s annotation.

    • If your annotation gets challenged, offer the best correction.


    Summary

    • You will be searching literature for experimental evidence for a protein’s function (MF), processes (BP) and location (CC)


    Where do annotations show up?


    Refinements & Challenges


    What can you challenge?


    Scoreboard


    Schedule


    Spring 2011 - Results by organism

