
BICH 489-500 - CACAO - PowerPoint PPT Presentation



PowerPoint Slideshow about 'BICH 489-500 - CACAO' - tamas



BICH 489-500 - CACAO

Biocurator Training Session


Plan for tonight

  • Pre-assessment survey

  • Syllabus

  • Review

  • Annotation synthesis

  • Practice!


Mutualistic Relationship

  • We want you to get experience with:

    • CRITICALLY reading scientific papers

    • Bioinformatics resources

    • Collaborating with other biocurators

    • Synthesizing functional annotations

  • We want to get high quality functional annotations to contribute back to the GO Consortium and other biological databases


Growing need for functional annotations

  • Advances in DNA sequencing mean lots of new genomes & metagenomes


Growing need for high quality functional annotations

  • High-quality annotations allow us to infer the function of genes

  • This in turn lets us understand the capabilities of genomes and the patterns of gene expression


Classic MODel

[Diagram labels: Literature, Datasets, Curators (rate limiting), Database]


What does a functional annotation have to do with this course?

  • Process of attaching information from the scientific literature to proteins

  • CACAO will teach you to become a biocurator

    • You will add functional annotations to the biological database GONUTS (http://gowiki.tamu.edu)


How is CACAO scored?

  • Points for a complete annotation

    • GO term (right level of specificity)

    • Reference (paper)

    • Evidence code

    • Identify where in the paper the evidence is

  • Refinements are used to steal points from incorrect and/or incomplete annotations

    • Identify a problem

    • Suggest correct alternative

  • Refinements can be entered by any team (including the original team)


    How can you get the annotations required by Rubric #2?

    • Synthesize complete & correct annotations.

    • Correctly refine (challenge & correct) someone else’s annotation.

    • If your annotation gets challenged, offer the best correction.


    Functional annotation with Gene Ontology

    • Controlled vocabulary with

      • Term identifiers

        • GO:0000075

      • Name

        • cell cycle checkpoint

      • Definitions

        • "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]

      • Relationships

        • is_a GO:0000074 ! regulation of progression through cell cycle

    • Terms arranged in a Directed Acyclic Graph (DAG)
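
The parts of a term listed above map naturally onto a small record. A minimal sketch in Python, assuming the term is written as OBO-style "tag: value" lines; the `GOTerm` class and `parse_stanza` helper are hypothetical names for illustration, not part of any GO tool:

```python
from dataclasses import dataclass, field

# Stanza text built from the slide's example term (OBO-style "tag: value" lines).
STANZA = """\
id: GO:0000075
name: cell cycle checkpoint
is_a: GO:0000074 ! regulation of progression through cell cycle
"""

@dataclass
class GOTerm:
    id: str = ""
    name: str = ""
    parents: list = field(default_factory=list)  # is_a targets (edges of the DAG)

def parse_stanza(text: str) -> GOTerm:
    """Collect the identifier, name and is_a parents from one stanza."""
    term = GOTerm()
    for line in text.splitlines():
        if ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        if tag == "id":
            term.id = value
        elif tag == "name":
            term.name = value
        elif tag == "is_a":
            # Drop the human-readable comment that follows "!".
            term.parents.append(value.split("!")[0].strip())
    return term

term = parse_stanza(STANZA)
print(term.id, term.name, term.parents)
```

Because a term may have more than one `is_a` line, `parents` is a list; that is what makes the structure a graph rather than a simple tree.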


    Why use Ontologies?

    • Standardization

    • Facilitate comparison across systems

    • Facilitate computer-based reasoning systems

      • Good for data mining!

    • Leading functional annotation ontology = Gene Ontology (GO)


    What is GO? Who is the GO Consortium (GOC)?

    • GO = ~30,000 terms for gene product attributes

      • Molecular Function (enzyme activity)

      • Biological Process (pathways)

      • Cellular Component (parts of the cell)

    • GO Consortium - the set of biological databases involved in developing GO and contributing GO annotations


    Cellular Component

    • where a gene product acts


    Molecular Function

    • activities or “jobs” of a gene product

    glucose-6-phosphate isomerase activity

    figure from GO consortium presentations


    Biological Process

    • a commonly recognized series of events

    cell division

    Figure from Nature Reviews Microbiology 6, 28-40 (January 2008)


    Which subontology (MF, BP or CC) would the following terms fit in?

    GO:0001070 RNA binding transcription factor activity

    GO:0003677 DNA binding

    GO:0009254 Peptidoglycan turnover

    GO:0003918 DNA topoisomerase (ATP-hydrolyzing) activity

    GO:0006835 dicarboxylic acid transport

    GO:0009360 DNA polymerase III complex

    GO:0005694 Chromosome

    GO:0008270 Zinc ion binding

    GO:0000901 translation repressor activity, non-nucleic acid binding



    Where can we find GO terms?

    GONUTS

    http://gowiki.tamu.edu


    Search for GO terms on GONUTS

    http://gowiki.tamu.edu


    • CHICK - AgBase (Gallus gallus)

    • dictyBase - dictyBase (Dictyostelium discoideum - slime mold)

    • FB - FlyBase (Drosophila melanogaster)

    • HUMAN - Reactome, BHF-UCL

    • MGI - Mouse Genome Informatics (Mus musculus - house mouse)

    • SGD - Saccharomyces Genome Database (Saccharomyces cerevisiae - yeast)

    • TAIR - The Arabidopsis Information Resource (Arabidopsis thaliana)

    • WB - WormBase (Caenorhabditis elegans)

    • ZFIN - Zebrafish model organism database (Danio rerio)



    Practice

    http://gowiki.tamu.edu

    1. What is the GO term for GO:0004713?

    2. What is the GO identifier for mitosis?

    3. How many results (ballpark) do you get when you search for cell division using the Go, Search or G buttons?

    4. How many child terms are there for plasma membrane? How many grandchildren?

    5. What term is the parent of GO:0006825?



    4 REQUIRED parts of EVERY GO annotation

    http://gowiki.tamu.edu/wiki/index.php/SGD:ADA2

    GO

    ** I will cover this again!!



    4 Required Parts of a GO annotation (cont)

    Reference

    Notes (about evidence)


    2 other parts that may be required…

    Qualifier

    With/from


    Where are we adding GO annotations?

    GONUTS

    http://gowiki.tamu.edu





    What you must fill in (for every annotation)

    GO:0004713

    PMID:1111

    IDA: Inferred from Direct Assay

    Figure 2a
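
The four required parts can be thought of as one record with four fields. A minimal sketch in Python, using the values from this slide; the `Annotation` class, the `missing_parts` helper and the format checks (seven digits after "GO:", a "PMID:" prefix) are assumptions for illustration, not the rules GONUTS itself enforces:

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    go_id: str      # GO term identifier, e.g. "GO:0004713"
    reference: str  # paper identifier, e.g. "PMID:1111"
    evidence: str   # evidence code, e.g. "IDA"
    notes: str      # where in the paper the evidence is, e.g. "Figure 2a"

def missing_parts(a: Annotation) -> list:
    """Return the names of required parts that are empty or malformed."""
    problems = []
    if not re.fullmatch(r"GO:\d{7}", a.go_id):
        problems.append("go_id")
    if not re.fullmatch(r"PMID:\d+", a.reference):
        problems.append("reference")
    if not a.evidence.strip():
        problems.append("evidence")
    if not a.notes.strip():
        problems.append("notes")
    return problems

example = Annotation("GO:0004713", "PMID:1111", "IDA", "Figure 2a")
print(missing_parts(example))  # an empty list means all four parts are present
```

An annotation with a bare number as its reference and no notes, for instance, would come back with `["reference", "notes"]`.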


    What you might also have to fill in

    Not sure? Check the competition guidelines. Ask a coach (Jim, Debby, Adrienne or usually me)!


    What do we know so far?

    Questions?

    1. You will be making functional (GO) annotations using GO terms.

    2. You can search for GO terms on GONUTS.

    3. You will be adding your GO annotations to GONUTS.

    4. There are 4 required parts & 2 parts that may be required in a GO annotation.

    5. You have to base your annotation on an experiment published in a scientific paper.



    What can you annotate?

    • Proteins.

      • Any protein with a record in UniProt (Universal Protein Resource - http://uniprot.org)

    • How can you find proteins to annotate?

      • Think of ways to identify a protein or paper to annotate


    • Think

    • Consult your neighbor(s)


    Choosing a protein to annotate

    1. randomly

    2. topics of interest (e.g., efflux pump proteins, biofilms, marine biology)

    3. papers you have come across while doing other stuff

    4. methods you know or want to learn

    5. phenotypes and mutants you are interested in

    6. by author

    7. by pathway or regulon

    8. suggested by another

    - high ratio of IEA:manual annotations in GONUTS

    - mentioned in another class

    9. current paper mentions another gene product

    10. review papers (e.g., Annual Reviews are excellent sources)

    11. UniProt, GONUTS, WikiPathways, PubMed searches

    12. protein annotated by other teams

    13. ask a coach


    Finding a scientific paper on a certain protein

    • Has to be a scientific paper with experimental data in it.

      • Anything else is a valid reason to challenge!

    • PubMed, PubMed Central, Google Scholar…

    • No review articles

    • No books, textbooks, Wikipedia articles, class notes…

    • You will need the paper's PMID (PubMed ID)


    Practice - searching PubMed

    http://pubmed.org

    • How many papers do you get when you search for “coli”?

    • How many of those papers are reviews?

    • What is the title of the oldest paper when you search for “coli AND RNA polymerase”?

    • How many results are there when you search for “GTPase activity and Gene Ontology”?

    • What is the PMID of the paper when you search for “Hu JC AND coli AND lysR AND 2010”?
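
The practice above uses the PubMed web site. The same kind of query can also be sent to NCBI's E-utilities `esearch` endpoint; the sketch below only builds the request URL and makes no network call. The `pubmed_search_url` helper and the `retmax` default are assumptions for illustration:

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a PubMed query string."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

url = pubmed_search_url("coli AND RNA polymerase")
print(url)
```

Boolean operators such as AND are passed through as part of the query string, just as they are typed into the search box.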


    Why do we annotate on GONUTS?

    • UniProt (Universal Protein Resource) will not let us annotate protein records on their site.

      • They are a professionally-curated & closed database.

    • GONUTS will.

      • GONUTS pulls the information from the UniProt record when it makes a page for you to edit.


    Making a protein page on GONUTS requires a UniProt accession

    • UniProt - http://www.uniprot.org
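
A quick format check can catch a gene name pasted where an accession belongs. A minimal sketch in Python, assuming the accession pattern given in UniProt's help pages (check the current documentation before relying on it); `looks_like_accession` is a hypothetical helper, and P04637 is used only as a well-formed example:

```python
import re

# Accession pattern as described in UniProt's help pages (an assumption here).
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def looks_like_accession(text: str) -> bool:
    """True if the text has the shape of a UniProt accession."""
    return ACCESSION.fullmatch(text.strip().upper()) is not None

print(looks_like_accession("P04637"))  # True: six characters in the expected shape
print(looks_like_accession("p53"))     # False: a gene/protein name, not an accession
```

The check only tests the shape of the string; it does not confirm that a record with that accession exists in UniProt.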


    Practice - Searching UniProt

    Find the UniProt accessions for:

    • Mouse Lsr protein

    • Diphtheria toxin from Corynebacterium

    • mutS from E. coli K-12

    http://uniprot.org


    How do you make a new gene page in GONUTS?

    • Use a UniProt accession to make a page on GONUTS that you can add your own annotations to.

    • GoPageMaker will:

      - Check if the page exists in GONUTS & take you there if it does.

      - Make a page & pull all of the annotations from UniProt into a table that you can edit.
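
The two GoPageMaker behaviours above amount to a "look up, else create" step. A minimal sketch in Python of that logic only; the function, the in-memory `pages` dictionary and the stub fetcher are all hypothetical stand-ins, not GONUTS code, and the annotation tuple is placeholder data taken from the slides' example values:

```python
def go_page_maker(accession, pages, fetch_annotations):
    """Return (page, created): reuse an existing page or make a new one."""
    if accession in pages:
        # The page already exists: take the user there.
        return pages[accession], False
    # Otherwise make the page and copy the existing annotations into an editable table.
    page = {"accession": accession, "annotations": list(fetch_annotations(accession))}
    pages[accession] = page
    return page, True

def stub_fetch(accession):
    # Placeholder standing in for "pull all of the annotations from UniProt".
    return [("GO:0004713", "PMID:1111", "IDA")]

pages = {}
first, created_first = go_page_maker("P04637", pages, stub_fetch)
second, created_second = go_page_maker("P04637", pages, stub_fetch)
print(created_first, created_second)
```

The second call finds the page made by the first, so nothing is created twice and edits made to the table are not overwritten.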


    Practice

    http://gowiki.tamu.edu

    • How many annotations are on the page for the p53 protein from humans?

    • How many different evidence codes are there on the page for the Bub1a protein from mice?

    • Give one of the paper identifiers for an annotation for the LpxK protein from E. coli.


    What do we know so far?

    Questions?

    1. You will be making functional (GO) annotations using GO terms.

    2. You can search for GO terms on GONUTS.

    3. You will be adding your GO annotations to GONUTS.

    4. There are 4 required parts to a GO annotation.

    5. You have to base your annotation on an experiment published in a scientific paper.

    6. You can annotate any protein with a record in UniProt.

    7. You have to make a page in GONUTS for your protein using the UniProt accession.


    What are the evidence codes?

    • Describe the type of work or analysis done by the authors

    • 5 general categories of evidence codes:

      • Experimental

      • Computational

      • Author Statement

      • Curator Assigned

      • Automatically assigned by GO

    • CACAO biocurators may only use certain experimental and computational evidence codes


    Experimental Evidence Codes

    • IDA: Inferred from Direct Assay

    • IMP: Inferred from Mutant Phenotype

    • IGI: Inferred from Genetic Interaction

    • IEP: Inferred from Expression Pattern

    • IPI: Inferred from Physical Interaction

    • EXP: Inferred from Experiment

    http://geneontology.org/GO.evidence.shtml


    Computational Evidence Codes

    • ISS: Inferred from Sequence or Structural Similarity

    • ISO: Inferred from Sequence Orthology

    • ISA: Inferred from Sequence Alignment

    • ISM: Inferred from Sequence Model

    • IGC: Inferred from Genomic Context

    • IBA: Inferred from Biological Aspect of Ancestor

    • IBD: Inferred from Biological Aspect of Descendant

    • IKR: Inferred from Key Residues

    • IRD: Inferred from Rapid Divergence

    • RCA: Inferred from Reviewed Computational Analysis

    http://geneontology.org/GO.evidence.shtml


    Summary of Evidence Codes for CACAO

    • IDA: Inferred from Direct Assay

    • IMP: Inferred from Mutant Phenotype

    • IGI: Inferred from Genetic Interaction

    • IEP: Inferred from Expression Pattern

    • ISO: Inferred from Sequence Orthology

    • ISA: Inferred from Sequence Alignment

    • ISM: Inferred from Sequence Model

    • IGC: Inferred from Genomic Context

    • If it’s not one of these 8, your annotation is incorrect!!!
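
The eight-code rule is easy to express as a set lookup. A minimal sketch in Python; the set holds exactly the codes listed on this slide, while the `code_allowed` helper is a hypothetical name for illustration:

```python
# The eight evidence codes CACAO biocurators may use, as listed above.
CACAO_CODES = {"IDA", "IMP", "IGI", "IEP", "ISO", "ISA", "ISM", "IGC"}

def code_allowed(evidence: str) -> bool:
    """Accept either a bare code ("IDA") or the long form ("IDA: Inferred from ...")."""
    code = evidence.split(":", 1)[0].strip().upper()
    return code in CACAO_CODES

print(code_allowed("IDA: Inferred from Direct Assay"))
print(code_allowed("EXP: Inferred from Experiment"))
```

Codes such as EXP, IPI or ISS are real GO evidence codes but are not in the CACAO set, so the check rejects them.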


    What do we know so far?

    Questions?

    1. You will be making functional (GO) annotations using GO terms.

    2. You can search for GO terms on GONUTS.

    3. You will be adding your GO annotations to GONUTS.

    4. There are 4 required parts to a GO annotation.

    5. You have to base your annotation on an experiment published in a scientific paper.

    6. You can annotate any protein with a record in UniProt.

    7. You have to make a page in GONUTS for your protein using the UniProt accession.


    Practice - Identify the problem annotation(s) & why

    #   GO ID        Reference       Evidence Code                            Notes
    1.  GO:0003674   PMID:20372022   IDA: Inferred from Direct Assay          Table 2.
    2.  GO:0016985   PMID:20372022   IMP: Inferred from Mutant Phenotype      Table 2.
    3.  GO:0016985   PMID:20372022   IDA: Inferred from Direct Assay
    4.  GO:0016985   PMID:20372022   IDA: Inferred from Direct Assay          Table 2.
    5.  GO:0003674   PMID:20372022   IDA: Inferred from Direct Assay          Table 2.
    6.  GO:0016985   PMID:20372002   IGI: Inferred from Genetic Interaction   Table 2.
    7.  GO:0016985   20372022        IDA: Inferred from Direct Assay          Table 2.
    8.  GO:0016985   PMID:20372002   EXP: Inferred from Experiment            Table 2.

    9. What is the UniProt accession of the protein described/annotated?


    Part 3: PRACTICE!

    Break!


    Hypothetical Example #1 - Starting from a paper

    • HYPOTHETICALLY, we have a paper (PMID:100)

      • The authors purify human p53 protein by expressing a clone of the gene in E. coli.

      • They test for tyrosine kinase activity & report the specific activity in Table 2.


    Hypothetical Example #1 (cont)

    • What is a suitable GO term?

    • What is the UniProt accession of this protein?

    • How do you make the page for this protein on GONUTS?

    • How do you add your GO annotation?


    Real Paper #2 - starting from a topic

    • Topic: phenylalanine and phenylacetate catabolism in bacteria


    Practice Paper #3

    • http://www.ncbi.nlm.nih.gov/pubmed/3335830

      • Sequence Analysis of cDNA coding for a major house dust mite allergen, Der p 1

    • CAN WE USE THIS PAPER?

    • WHAT EVIDENCE CODE?

      • A protein data-base search revealed that the Der p 1 amino acid sequence showed homology with a group of cysteine proteases. As shown in the composite alignment of the amino acid sequence Der p 1 and four cysteine proteases (Fig. 3), significant homology was observed in the amino and COOH-terminal regions of the proteins.


    Summary

    • You will be searching literature for experimental evidence for a protein’s function (MF), processes (BP) and location (CC)



    Required parts (for every annotation)

    GO:0004713

    PMID:1111

    IDA: Inferred from Direct Assay

    Figure 2a


    What you might also have to fill in

    http://geneontology.org/GO.evidence.shtml

