Lecture 6: Gene ontology and Gene Annotation

    Presentation Transcript
    1. Lecture 6: Gene ontology and Gene Annotation • June 19, 2014

    2. What is gene annotation • Process of assigning descriptions to a known gene that represent: • Assigned gene name • Molecular function, process and cellular location • Protein features: domains, functional elements such as nuclear localization signals

    3. What is the Gene Ontology? • Set of standard biological phrases (terms) which are applied to genes/proteins: • protein kinase • apoptosis • membrane • Standardizes the representation of gene and gene product attributes across species and databases

    4. Who annotates the genes? • Curators at the major databases • NCBI, EBI, MGI, model organism databases • UniProt • Protein domain databases (Pfam, SMART, InterPro) • Older sources (Swiss-Prot, PIR) • Gene ontology groups

    5. Why use gene ontology? • Allows biologists to make queries across large numbers of genes without researching each one individually • Can find all the PI3 kinases in a given genome or find all proteins involved in oxidative stress response without prior knowledge of every gene

    6. From the Exercise 1 gene list • vha-6 • C. elegans gene called vacuolar H+ ATPase • What is its role in the cell? • GO biological process: • body morphogenesis; determination of adult lifespan; lipid storage • GO molecular function: • H+ ion transmembrane transporter • GO cellular component: • apical plasma membrane; vacuolar ATPase complex

    7. A long list of genes... how do you make sense of them? • By using gene ontology • [Figure labels: asparagine utilization, lysine biosynthesis, cell wall catabolism, oxidative stress response, glucose repression, aging, ribose metabolism, protein folding, ubiquinone biosynthesis] • Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868

    8. GO structure • GO isn’t just a flat list of biological terms • Terms are related within a hierarchy • is_a: nucleic acid binding is a type of binding • is_a: DNA binding is a type of nucleic acid binding

    9. GO structure • A single gene (gene A) associated with a particular term is automatically annotated to all of the parent terms
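
The propagation rule on slide 9 can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy is_a hierarchy built from the slide 8 example; the `IS_A` dictionary and the function names are hypothetical and are not part of any GO tool.

```python
# Toy is_a hierarchy from the slide 8 example: child term -> parent terms.
# This dictionary is an illustrative assumption, not the real ontology.
IS_A = {
    "DNA binding": ["nucleic acid binding"],
    "nucleic acid binding": ["binding"],
    "binding": [],
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_terms, is_a):
    """Expand a gene's direct annotations to include all parent terms."""
    expanded = set(direct_terms)
    for term in direct_terms:
        expanded |= ancestors(term, is_a)
    return expanded


# Gene A is annotated directly to "DNA binding" only.
gene_a_terms = propagate({"DNA binding"}, IS_A)
print(sorted(gene_a_terms))
```

Because the expansion walks every parent, a query for the broad term "binding" would also find gene A even though it was only annotated to the most specific term.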

    10. GO structure • This means genes can be grouped according to user-defined levels • Allows broad overview of gene set or genome

    11. How does GO work? • What information might we want to capture about a gene product? • What does the gene product do? • Where and when does it act? • Why does it perform these activities?

    12. GO structure • GO terms divided into three parts: • cellular component • molecular function • biological process

    13. Cellular Component • where a gene product acts • Example: mitochondria

    14. Cellular Component • The cellular components of a virus differ from those of a cell

    15. Cellular Component Enzyme complexes in the component ontology refer to places, not activities.

    16. Molecular Function • activities or “jobs” of a gene product • Example: glucose-6-phosphate isomerase activity

    17. Molecular Function • Examples: insulin binding; insulin receptor activity

    18. Molecular Function • A gene product may have several functions • Sets of functions make up a biological process.

    19. Biological Process • a commonly recognized series of events • Example: cell division

    20. Biological Process • Example: transcription

    21. Biological Process • Example: regulation of gluconeogenesis

    22. Biological Process • Example: limb development

    23. Biological Process • Example: courtship behavior

    24. Ontology Structure • Terms are linked by two relationships: • is-a • part-of

    25. Ontology Structure • [Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]

    26. GO terms • Each concept has: • a name • an ID number • a definition • Example: • term: transcription initiation • id: GO:0006352 • definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter.
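
The three parts of a GO term listed on slide 26 map naturally onto a small record type. The sketch below is only an illustration: the `GOTerm` class and its field names are hypothetical, and the check assumes identifiers of the form "GO:" followed by seven digits, as in the slide's example.

```python
import re
from dataclasses import dataclass

# Assumption: GO identifiers are "GO:" followed by seven digits,
# matching the slide's example GO:0006352.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    """A GO concept: an ID number, a name and a definition."""

    go_id: str
    name: str
    definition: str

    def __post_init__(self):
        # Reject identifiers that do not follow the assumed pattern.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a valid GO identifier: {self.go_id!r}")


# The example term from slide 26.
transcription_initiation = GOTerm(
    go_id="GO:0006352",
    name="transcription initiation",
    definition=(
        "Processes involved in the assembly of the RNA polymerase complex "
        "at the promoter region of a DNA template resulting in the "
        "subsequent synthesis of RNA from that promoter."
    ),
)
print(transcription_initiation.go_id, transcription_initiation.name)
```

Keeping the ID separate from the name matters because names can be reworded over time, while the ID is what annotations refer to.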

    27. GO terms • Where do GO terms come from? • GO terms are added by editors at EBI and annotating databases • new terms are usually only added when they are asked for by annotators • GO editors work with experts to make major ontology developments • metabolism • pathogenesis • cell cycle

    28. Species coverage • ~80 species in the Gene Ontology database • All major eukaryotic model organism species • Human via the Gene Ontology Annotation (GOA) group at UniProt • Several bacterial and parasite species through TIGR and GeneDB at Sanger

    29. Anatomy of a GO annotation • Three key parts: • gene name/id • GO term(s) • evidence for association
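
The three key parts on slide 29 can be pictured as one record per gene-term association. This is a hedged sketch: the `Annotation` class, the gene names and the specific associations are made up for illustration. The evidence codes IDA (inferred from direct assay) and IEA (inferred from electronic annotation) are real GO evidence codes, and the two GO IDs are the ones that appear elsewhere in this lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_id: str   # gene name or database identifier
    go_id: str     # GO term the gene is associated with
    evidence: str  # evidence code supporting the association


# Hypothetical annotations for illustration only; "geneA" and "geneB"
# are placeholder names, not real genes.
annotations = [
    Annotation("geneA", "GO:0006352", "IDA"),  # experimental evidence
    Annotation("geneA", "GO:0006915", "IEA"),  # electronic, not curated
    Annotation("geneB", "GO:0006915", "IDA"),
]


def without_electronic(records):
    """Keep only annotations that are not electronically inferred (IEA)."""
    return [r for r in records if r.evidence != "IEA"]


curated = without_electronic(annotations)
print([(r.gene_id, r.go_id) for r in curated])
```

Storing the evidence code with every association is what lets an analysis decide later whether to trust computational annotations or restrict itself to experimentally supported ones.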

    30. Example annotation Human BRCA1 protein – molecular function GO terms

    31. Types of evidence codes • Experimental codes • Computational codes • Other evidence codes

    32. Manual annotation • Source text: “In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…” • Molecular function (serine/threonine kinase activity) • Cellular component (plasma membrane; integral membrane protein) • Biological process (wound response)

    33. Electronic Annotation • Annotation derived without human validation • mapping files, e.g. interpro2go, ec2go • BLAST search ‘hits’ • Lower ‘quality’ than manual codes • Used in non-model organisms
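
How a mapping file turns into electronic annotations can be sketched as follows. Everything here is an assumption made for illustration: the line layout (`external id, description, " > ", GO name, " ; ", GO id`, with `!` starting a comment) is an assumed format, the InterPro identifier and family name are placeholders, and `parse_mapping` is a hypothetical helper rather than part of any published tool.

```python
from collections import defaultdict

# Hypothetical excerpt in the assumed mapping-file layout.
# IPR000000 and "Example family" are placeholders, not real entries.
SAMPLE_MAPPING = """\
!comment lines start with an exclamation mark
InterPro:IPR000000 Example family > GO:apoptosis ; GO:0006915
InterPro:IPR000000 Example family > GO:transcription initiation ; GO:0006352
"""


def parse_mapping(text):
    """Map each external identifier to the list of GO ids it implies."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        source, _, target = line.partition(" > ")
        if not target:
            continue  # line does not follow the assumed layout
        external_id = source.split()[0]
        go_id = target.rsplit(";", 1)[-1].strip()
        mapping[external_id].append(go_id)
    return dict(mapping)


domain_to_go = parse_mapping(SAMPLE_MAPPING)
print(domain_to_go)
```

Any protein found to contain the domain would then inherit these GO terms with no curator checking the result, which is why such annotations are treated as lower quality than manual ones.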

    34. GO & microarray analysis • Many tools exist that use GO to find common biological functions from a list of genes • GoMiner, GOstat, Onto-Express, FatiGO and GSEA, to name a few • We’ll use the DAVID Bioinformatics Resource

    35. GO tools • input a gene list • shows which GO categories have most genes associated with them or are “enriched” • provides a statistical measure to determine whether enrichment is significant

    36. Traditional analysis • Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, … • Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, … • Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, … • Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, … • Gene 100: Positive ctrl. of cell prolif, Mitosis, Oncogenesis, Glucose transport, …

    37. Using GO annotations • But by using GO annotations, this work has already been done for you! • Example: GO:0006915 : apoptosis

    38. Grouping by process • Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, … • Glucose transport: Gene 7, Gene 3, Gene 6, … • Apoptosis: Gene 1, Gene 53 • Positive control of cell proliferation: Gene 7, Gene 3, Gene 12, … • Growth: Gene 5, Gene 2, Gene 6, …
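
Moving from the per-gene view of slide 36 to the per-process view of slide 38 amounts to inverting a gene-to-terms table. The sketch below uses the first four gene lists from slide 36 as toy input, so the resulting groups differ from the illustrative groups on slide 38; `group_by_process` is a hypothetical helper name.

```python
from collections import defaultdict

# Per-gene process lists copied from slide 36 (first four genes only).
gene_to_processes = {
    "Gene 1": ["Apoptosis", "Cell-cell signaling", "Protein phosphorylation", "Mitosis"],
    "Gene 2": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 3": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 4": ["Nervous system", "Pregnancy", "Oncogenesis", "Mitosis"],
}


def group_by_process(gene_to_terms):
    """Invert a gene -> terms table into a term -> genes table."""
    groups = defaultdict(list)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            groups[term].append(gene)
    return dict(groups)


process_to_genes = group_by_process(gene_to_processes)
for process, genes in sorted(process_to_genes.items()):
    print(f"{process}: {', '.join(genes)}")
```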

    39. GO for microarray analysis • Annotations give ‘function’ label to genes • Ask meaningful questions of microarray data: • Do the genes involved in the same process have the same or different expression patterns?

    40. Using GO in practice • Experiment: microarray with 1000 genes, 100 genes differentially regulated • mitosis: 80/100 • apoptosis: 40/100 • cell proliferation: 30/100 • glucose transport: 20/100 • statistical measure • how likely your differentially regulated genes fall into that category by chance

    41. Using GO in practice • However, when you look at the distribution of all genes on the microarray:
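
The statistical measure on slides 40 and 41 is commonly computed as a hypergeometric (one-sided Fisher) test. The sketch below is one way to do it with the Python standard library, not necessarily the method DAVID uses. The array size (1000), list size (100) and the counts 80/100 and 20/100 come from slide 40; the background counts of 800 and 50 genes on the whole array are hypothetical values chosen for illustration.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k category genes in a list of n genes,
    drawn without replacement from an array of N genes of which K belong
    to the category (upper tail of the hypergeometric distribution)."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


ARRAY_SIZE = 1000  # genes on the microarray (slide 40)
LIST_SIZE = 100    # differentially regulated genes (slide 40)

# Background counts below are hypothetical assumptions.
p_mitosis = enrichment_p_value(k=80, n=LIST_SIZE, K=800, N=ARRAY_SIZE)
p_glucose = enrichment_p_value(k=20, n=LIST_SIZE, K=50, N=ARRAY_SIZE)

print(f"mitosis:           p = {p_mitosis:.3g}")
print(f"glucose transport: p = {p_glucose:.3g}")
```

Under these assumed background counts, 80 mitosis genes out of 100 is exactly what chance predicts when 80% of the array is annotated to mitosis, while 20 glucose transport genes out of 100 is far more than the 5 expected, so the smaller category is the one that is enriched.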

    42. Other sources of annotation • UniProt (Swiss-Prot) keywords • Protein domain databases • Pfam, PANTHER, PDB, PROSITE, etc. • GeneDB summaries from NCBI • Protein-protein interaction databases • Pathway databases • KEGG, BioCarta, BBID, Reactome • DAVID incorporates annotation from all of these and clusters the redundant terms

    43. Limitations of GO analysis • ~40% of the C. neoformans predicted proteins are similar only to other C. neoformans proteins and have no identifiable protein domain • Difficult to do enrichment analysis on only 60% of the encoded proteins

    44. Today in computer lab • Tutorial on using DAVID for GO enrichment analysis • Analyze the gene lists from Exercise 1 and 2 • Create a sub-list that you will use in Exercise 7