Understanding proteins: resources for identification and annotation

Presentation Transcript


  1. Understanding proteins: resources for identification and annotation

  2. The Gene Ontology: Annotating protein function, role and localization Contact: Jane Lomax Coordinator, GO Editorial Office EBI-EMBL jane@ebi.ac.uk

  3. What is an ontology?

  4. What is an ontology? • Collectibles & art • Stamps • UK (Great Britain) Victoria • 1884 GREAT BRITAIN 10S SCOTT (11,999.99$) A definition: “A controlled representation of ideas, concepts or events in a given domain and the relationships between them.”

  5. Why do we need ontologies? • Help with data retrieval: allow grouping of annotations. Annotation counts: brain 20, hindbrain 15, rhombomere 10. Query ‘brain’ without ontology: 20. Query ‘brain’ with ontology: 45. • Make data (re-)usable through standards • Common structure and terminology (controlled vocabulary) • Avoid redundancies (single data source) • Allow common tools, techniques, training, validation... Adapted from Barry Smith: http://ontology.buffalo.edu/smith/BioOntology_Course.html
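
The brain example on slide 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parent links (rhombomere under hindbrain, hindbrain under brain) are inferred from the slide, and the function names are made up for this sketch, not part of any GO tool.

```python
# Toy hierarchy assumed from the slide: each term has at most one parent.
parent_of = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Number of annotations made directly to each term (values from the slide).
direct_annotations = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def ancestors(term):
    """Return the ancestors of a term, nearest first."""
    found = []
    while term in parent_of:
        term = parent_of[term]
        found.append(term)
    return found


def count_without_ontology(query):
    """Count only annotations made directly to the query term."""
    return direct_annotations.get(query, 0)


def count_with_ontology(query):
    """Count annotations to the query term or to any of its descendants."""
    return sum(
        count
        for term, count in direct_annotations.items()
        if term == query or query in ancestors(term)
    )


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```

The grouping comes entirely from the relationships between terms: no annotation is changed, yet a query for the broader term also retrieves data annotated to the more specific ones.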

  6. Gene ontology • http://geneontology.org/ What is the gene ontology? Organized, controlled vocabulary of terms that describe gene product characteristics. • Represents gene product properties, not gene products themselves • Three branches (domains): • Cellular component • Molecular function • Biological process • Species-independent (with taxonomic restrictions) • Represents physiological processes • Goes up to the level of the cell

  7. How does GO work? The Gene Ontology is like a dictionary term: transcription initiation id: GO:0006352 definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter.

  8. GO tree and annotations (relationship types: is_a, part_of). Clark et al., 2005

  9. An annotation example… • GO terms for Caspase 9

  10. Which processes are up- or down-regulated? [Figure: expression over time, control vs. attacked. Labelled groups: Defense response; Immune response; Response to stimulus; Toll regulated genes; JAK-STAT regulated genes; Puparial adhesion; Molting cycle; hemocyanin; Amino acid catabolism; Lipid metabolism; Peptidase activity; Protein catabolism] Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

  11. QuickGO: browsing GO Term definition • http://www.ebi.ac.uk/QuickGO/

  12. QuickGO: browsing GO Term relationships (ancestors)

  13. QuickGO: browsing GO Term relationships (children)

  14. QuickGO: browsing GO Proteins annotated to term

  15. Annotation and ontology files www.geneontology.org/GO.downloads.shtml • Ontology files: • Hold ontology terms and structure • Species-independent • You can get GO-slims • Annotation files: • Hold list of terms and the proteins annotated with them • You can get species-specific files or the whole annotation.
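
As a rough sketch of what "a list of terms and the proteins annotated with them" looks like in practice, the snippet below reads a tab-separated annotation file and groups protein identifiers by GO term. It assumes the GO Annotation File (GAF) layout, in which comment lines start with `!`, column 2 holds the object identifier and column 5 the GO ID; the slide itself does not name a format. The sample rows are invented placeholders (truncated to nine columns), apart from GO:0006352, which is the term shown on slide 7.

```python
import io
from collections import defaultdict


def read_annotations(handle):
    """Map each GO ID to the set of object (protein) IDs annotated with it."""
    term_to_proteins = defaultdict(set)
    for line in handle:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        protein_id, go_id = cols[1], cols[4]  # GAF columns 2 and 5
        term_to_proteins[go_id].add(protein_id)
    return term_to_proteins


# Hypothetical sample data: identifiers X00001..X00003 are placeholders.
sample = "\n".join([
    "!gaf-version: 2.2",
    "\t".join(["DB", "X00001", "geneA", "", "GO:0006352", "REF:1", "IDA", "", "P"]),
    "\t".join(["DB", "X00002", "geneB", "", "GO:0006352", "REF:2", "IEA", "", "P"]),
    "\t".join(["DB", "X00003", "geneC", "", "GO:0000001", "REF:3", "IEA", "", "P"]),
])

annotations = read_annotations(io.StringIO(sample))
print(sorted(annotations["GO:0006352"]))  # ['X00001', 'X00002']
```

The same function could be pointed at a downloaded species-specific file by passing an open file handle instead of the in-memory sample.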

  16. More about GO: EBI train online www.ebi.ac.uk/training/online/course/go-quick-tour www.ebi.ac.uk/training/online/course/uniprot-goa-quick-tour

  17. Acknowledgements & questions Jane Lomax Coordinator, GO Editorial Office EBI-EMBL jane@ebi.ac.uk

  18. UniProt: A repository of annotated protein sequences Contact: Duncan Legge UniProt Content Team EBI-EMBL help@uniprot.org dlegge@ebi.ac.uk

  19. Background of UniProt • Since 2002, a merger and collaboration of three databases: Swiss-Prot, TrEMBL and PIR-PSD • Funded mainly by NIH (US) to be the highest quality, most thoroughly annotated protein sequence database

  20. We Aim To Provide… • A high quality protein sequence database • A non-redundant protein database, with maximal coverage including splice isoforms, disease variants and PTMs. Sequence archiving essential. • Easy protein identification • Stable identifiers and consistent nomenclature / controlled vocabularies • Thorough protein annotation • Detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources

  21. The Two Sides of UniProtKB • UniProtKB/TrEMBL: 1 entry per nucleotide submission; redundant, automatically annotated (unreviewed) • UniProtKB/Swiss-Prot: 1 entry per protein; non-redundant, high-quality manual annotation (reviewed)

  22. UniProtKB/TrEMBL: computationally annotated • UniProtKB/Swiss-Prot: manually annotated

  23. Data sources of UniProtKB UniProt/TrEMBL Ensembl ENA (EMBL) DNA database PDB Sub/ Peptide Data FlyBase WormBase VEGA (Sanger) Patent Data mRNA Data

  24. Curation of a UniProt/SwissProt entry Nomenclature UniProt/TrEMBL Sequence Literature Annotations Sequence variants Ontologies References UniProt/SwissProt Sequence features

  25. UniProt Website www.uniprot.org

  26. UniProt layout

  27. Annotation comments • FUNCTION • SUBCELLULAR LOCATION • ALTERNATIVE PRODUCTS • TISSUE SPECIFICITY • DEVELOPMENTAL STAGE • INDUCTION • SIMILARITY • CATALYTIC ACTIVITY • COFACTOR • ENZYME REGULATION • BIOPHYSICOCHEMICAL PROPERTIES • PATHWAY • SUBUNIT • INTERACTION • PTM • RNA EDITING • MASS SPECTROMETRY • DOMAIN • POLYMORPHISM • DISRUPTION PHENOTYPE • ALLERGEN • DISEASE • TOXIC DOSE • BIOTECHNOLOGY • PHARMACEUTICAL • MISCELLANEOUS • CAUTION • SEQUENCE CAUTION • WEB RESOURCE

  28. Evidence tags to show source Controlled vocabularies used whenever possible

  30. Proteomes in UniProt • Complete proteomes: complete sets of proteins thought to be expressed by organisms whose genomes have been completely sequenced. • Reference proteomes: some complete proteomes have been selected as reference proteome sets. These cover the proteomes of well-studied model organisms and other proteomes of interest for biomedical research.

  31. Obtaining Proteomes

  32. Help / Feedback • Stuck? Just ask: there is an active help and support team • Feedback: if you find something incorrect, outdated, missing, etc., please tell us. • help@uniprot.org

  33. Find out more: EBI online courses www.ebi.ac.uk/training/online/course/uniprot-quick-tour/

  34. Acknowledgements & questions Duncan Legge UniProt Content Team EBI-EMBL dlegge@ebi.ac.uk

  35. InterPro: An integrated protein sequence analysis resource Contact: Amaia Sangrador InterPro curation Team EBI-EMBL interhelp@ebi.ac.uk amaia@ebi.ac.uk

  36. What is InterPro? • InterPro is a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites • It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences by classifying them into families and predicting domains and important sites

  37. The aim of InterPro InterPro

  38. Protein annotation: a predictive approach • Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment • We can use these models to infer relationships with the characterised sequences from which the alignment was constructed • This is the approach taken by protein signature databases
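
To make the idea on slide 38 concrete, here is a deliberately simplified sketch of modelling conserved positions in an alignment: it counts residue frequencies per column and scores a new sequence against them. The toy alignment is invented, gaps are not handled, and real signature methods use far more elaborate statistics (pseudocounts, log-odds scores, HMMs), so treat this as an illustration rather than how any member database works.

```python
from collections import Counter

# Hypothetical toy alignment: equal-length, gap-free sequences (not real data).
alignment = ["ACDE", "ACDQ", "ACEE", "GCDE"]


def column_frequencies(seqs):
    """Return one {residue: frequency} dict per alignment column."""
    n = len(seqs)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*seqs)
    ]


def consensus(profile):
    """Pick the most frequent residue in each column."""
    return "".join(max(column, key=column.get) for column in profile)


def match_score(profile, seq):
    """Multiply the column frequencies of each residue; unseen residues give 0."""
    score = 1.0
    for column, residue in zip(profile, seq):
        score *= column.get(residue, 0.0)
    return score


profile = column_frequencies(alignment)
print(consensus(profile))            # ACDE
print(match_score(profile, "ACDE"))  # 0.421875
print(match_score(profile, "WCDE"))  # 0.0
```

A sequence that scores well against the model is inferred to be related to the characterised sequences in the alignment, which is the step the slide describes.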

  39. Three (4) different protein signature approaches • Single motif methods: Patterns • Multiple motif methods: Fingerprints • Full alignment methods: Profiles & Hidden Markov models (HMMs)
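
Of the approaches on slide 39, the single-motif pattern is the easiest to illustrate. The sketch below translates a subset of PROSITE-style pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats) into a Python regular expression. The function names are made up for this example, the `<` and `>` terminus anchors are not handled, and the test sequence is invented; the pattern `N-{P}-[ST]-{P}` is the commonly cited N-glycosylation site motif.

```python
import re


def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "x(2,4)" into its core and repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        # "[ABC]" and single letters are already valid regex fragments.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))             # N[^P][ST][^P]
print(find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))  # [(2, 'NGSA')]
```

In the invented sequence, the second asparagine is followed by a proline, so it is rejected by the `{P}` position; this all-or-nothing behaviour is typical of patterns, in contrast to the graded scores produced by profiles and HMMs.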

  40. InterPro Consortium HAMAP Profiles Protein features (sites) Functional annotation of families/domains Structural domains Patterns Fingerprints Hidden Markov Models

  41. InterPro signature integration process • Signatures are provided by member databases • They are scanned against the UniProt database to see which sequences they match • Curators manually inspect the matches before integrating the signatures into InterPro • Signatures representing the same entity are integrated together • Relationships between entries are traced, where possible • Curators add literature-referenced abstracts, cross-refs to other databases, and GO terms

  42. http://www.ebi.ac.uk/interpro/

  43. Using InterPro Let’s find some information about T-cell surface antigen CD4 in InterPro Search using the key word: CD4

  44. Results from the “CD4” key word search

  45. Family-centered view Type Name Identifier Contributing signatures Description References GO terms

  46. Using InterPro Search using human CD4 protein sequence

  47. Protein-centered view Identifier Type Name Domains Family

  48. Domain-centered view Type Name Identifier Contributing signatures Description References

  49. Using InterPro with unknown sequences: InterProScan • Search with unknown protein sequence • InterProScan is the software package that allows sequences to be scanned against InterPro's signatures