The Protein Ontology (PRO)

The Protein Ontology (PRO). Natalia Roberts, University of Delaware (roberts@dbi.udel.edu). Pathways Tools Workshop, October 28, 2010. Outline: Introduction to the PRotein Ontology; Consortium; Framework; Curation; PRO Website; PRO Entry; Search and Browse; Annotation.

Presentation Transcript


  1. The Protein Ontology (PRO) Natalia Roberts University of Delaware roberts@dbi.udel.edu Pathways Tools Workshop October 28, 2010

  2. Outline • Introduction to the PRotein Ontology • Consortium • Framework • Curation • PRO Website • PRO Entry • Search and Browse • Annotation

  4. PRO Consortium • Initially, PRO addressed proteins: Cathy Wu, Barry Smith, Judith Blake • Later extended to protein complexes: Peter D’Eustachio, Carol Bult

  5. Outline • Introduction to the PRotein Ontology • Consortium • Framework • Curation • PRO Website • PRO Entry • Search and Browse • Annotation

  7. PRO within OBO Foundry • Ontology for semantic integration of heterogeneous biological data • OBO Foundry establishes rules and best practices to create a suite of orthogonal, interoperable reference ontologies [Figure label: Molecule (PRO)]

  8. PRotein Ontology (PRO): an ontology to formally represent proteins and protein complexes • Ontology for Protein Evolution (ProEvo): captures the protein classes reflecting evolutionary relationships at the full-length protein level. • Ontology for Protein Forms (ProForm): captures the different protein forms of a specific gene arising from genetic variation, alternative splicing, cleavage, and post-translational modification. [Figure: Protein A and its forms: Protein A unmodified, Protein A phospho 1, Protein A phospho 2, Protein A cleaved] • Ontology for Protein Complexes (ProComp): formally defines protein complexes in terms of their specific components. [Figure: Complex Z has_part Protein A phospho 1; Complex Z has_part Protein B]

  9. Why PRO? • Allows specification of relationships between PRO and other ontologies, such as GO and the Disease Ontology • Provides a structure to support formal, computer-based inferences based on shared attributes among homologous proteins • Provides a stable unique identifier for any protein type • Provides formalization and precise annotation of specific protein forms/classes, allowing accurate and consistent data mapping, integration and analysis

  10. ProEvo

  11. Ontology for Protein Evolution (ProEvo) [Figure: Smad family tree showing gene duplication and speciation events for human (h) and mouse (m) proteins containing the MH1 domain (PF03165) and MH2 domain (PF03166). Branches: BMP (SMAD1, SMAD5, SMAD9), TGFB (SMAD2, SMAD3), I-Smads (SMAD6, SMAD7), Co-Smad (SMAD4)] ProEvo captures the protein classes reflecting evolutionary relationships at the full-length protein level. In the ontology: • Family: a PRO term at this level refers to proteins that can trace back to a common ancestor over the entire length of the protein. • Gene: a PRO term at this level refers to the protein products of a distinct gene.

  12. Functional classes are not described by PRO (for now) • Cathepsins: proteases, most of which become activated at the low pH found in lysosomes. • Crystallins: constituents of the eye lens. • Heat shock proteins: a class of functionally related proteins whose expression is increased when cells are exposed to elevated temperatures or other stress. • Protein kinases: enzymes that modify other proteins by chemically adding phosphate groups.

  13. ProForm

  14. The Need for Representation of Various Protein Forms • Function • Association • Localization • Modification • Disease

  15. ProEvo and ProForm [Figure: the Smad family hierarchy from ProEvo (Smad; BMP: SMAD1, SMAD5, SMAD9; TGFB: SMAD2, SMAD3; I-Smads; Co-Smad) extended with ProForm nodes: Isoform 1 and Isoform 2, each with an unmodified form and modified (PTM/Cleaved) forms] ProForm captures the different protein forms of a specific gene arising from genetic variation, alternative splicing, cleavage, and post-translational modification. • Sequence: a PRO term at this level refers to the protein products with a distinct sequence upon initial translation. • Modification: a PRO term at this level refers to the protein products with some change that occurs after initial translation.

  16. In the ontology

  17. ProComp

  18. • Complexes that differ in subunit composition (within or in different species), e.g. mCD14/LPS vs sCD14/LPS (membrane CD14 vs soluble CD14) • Complexes whose subunits are post-translationally modified, e.g. phosphorylated IRF3 dimer • Figure adapted from http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=166016& [Figure: the corresponding terms in the ontology]

  19. TGF-beta signaling: comparison between PID and Reactome [Figure: pathway diagram running from the TGF-beta receptor through the cytoplasm to the nucleus (DNA binding and transcription regulation). Labeled components include LAP, TGF-beta, Furin, STRAP, Smad 2, Smad 4, Smad 7, Shc, XIAP, TAK1, MEKK1, MAPKKK, ERK1/2, CaM and Ski, with inputs from growth signals, stress signals and Ca2+, and branches to the p38 MAPK pathway, the JNK cascade and degradation. Protein forms are tagged with PRO identifiers: PRO:000000366, PRO:000000397, PRO:000000410, PRO:000000468, PRO:000000481, PRO:000000523, PRO:000000616, PRO:000000618, PRO:000000650, PRO:000000651, PRO:000000652.] Legend: Phosphorylation (P) at Serine (S), Threonine (T), Tyrosine (Y); Ubiquitination (U) at Lysine (K). Components common to both Reactome and PID, and components included only in Reactome, are marked in the figure; all others are in PID. Not all components in the pathway from both databases are listed.

  20. Framework

  21. Categories/Levels of Distinction • Family: a PRO term at this level refers to proteins that can trace back to a common ancestor over the entire length of the protein. • Gene: a PRO term at this level refers to the protein products of a distinct gene. • Sequence: a PRO term at this level refers to the protein products with a distinct sequence upon initial translation. • Modification: a PRO term at this level refers to the protein products with some change that occurs after initial translation.

  22. Categories in ProEvo & ProForm [Figure: the Smad hierarchy labeled with category levels. ProEvo: Family (Smad; BMP, TGFB, I-Smads, Co-Smad) and Gene product (SMAD1, SMAD5, SMAD9, SMAD2, SMAD3). ProForm: Sequence (Isoform 1, Isoform 2) and Modification (unmodified; modified (PTM/Cleaved)). Additional levels shown: Organism-gene, Organism-sequence, Organism-modification.]

  23. Protein ontology framework • pro.obo: the ontology (ProEvo, ProForm, ProComp) • PAF.txt: the annotation

  24. Some concepts • Ortho-isoform: isoforms encoded by orthologous genes that are believed to have arisen prior to speciation and divergence of the primary sequence. Example: PRO:000000048 TGF-beta receptor type-2 isoform 1 (also known as Isoform RII-1). • Ortho-modified form: post-translational modifications on equivalent residues in ortho-isoforms. Example: PRO:000000615 GTP-binding protein RhoA isoform 1 prenylated 1, defined as "A GTP-binding protein RhoA isoform 1 prenylated form where a geranylgeranyl moiety has been added to the Cys residue within the C-terminal Cxxx motif. Example: UniProtKB:P61586-1, has_modification MOD:00113 S-geranylgeranyl-L-cysteine, Cys-190." [PMID:16773203, PRO:CNA]

  25. Outline • Introduction to the PRotein Ontology • Consortium • Framework • Curation • PRO Website • PRO Entry • Search and Browse • Annotation

  27. What is curated in the ontology? • Name: taken from the data source, following established naming guidelines. • Synonyms: imported from the data source; others are added by request and during curation. • Definitions: we try to create standard definitions of the form "A is a B that C's". Both text and logical definitions are created when possible. • Cross-ref: given when there is a database object that corresponds to the term. Example entry:

      PRO:000025762 serine palmitoyltransferase complex A (mouse)
      def: "A serine palmitoyltransferase complex that is heterotrimeric and whose components are encoded in the genome of mouse." [PRO:CJB]
      is_a: GO:0017059 ! serine C-palmitoyltransferase complex
      relationship: has_part PRO:000025361 {cardinality="1"} ! serine palmitoyltransferase 1 (mouse)
      relationship: has_part PRO:000025362 {cardinality="1"} ! serine palmitoyltransferase 2 (mouse)
      relationship: has_part PRO:000025363 {cardinality="1"} ! serine palmitoyltransferase 3 (mouse)
      relationship: only_in_taxon taxon:10090 ! Mus musculus
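
The `has_part` lines in the serine palmitoyltransferase complex entry on slide 27 carry the subunit composition of the complex. The following is a minimal Python sketch of how such `relationship:` lines could be read. The regular expression and the function names are assumptions made for illustration, not part of any PRO tooling, and the pattern only covers the line shapes shown on the slide.

```python
import re

# Relationship lines copied from the slide 27 example entry.
STANZA_LINES = [
    'relationship: has_part PRO:000025361 {cardinality="1"} ! serine palmitoyltransferase 1 (mouse)',
    'relationship: has_part PRO:000025362 {cardinality="1"} ! serine palmitoyltransferase 2 (mouse)',
    'relationship: has_part PRO:000025363 {cardinality="1"} ! serine palmitoyltransferase 3 (mouse)',
    'relationship: only_in_taxon taxon:10090 ! Mus musculus',
]

# Hypothetical pattern: relation name, target ID, optional {cardinality="N"}
# modifier, optional "! name" trailing comment.
REL_RE = re.compile(
    r'^relationship:\s+(?P<rel>\S+)\s+(?P<target>\S+)'
    r'(?:\s+\{cardinality="(?P<card>\d+)"\})?'
    r'(?:\s*!\s*(?P<name>.*))?$'
)


def parse_relationship(line):
    """Split one relationship line into its parts, or return None if it does not match."""
    match = REL_RE.match(line)
    if match is None:
        return None
    card = match.group("card")
    return {
        "relation": match.group("rel"),
        "target": match.group("target"),
        "cardinality": int(card) if card is not None else None,
        "name": (match.group("name") or "").strip(),
    }


def complex_components(lines):
    """Return (target ID, cardinality) for every has_part relationship."""
    parsed = (parse_relationship(line) for line in lines)
    return [
        (rel["target"], rel["cardinality"])
        for rel in parsed
        if rel is not None and rel["relation"] == "has_part"
    ]


components = complex_components(STANZA_LINES)
subunit_count = sum(card for _, card in components)
```

Summing the cardinalities gives the number of subunits, which can be checked against the word "heterotrimeric" in the text definition.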

  28. Example of how we define modified forms [Figure: feature section and reference section of the UniProtKB record for human smad2, highlighting phosphorylation at Ser-465/Ser-467 and at Ser-240/Ser-465/Ser-467]

  29. ProForm Curation: two phosphorylated forms of smad2 isoform 1.

      Phosphorylated by TGF-beta receptor (Ser-465/Ser-467):

      [Term]
      id: PRO:000000650
      name: smad2 isoform 1 phosphorylated 1
      def: "A smad2 isoform 1 phosphorylated form that has been phosphorylated in the last two Ser residues within the SSxS C-terminal motif by TGF-beta pathway activation." [PMID:8980228, PMID:9346966]
      comment: Category=modification.
      synonym: "TGF-beta receptor-activated smad2" RELATED []
      is_a: PRO:000000574 ! smad2 isoform 1 phosphorylated form

      Phosphorylated through Ca++-mediated signaling (Ser-240/Ser-465/Ser-467):

      [Term]
      id: PRO:000000652
      name: smad2 isoform 1 phosphorylated 3
      def: "A smad2 isoform 1 phosphorylated form that has been phosphorylated at a [S/T] residue within the MH1-MH2 domain linker region in response to decorin-induced Ca(2+) signaling. This form is also phosphorylated in the last two Ser residues within the SSxS C-terminal motif." [PMID:11027280]
      comment: Category=modification.
      is_a: PRO:000000574 ! smad2 isoform 1 phosphorylated form
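
The two `[Term]` stanzas on slide 29 follow the tag-value layout of the OBO flat file format. As a rough illustration, the sketch below reads such stanzas into Python dictionaries and pulls out the category and the parent term. It is a simplified reader written for this example (the function names are hypothetical and the `def:` lines are omitted from the sample for brevity); a real pipeline would more likely use an established OBO parser.

```python
# Abbreviated copies of the two stanzas from slide 29 (def: lines omitted).
SAMPLE = """\
[Term]
id: PRO:000000650
name: smad2 isoform 1 phosphorylated 1
comment: Category=modification.
synonym: "TGF-beta receptor-activated smad2" RELATED []
is_a: PRO:000000574 ! smad2 isoform 1 phosphorylated form

[Term]
id: PRO:000000652
name: smad2 isoform 1 phosphorylated 3
comment: Category=modification.
is_a: PRO:000000574 ! smad2 isoform 1 phosphorylated form
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # another stanza type, e.g. [Typedef]; ignored here
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


def category(term):
    """Read the level from a 'Category=...' comment, as used on the slides."""
    for comment in term.get("comment", []):
        if comment.startswith("Category="):
            return comment[len("Category="):].rstrip(".")
    return None


def parent_ids(term):
    """Return is_a targets with the trailing '! name' comment removed."""
    return [value.split("!")[0].strip() for value in term.get("is_a", [])]


terms = parse_obo_terms(SAMPLE)
```

Both forms share the parent PRO:000000574, which is how the ontology groups the individual phosphorylated forms under the general "smad2 isoform 1 phosphorylated form" term.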

  30. What is Annotated? • Domain, especially ProEvo level: • GO terms • PSI-MOD terms for protein modifications • SO for sequence variants • MIM for sequence variants

  31. PRO homepage http://pir.georgetown.edu/pro/pro.shtml

  32. PRO distribution files ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo/ • pro.obo: ontology in OBO format

  33. PRO distribution files ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo/PAF_guidelines.pdf • PAF.txt: PRO association file (tab delimited, similar to GAF)

      Column | Column Title | Description
      1 | PRO_ID | PRO identifier, mandatory
      2 | Object_term | Name of the PRO term
      3 | Object_synonym | Other names by which the described object is known
      4 | Modifier | Flags that modify the interpretation of an annotation
      5 | Relation | Relation to the corresponding annotation
      6 | Ontology_ID | ID for the corresponding annotation
      7 | Ontology_term | Term name for the corresponding ontology ID
      8 | Relative_to | Modifiers increased, decreased and altered require an entry in this column to indicate what the change is relative to
      9 | Interaction_with | To indicate binding partner
      10 | Evidence_source | PubMed ID or database source for the evidence
      11 | Evidence_code | Same as evidence code for GO annotations
      12 | Taxon | Taxon identifier for the species that the annotation is extracted from
      13 | Inferred_from | Use only for evidence code: IPI and ISS for PRO
      14 | DB_ID | One or more unique identifiers for a single source cited as an authority for the attribution of the ontology term
      15 | Protein_region | To indicate part of the protein sequence
      16 | Modified residue(s), MOD_ID | To indicate the residue(s) that has a post-translational modification and the type of modification
      17 | Date | Date on which the annotation was made
      18 | Assigned_by | The database which made the annotation
      19 | Equivalent forms | List the equivalent form in other organisms
      20 | Comments | Curator comments, free text

      Modifier values shown: NOT, contributes_to, increased, decreased, altered
      Relation values shown: part_of, located_in, has_part, has_agent, has_function, participates_in, agent_in, has_modification
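
Because PAF.txt is described on slide 33 as a 20-column tab-delimited file, it can be read with a standard delimited-text reader. The sketch below uses Python's `csv` module. The dictionary keys are adapted from the column titles on the slide (the key for column 16 and the underscore in `Equivalent_forms` are naming choices made here), the rule for skipping header or comment lines is an assumption, and the sample row is made up for illustration rather than taken from the real file.

```python
import csv
import io

# Keys adapted from the column titles listed on slide 33.
PAF_COLUMNS = [
    "PRO_ID", "Object_term", "Object_synonym", "Modifier", "Relation",
    "Ontology_ID", "Ontology_term", "Relative_to", "Interaction_with",
    "Evidence_source", "Evidence_code", "Taxon", "Inferred_from", "DB_ID",
    "Protein_region", "Modified_residue_MOD_ID", "Date", "Assigned_by",
    "Equivalent_forms", "Comments",
]


def read_paf(handle):
    """Yield one dict per annotation row of a tab-delimited PAF-style file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: data rows start with a "PRO:" identifier; anything else
        # (blank lines, a header row, comment lines) is skipped.
        if not row or not row[0].startswith("PRO:"):
            continue
        # Pad short rows so that every column key is present.
        row = row + [""] * (len(PAF_COLUMNS) - len(row))
        yield dict(zip(PAF_COLUMNS, row))


# Made-up sample: a header line plus one illustrative (not real) annotation row.
_example_row = [
    "PRO:000000650", "smad2 isoform 1 phosphorylated 1", "", "",
    "located_in", "GO:0005634", "nucleus",
]
SAMPLE = "PRO_ID\tObject_term\n" + "\t".join(_example_row) + "\n"

records = list(read_paf(io.StringIO(SAMPLE)))
```

For the real file, `read_paf(open("PAF.txt", encoding="utf-8"))` would be the equivalent call, assuming the file has been downloaded from the distribution site and matches the layout above.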

  34. PRO distribution files ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo/PAF_guidelines.pdf • PAF.txt: PRO association file (tab delimited, similar to GAF)

  35. PRO scope and statistics • Current scope: human, mouse and E. coli proteins. Comprehensive coverage for gene level and general modified forms. • In PRO Release 13.0, version 0 (link), there are 25700 PRO terms. Terms by category: family 281; gene 18184; organism-gene 128; sequence 1153; modification 5662; complex 13. • Annotation: curated papers 929; annotations to GO terms 2006; annotations to MOD terms 361.

  36. Outline • Introduction to the PRotein Ontology • Consortium • Framework • Curation • PRO Website • PRO Entry • Search and Browse • Annotation

  38. PRO Entry http://purl.obolibrary.org/obo/PRO_000000563 • Sections of the entry: 1. Ontology 2. Features 3. Mapping 4. Annotation
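
The entry address on slide 38 pairs a PRO identifier with a persistent URL. The sketch below converts between the two forms, assuming the pattern visible in that single example (the `PRO:` prefix becomes `PRO_` after a fixed base URL, followed by nine digits). The function names are hypothetical, and the pattern should be checked against current PRO documentation before relying on it.

```python
import re

# Base URL and ID shape taken from the example on slide 38.
PURL_BASE = "http://purl.obolibrary.org/obo/"
_ID_RE = re.compile(r"^PRO:(\d{9})$")
_URL_RE = re.compile(r"^" + re.escape(PURL_BASE) + r"PRO_(\d{9})$")


def pro_id_to_purl(pro_id):
    """Turn 'PRO:000000563' into its entry URL, following the slide's pattern."""
    match = _ID_RE.match(pro_id)
    if match is None:
        raise ValueError(f"not a PRO identifier: {pro_id!r}")
    return f"{PURL_BASE}PRO_{match.group(1)}"


def purl_to_pro_id(url):
    """Inverse of pro_id_to_purl."""
    match = _URL_RE.match(url)
    if match is None:
        raise ValueError(f"not a PRO entry URL: {url!r}")
    return f"PRO:{match.group(1)}"


entry_url = pro_id_to_purl("PRO:000000563")
```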

  39. Outline • Introduction to the PRotein Ontology • Consortium • Framework • Curation • PRO Website • PRO Entry • Search and Browse • Annotation

  41. Browse • 1. Number of terms • 2. Sorting • 3. Information tabs • 4. Add text to find

  42. Browse (cont.) • ‘Find’ is an exact text match to a term • Add text to search • Use the information tabs to display the information of interest

  43. Quick Browse • Quick Browse lets you browse terms related to a given theme: • Terms for proteins with a given modification • Terms for saliva biomarkers • Terms with a link to a given database • Terms with orthoisoforms

  44. Quick Browse: Orthoisoforms • Quick Browse: Saliva biomarkers

  45. PRO homepage

  46. PRO Advanced Search • Boolean searches: AND, OR, NOT • Null/not null search • Examples for search fields: http://pir.georgetown.edu/pro/searchPRO.pdf

  47. • All terms derived from a given gene • Link to family database • Proteins in this class are from vertebrates

  48. Search for PRO terms that are modified forms and are annotated with the GO term nucleus: http://pir.georgetown.edu/cgi-bin/pro/textsearch_pro • Boolean searches: AND, OR, NOT • Save results as a tab-delimited file • Indicate level in the hierarchy • Show the selected terms in the hierarchy
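
The search on slide 48 combines two conditions with AND (modified forms, annotated with the GO term nucleus). The logic of such Boolean and null/not-null queries can be illustrated offline with a few composable predicates in Python. This is only a sketch of the idea, not the website's implementation: the helper names, the record fields and the `EX:` records are all invented for the example.

```python
def contains(field, text):
    """Predicate: `text` occurs (case-insensitively) in the record's field."""
    needle = text.lower()
    return lambda rec: needle in (rec.get(field) or "").lower()


def not_null(field):
    """Predicate: the record has a non-empty value for `field`."""
    return lambda rec: bool(rec.get(field))


def AND(*preds):
    return lambda rec: all(pred(rec) for pred in preds)


def OR(*preds):
    return lambda rec: any(pred(rec) for pred in preds)


def NOT(pred):
    return lambda rec: not pred(rec)


# Invented records; the IDs and annotations are placeholders, not PRO data.
RECORDS = [
    {"id": "EX:0000001", "category": "modification", "go_term": "nucleus"},
    {"id": "EX:0000002", "category": "modification", "go_term": "cytoplasm"},
    {"id": "EX:0000003", "category": "gene", "go_term": "nucleus"},
    {"id": "EX:0000004", "category": "sequence", "go_term": None},
]

# Modified forms AND annotated with nucleus, mirroring the slide's query.
query = AND(contains("category", "modification"), contains("go_term", "nucleus"))
hits = [rec["id"] for rec in RECORDS if query(rec)]

# Null search: terms without any GO annotation.
unannotated = [rec["id"] for rec in RECORDS if NOT(not_null("go_term"))(rec)]
```

Keeping each condition as a small function makes it easy to nest combinations, for example `OR(query, NOT(not_null("go_term")))`.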

  49. • Use Display Option to add/remove columns • Click Apply to see the new column(s) • Result: Annotation column added
