UniProt and Apoptosis

Presentation Transcript


  1. UniProt and Apoptosis. Sandra Orchard, EMBL-EBI.

  2. What do protein scientists require?
     1. A high-quality protein sequence database: a high-quality, non-redundant protein database with maximal coverage, including splice isoforms, disease variants and PTMs. Sequence archiving is essential. It is not appropriate to use a nucleotide sequence database as a source of protein sequences.
     2. Protein identification: stable identifiers and consistent nomenclature.
     3. Protein annotation: detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources.

  3. UniProt: what is UniProt?
     • Based on the original work on PIR, Swiss-Prot and TrEMBL
     • Funded mainly by NIH to be the highest-quality, most thoroughly annotated protein sequence database
     • Collaboration between EBI, SIB and PIR

  4. UniProt data sources and data flow (diagram)
     • Database sources shown: INSDC (incl. WGS, Env.), RefSeq, Ensembl, VEGA, PDB, FlyBase, WormBase, patent data, Sub/Peptide data
     • Other resources shown: UniParc, UniProtKB, UniSave, UniMes, UniRef 100, UniRef 90, UniRef 50, IPI, proteome sets

  5. UniProtKB (UniProt Knowledgebase), www.uniprot.org. Two sections:
     • UniProtKB/Swiss-Prot: non-redundant, high-quality manual annotation (reviewed)
     • UniProtKB/TrEMBL: redundant, automatically annotated (unreviewed)

  6. What does UniProtKB give you?
     1. Curated protein sequences (correction of frameshifts, premature stop sites, incorrect initiator methionines, etc.), with stable identifiers, archiving and versioning
     2. Identification of splice variants and/or alternative promoter usage, with stable identifiers, archiving and versioning
     3. Identification of variants (at amino acid level) and of PTMs; where known, the consequence is given. Again with stable identifiers, archiving and versioning

  7. What does UniProtKB give you?
     4. Consistent nomenclature, plus synonyms
     5. Annotation of experimental data from the literature in 27 defined fields, with increasing use of controlled vocabularies and no loss of detail
     6. Extensive cross-referencing: a central portal to a wealth of external resources, with 85 external databases cross-referenced to UniProtKB

  8. The new website: www.beta.uniprot.org

  9. 1. Sequence curation, stable identifiers, versioning and archiving

  10. Sequence curation, stable identifiers, versioning and archiving. Examples: erroneous gene model predictions, frameshifts, premature stop codons, read-throughs, erroneous initiator methionines…

  11. 2. Identification of splice variants

  12. 3. Identification of variants (at amino acid level)… and also

  13. …of PTMs…

  14. …and of binding sites

  15. 4. Consistent nomenclature (& synonyms)

  17. 5. Annotation of experimental data from the literature in 27 defined fields. Controlled vocabularies are used whenever possible…

  18. Binary interactions taken from the IntAct database

  20. Disease-specific annotation added to human entries, with supporting cross-referencing

  22. UniProt keywords. UniProtKB entries are tagged with keywords that can be used to retrieve particular subsets of entries. There are 10 categories:
     • Biological process (e.g. Apoptosis)
     • Cellular component
     • Coding sequence diversity
     • Developmental stage
     • Disease
     • Domain
     • Ligand
     • Molecular function (e.g. Oncogene, Anti-oncogene)
     • Post-translational modification
     • Technical term
     The document keywlist.txt lists all the keywords and defines their usage in UniProtKB.
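
To make the idea of keyword-based retrieval concrete, here is a minimal Python sketch. It assumes entries are available as a mapping from accession to a set of keywords; the accessions (`EX0001` and so on) and the helper names are hypothetical and are not real UniProtKB records or part of any UniProt API.

```python
from collections import defaultdict

# Hypothetical entries (accession -> keywords); not real UniProtKB records.
ENTRIES = {
    "EX0001": {"Apoptosis", "Phosphoprotein"},
    "EX0002": {"Apoptosis", "Membrane"},
    "EX0003": {"Membrane"},
}


def build_keyword_index(entries):
    """Invert the mapping: keyword -> set of accessions tagged with it."""
    index = defaultdict(set)
    for accession, keywords in entries.items():
        for keyword in keywords:
            index[keyword].add(accession)
    return index


def entries_with_all(index, *keywords):
    """Return the sorted accessions tagged with every requested keyword."""
    if not keywords:
        return []
    hits = [index.get(keyword, set()) for keyword in keywords]
    return sorted(set.intersection(*hits))


index = build_keyword_index(ENTRIES)
print(entries_with_all(index, "Apoptosis"))
print(entries_with_all(index, "Apoptosis", "Membrane"))
```

Combining keywords by intersection narrows the subset; a keyword that tags no entry simply yields an empty result.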

  23. Source references included in entry

  24. 6. Extensive cross-referencing, a central portal to a wealth of external resources…

  25. …additional annotation (Gene Ontology)…

  26. InterPro: defines protein family membership and enables domain annotation

  28. Annotation of entries in UniProtKB/Swiss-Prot

  29. Annotation of human entries in UniProtKB/Swiss-Prot

  30. UniProtKB/TrEMBL
     • Redundant: only 100% identical sequences are merged
     • Automated clean-up of annotation from the original nucleotide sequence entry
     • Additional value added by automatic annotation
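
As an illustration of what "only 100% identical sequences merged" implies, the following Python sketch groups records by exact sequence string. The record identifiers and sequences are made up, and the slide does not describe the actual TrEMBL merging procedure; this only shows why near-identical sequences remain as separate, redundant entries.

```python
def merge_identical(records):
    """Group source identifiers by exact (100% identical) sequence.

    records: iterable of (source_id, sequence) pairs.
    Returns a dict mapping each distinct sequence to its source identifiers.
    """
    merged = {}
    for source_id, sequence in records:
        merged.setdefault(sequence.upper(), []).append(source_id)
    return merged


# Hypothetical input: src1 and src2 are identical, src3 differs by one residue.
RECORDS = [
    ("src1", "MKTAYIAK"),
    ("src2", "MKTAYIAK"),
    ("src3", "MKTAYIAR"),
]

merged = merge_identical(RECORDS)
for sequence, sources in merged.items():
    print(sequence, sources)
```

Here the single-residue difference in `src3` is enough to keep it as its own entry, which is the sense in which the resulting set is still redundant.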

  31. Automatic annotation
     • Recognises common annotation belonging to a closely related family within UniProtKB/Swiss-Prot
     • Identifies all members of this family using patterns/motifs/HMMs in InterPro
     • Transfers common annotation to related family members in TrEMBL

  32. InterPro

  33. Automated annotation in TrEMBL (flow diagram linking InterPro, Swiss-Prot and TrEMBL):
     1) Extract conditions from InterPro
     2) Group Swiss-Prot entries by conditions
     3) Extract common annotation
     4) Group TrEMBL by conditions and add common annotation to TrEMBL entries
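
The four steps above can be sketched in Python as follows. This is a simplified illustration, not the production pipeline: it assumes a "condition" is the exact set of InterPro signatures on an entry, and that "common annotation" is the intersection of the annotations of all reviewed entries sharing that condition. All identifiers (`SP1`, `TR1`, `IPR_A` and so on) are hypothetical.

```python
from collections import defaultdict

# Hypothetical reviewed (Swiss-Prot-like) and unreviewed (TrEMBL-like) entries.
SWISSPROT = [
    {"id": "SP1", "signatures": {"IPR_A"}, "annotation": {"Apoptosis", "Cytoplasm"}},
    {"id": "SP2", "signatures": {"IPR_A"}, "annotation": {"Apoptosis", "Nucleus"}},
    {"id": "SP3", "signatures": {"IPR_B"}, "annotation": {"Kinase"}},
]
TREMBL = [
    {"id": "TR1", "signatures": {"IPR_A"}, "annotation": set()},
    {"id": "TR2", "signatures": {"IPR_C"}, "annotation": set()},
]


def common_annotation_by_condition(reviewed):
    """Steps 1-3: group reviewed entries by condition, keep shared annotation."""
    groups = defaultdict(list)
    for entry in reviewed:
        groups[frozenset(entry["signatures"])].append(entry["annotation"])
    return {cond: set.intersection(*annots) for cond, annots in groups.items()}


def annotate(unreviewed, rules):
    """Step 4: add the common annotation to unreviewed entries with a matching condition."""
    result = {}
    for entry in unreviewed:
        inherited = rules.get(frozenset(entry["signatures"]), set())
        result[entry["id"]] = entry["annotation"] | inherited
    return result


rules = common_annotation_by_condition(SWISSPROT)
annotated = annotate(TREMBL, rules)
print(annotated)
```

Only annotation shared by every reviewed member of the group is transferred, so `TR1` inherits "Apoptosis" but neither of the conflicting location terms, and `TR2` (no matching group) is left unchanged.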

  35. Complete proteomes: www.ebi.ac.uk/integr8

  36. Proteome set download

  37. Non-redundant proteome sets
     • Complete experimentally determined protein sets are not yet available for higher organisms
     • Predicted proteins must be included to give the full proteome
     • The International Protein Index (IPI) merges data from UniProt, Ensembl and RefSeq to produce a non-redundant dataset

  38. International Protein Index
     • Non-redundant protein sets produced for human, mouse, rat, Arabidopsis, zebrafish, cow and chicken
     • Effectively maintains a database of cross-references between the primary data sources
     • Provides minimally redundant yet maximally complete sets of proteins for featured species (one sequence per transcript)
     • Maintains stable identifiers (with incremental versioning) to allow the tracking of sequences in IPI
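
The combination of stable identifiers and incremental versioning can be illustrated with a small Python sketch. The class name, the `identifier.version` display format and the example identifier are assumptions made for illustration; the slides do not specify how IPI or UniProtKB implement versioning internally.

```python
class VersionedRegistry:
    """Toy registry: the identifier never changes, the version counts sequence changes."""

    def __init__(self):
        # identifier -> list of sequences, oldest first (this is the archive)
        self._history = {}

    def submit(self, identifier, sequence):
        """Store a sequence; bump the version only if the sequence changed."""
        history = self._history.setdefault(identifier, [])
        if not history or history[-1] != sequence:
            history.append(sequence)
        return f"{identifier}.{len(history)}"

    def sequence_at(self, identifier, version):
        """Retrieve an archived sequence by its 1-based version number."""
        return self._history[identifier][version - 1]


registry = VersionedRegistry()
first = registry.submit("EX0001", "MKT")      # new identifier, version 1
same = registry.submit("EX0001", "MKT")       # unchanged sequence, still version 1
updated = registry.submit("EX0001", "MKTA")   # changed sequence, version 2
print(first, same, updated)
```

Because earlier versions stay in the archive, a sequence cited in an older paper can still be retrieved after the entry has been updated.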

  39. IPI

  40. User input
     • Feedback: if you find something wrong, outdated, missing…
     • Be thorough when writing your papers: make protein identification clear, use accession numbers, etc.
     • Submit, submit, submit

  41. With thanks to… the Sequence Database group (EBI); UniProt collaborators (SIB, PIR); InterPro consortium; IntAct consortium; GO consortium; PRIDE; HUPO-PSI; Rolf Apweiler
