UniProt - The Universal Protein Resource

    Presentation Transcript
    1. UniProt - The Universal Protein Resource Claire O’Donovan

    2. Pre-UniProt • Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI; • TrEMBL: created at the EBI in November 1996 as a computer-annotated protein sequence database supplementing Swiss-Prot. It was introduced to deal with the increased data flow from genome projects.

    3. The UniProt timeline • Grant awarded to EBI, SIB, and PIR by NIH • Run time 9/02-8/05 • ~16 million USD, intended to replace Swiss-Prot license fees and previous PIR funding

    4. UniProt Consortium

    5. UniProt Consortium activities

    6. The three-layered approach
       • The UniProt Archive (UniParc)
         • UniProtKB + all other protein sequences publicly available
         • Completeness
       • The UniProt Reference Clusters (UniRef)
         • Non-redundant views of UniProtKB + selected UniParc sets
         • Speed
       • The UniProt Knowledgebase (UniProtKB)
         • Central database of annotated protein sequences and functional information
         • UniProtKB/Swiss-Prot + UniProtKB/TrEMBL

    7. The three-layered approach: interrelationship between the UniProt databases

    8. UniProt Archive • UniParc is a non-redundant archive of protein sequences from the public databases • It contains only protein sequences (no annotations) • It provides cross-references to the source databases

    9. UniProt Archive: Principles • UniParc is non-redundant • Each unique protein sequence is stored only once and is assigned a unique, stable UniParc identifier (e.g. UPI0000000356) • UniParc provides cross-references to the original source, whether active or retired • UniParc provides sequence versions.
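The principles on this slide can be illustrated with a small sketch. It is not UniParc's implementation: the class names, the in-memory dictionary, and the accessions in the usage note are all hypothetical, and the identifier format only imitates the slide's example. Sequence versioning is left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    upi: str        # stable identifier, assigned once per unique sequence
    sequence: str
    # (source database, source accession) -> True if active, False if retired
    xrefs: dict = field(default_factory=dict)


class ToyArchive:
    """Hypothetical in-memory stand-in for a UniParc-like archive."""

    def __init__(self):
        self._by_sequence = {}
        self._next_id = 1

    def load(self, database, accession, sequence):
        """Register a source record and return the identifier of its sequence."""
        sequence = sequence.upper()
        entry = self._by_sequence.get(sequence)
        if entry is None:
            # Unseen sequence: mint a new identifier (format is an assumption).
            entry = ArchiveEntry(f"UPI{self._next_id:010X}", sequence)
            self._by_sequence[sequence] = entry
            self._next_id += 1
        # Known or new, the source record becomes an active cross-reference.
        entry.xrefs[(database, accession)] = True
        return entry.upi

    def retire(self, database, accession):
        """Mark a cross-reference as retired without deleting it."""
        for entry in self._by_sequence.values():
            if (database, accession) in entry.xrefs:
                entry.xrefs[(database, accession)] = False

    def __len__(self):
        return len(self._by_sequence)
```

Loading the same sequence from two source databases, for example `archive.load("EMBL", "AAA1", "MKTAYIAK")` followed by `archive.load("RefSeq", "NP_1", "MKTAYIAK")`, returns the same identifier both times, and retiring one of the source records leaves the sequence and its identifier in place.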

    10. UniProt Reference Clusters: Principles • It provides non-redundant reference data collections • It allows faster and more informative sequence similarity searches • It includes the UniProtKB and some data from UniParc • It merges across different species

    11. UniProt Reference Clusters: Principles
        • UniRef100
          • It merges identical sequences and subfragments
        • UniRef90
          • Size reduction of 40%
        • UniRef50
          • Size reduction of 65%
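As a rough illustration of the UniRef100 idea of merging identical sequences and subfragments, the sketch below groups each record under the first longer (or equal) sequence that contains it. The function names and the input layout are assumptions made for this example; the production clustering, and the similarity-based clustering behind UniRef90 and UniRef50, are not shown in the slides and are not reproduced here.

```python
def merge_identical_and_subfragments(records):
    """Group records whose sequence equals, or is a substring of, a longer one.

    `records` maps an accession to a sequence. Returns a dict mapping each
    representative accession to the sorted list of its cluster members.
    """
    # Longest sequences first, so a representative exists before its fragments.
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    clusters = {}  # representative accession -> member accessions
    for acc, seq in ordered:
        for rep in clusters:
            if seq in records[rep]:
                clusters[rep].append(acc)
                break
        else:
            # No existing representative contains this sequence.
            clusters[acc] = [acc]
    return {rep: sorted(members) for rep, members in clusters.items()}


def size_reduction(before, after):
    """Fractional reduction in entry count, e.g. 0.40 for a 40% reduction."""
    return 1 - after / before
```

With four toy records, two identical, one a fragment of them, and one unrelated, the function returns two clusters, which `size_reduction(4, 2)` reports as a 50% reduction. The 40% and 65% figures on the slide are the presenter's numbers and are not derived by this sketch.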

    12. UniProtKB/Swiss-Prot
        - Non-redundant
        - High level of integration
        - High level of manual curation
        - Contains 241,242 entries
        UniProtKB/TrEMBL
        - Translations of CDS in EMBL/GenBank/DDBJ
        - Automatic annotation
        - Contains 3,313,265 entries

    13. UniProtKB/TrEMBL • Automatically generated in a biweekly cycle from the data present in EMBL/GenBank/DDBJ and some other sources such as TAIR/SGD • Exclusions: pseudogenes, synthetic, immunoglobulins, patents, small sequences (<8 amino acids) • /product, /gene, /locus_tag • RefSeq and Ensembl
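The exclusion rules listed on this slide amount to a filter over incoming CDS translations. The sketch below shows one way such a filter could look; the record layout (a dict with `translation` and `flags`), the flag names, and the reading of "<8" as a length in amino acids are assumptions for illustration, not the actual TrEMBL pipeline.

```python
MIN_LENGTH = 8  # the slide's "<8" threshold, read here as amino acids

# Flag names are hypothetical labels for the categories named on the slide.
EXCLUDED_FLAGS = {"pseudogene", "synthetic", "immunoglobulin", "patent"}


def is_excluded(cds):
    """Return True if a CDS record falls under one of the listed exclusions.

    `cds` is a hypothetical dict with a `translation` string and a set of
    `flags`; real EMBL/GenBank/DDBJ records are structured differently.
    """
    if cds["flags"] & EXCLUDED_FLAGS:
        return True
    return len(cds["translation"]) < MIN_LENGTH


def select_for_import(cds_records):
    """Keep only the records that pass every exclusion rule."""
    return [cds for cds in cds_records if not is_excluded(cds)]
```

A record with a seven-residue translation or a `pseudogene` flag is dropped, while an eight-residue translation with no excluded flag is kept.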

    14. UniProtKB/TrEMBL • Proteome annotation • Cross-references to other databases • Addition of relevant publications (e.g. PDB) • Redundancy • Automatic annotation • Future plans for manual annotation, e.g. human proteome project

    15. Literature • Other databases • Analysis tools • External expertise

    16. Capturing the correct sequence • Archive collections • Each sequence report stored in its own entry • Merging at 100% identity • Still some redundancy

    17. Sequence similarity searches • Identify potential merge candidates • Identify similar already curated entries

    18. Sequence comparison • Sequence alignments • Identification of sequence differences • Helps in identifying underlying causes
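Once two sequence reports have been aligned, identifying their differences is a column-by-column comparison. The sketch below assumes the alignment has already been produced by some other tool and only lists the differing columns; the function name and the use of '-' for gaps are choices made for this example.

```python
def sequence_differences(aligned_a, aligned_b):
    """List (position, residue_a, residue_b) where two aligned sequences differ.

    Both inputs must already be aligned to the same length, with '-' marking
    gaps. Positions are 1-based columns of the alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(aligned_a, aligned_b), start=1)
        if a != b
    ]
```

For example, comparing `"MKTAYIAK"` with `"MKTSYIAK"` reports a single difference at column 4 (A versus S). Deciding whether such a difference is a polymorphism, a splice variant, or an error still requires a curator's judgement.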

    19. Causes of sequence differences • Polymorphisms, disease variants • Splice variants • Sequencing errors • Incorrect predictions

    20. Literature curation • 1741 different journals cited in Swiss-Prot • Total of 383,401 references • Average of 2 references per entry

    21. Sequence analysis • Range of sequence analysis tools used to predict important sequence features • Use of most appropriate programs • Development of new predictive methods

    22. Evidence attribution
        • System which allows linking of all information in an entry to its original source.
        • Allows users:
          • to trace origin of all data
          • to differentiate easily between literature-derived and computational data
          • to assess data reliability
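One way to picture evidence attribution is as a data structure in which every annotation carries a pointer to its source. The sketch below is a minimal illustration under that assumption; the class names, the two evidence kinds, and the example sources are hypothetical and do not reflect the actual UniProtKB evidence format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str    # "literature" or "computational" in this sketch
    source: str  # e.g. a citation key or the name of a prediction tool


@dataclass(frozen=True)
class Annotation:
    text: str
    evidence: Evidence


def by_kind(annotations, kind):
    """Return the annotations whose evidence is of the given kind."""
    return [a for a in annotations if a.evidence.kind == kind]


# Hypothetical entry mixing a literature-derived and a predicted annotation.
entry = [
    Annotation("Binds ATP", Evidence("literature", "hypothetical-citation-1")),
    Annotation("Signal peptide 1-22", Evidence("computational", "hypothetical-predictor")),
]
```

Calling `by_kind(entry, "literature")` returns only the first annotation, and each result still exposes `evidence.source`, so a user can trace where the statement came from.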

    23. UniProtKB curation group 14 curators 24 curators 2 curators

    24. EBI curation projects
        • Submissions
        • Journal scanning
        • Species-specific curation
          • human, mouse, rat, C. elegans, Drosophila, Xenopus, zebrafish, S. cerevisiae, S. pombe
        • Protein family curation
          • kinases, keratins
        • UniProtKB-MSD collaboration
        • PTM standardisation

    25. Some future curation plans • Improvements to SPIN • Extension of evidence attribution system to Swiss-Prot • New annotation projects • Community participation • Further database collaborations

    26. UniProt distribution • Biweekly distribution • Website access www.uniprot.org • FTP access • DVD of UniProtKB (datalib@ebi.ac.uk)

    27. UniProt Web

    28. The new UniProt grant timeline • Second Grant awarded to EBI, SIB, and PIR by NIH • Run time 9/06-8/09

    29. Acknowledgements (1) Production: Proteomes: Daniel Barrell Alan Horne Renato Golin Paul Kersey Alexander Fedetov Maria Jesus Martin Patricia Monteiro Automatic Annotation Claire O’Donovan /Kraken/Website/XML: Mark Rijnbeek Michael Kleen Ernst Kretschmann UniParc/UniSave: John O’Rourke Quan Lin Sam Patient Andrey Sitnov Emilio Salazar Rasko Leinonen Natalyia Skylar Dani Wieser

    30. Acknowledgements (2)
        EBI curators:
        • Michele Magrane (Annotation coordinator / Mouse)
        • Yasmin Alam (Keratins)
        • Paul Browne (Journal scan)
        • Wei Mun Chan (Human)
        • Ruth Eberhardt (Submissions)
        • Rebecca Foulger (Xenopus)
        • Gill Fraser (Zebrafish)
        • Gabriella Frigerio (Rat)
        • John Garavelli (PTMs)
        • Jules Jacobsen (Structural data)
        • Kati Laiho (Fungi)
        • Claire O’Donovan (Quality control, data integration)
        • Sandra Orchard (Kinases)
        • Eleanor Whitfield (C. elegans, Drosophila)
        SIB Group
        PIR Group
        Rolf Apweiler