Mass Spec Proteomics HUPO-PSI & PRIDE

Mass Spec Proteomics HUPO-PSI & PRIDE. Phil Jones (pjones@ebi.ac.uk) Proteomics Services Group www.ebi.ac.uk. Positioning – The Technologies in Question. protein extraction. complex protein mixture. http://www.akh-wien.ac.at/biomed-research/htx/platweb1.htm. 2D-PAGE separation.

Presentation Transcript


  1. Mass Spec Proteomics: HUPO-PSI & PRIDE. Phil Jones (pjones@ebi.ac.uk), Proteomics Services Group, www.ebi.ac.uk

  2. Positioning – The Technologies in Question

  3. Classic: 2D PAGE proteomics
     [Workflow figure: protein extraction; complex protein mixture; 2D-PAGE separation (pI, MW); tryptic digest; MS analysis; fragmentation; MS/MS analysis. Image source: http://www.akh-wien.ac.at/biomed-research/htx/platweb1.htm]

  4. New: peptide-centric identification (shotgun strategy)
     [Workflow figure: protein extraction; complex protein mixture; enzymatic digest; extremely complex peptide mixture; separation; less complex peptide fractions; selection; MS analysis; data-dependent MS/MS analyses. Image source: http://www.akh-wien.ac.at/biomed-research/htx/platweb1.htm]
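
As a rough illustration of the enzymatic digest step in the shotgun workflow, the sketch below performs an in silico tryptic digest. It assumes the commonly cited trypsin rule (cleave after K or R unless the next residue is P), which the slide does not state. The function name and the toy sequence are hypothetical, and missed cleavages are ignored.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleave C-terminal to K or R, except when followed by P.
    Missed cleavages and modifications are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep any C-terminal remainder that does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence, not a real protein.
example_peptides = tryptic_digest("AKRPGKC")
print(example_peptides)
```

A real protein yields many such peptides, which is why the shotgun strategy has to separate the mixture into less complex fractions before MS analysis.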

  5. Public Standards for Proteomics: HUPO Proteomics Standards Initiative

  6. The HUPO Proteomics Standards Initiative (http://psidev.info)
     Mission: develop
     • Minimal reporting guidelines
     • Data representation standards (often XML formats)
     • Annotation standards (ontologies and controlled vocabularies)
     Involve data producers, hardware vendors, database providers, software producers and publishers.

  7. What constitutes a PSI standard?
     Four documents make up each individual standard:
     • Formal requirements specification
     • Minimal reporting requirements => MIAPE document
     • XML data exchange format
     • Domain-specific controlled vocabulary

  8. MIAPE / MIMIx Guidelines

  9. MIAPE & MIMIx
     MIAPE: Minimum Information About a Proteomics Experiment
     MIMIx: Minimum Information about a Molecular Interaction eXperiment
     • Understand, qualify and reproduce
     • Requirements to be enforced by journals, repositories, funders
     • Compatibility with the PSI data formats

  10. What is a MIAPE / MIMIx document
      It is:
      • A checklist of information and data to provide when an experiment is reported (it is a content descriptor)
      • An aid to assessing quality control: number of replicates, expected error rate
      It is not:
      • A description of the way to run an experiment
      • A description of HOW to represent data ("Use excel to create a table with these five following columns:…")
      • A guide to quality judgment

  11. XML Data Exchange Formats

  12. Available XML Exchange Formats
      • mzData: Mass spectrometry data
      • mzML: Replacement for mzData (since June 2008)
      • analysisXML: Mass spec. search engine output
      • PSI-MI: Molecular interactions (PPI)
      • GelML: Results of gel electrophoresis experiments
      • GelInfoML: Gel image analysis, manipulation and quantitation
      • spML: GC, LC, centrifugation, capillary electrophoresis etc.

  13. PSI – Mass Spectrometry Data Interchange: mzData -> mzML
      mzData 1.05
      • Established 4 years ago
      • All major MS vendors generate mzData
      • All major search engines consume mzData
      • Data repositories accept mzData as input
      • Commercial applications are built on mzData
      mzML 1.0.0
      • Completed document process on 1 June 2008
      • Developed as a collaboration between PSI and ISB
      • PSI-MS Working group chaired by Eric Deutsch (ISB)
      • Merges best features of PSI’s mzData and ISB’s mzXML
      • “We encourage the community to begin implementing mzML 1.0.0 [and] to phase out use of mzData and mzXML”

  14. mzData -> mzML: beyond the deliverable
      [Diagram: PSI formats (mzData, mzIdent) and ISB formats (mzXML, PepXML, ProtXML) combined into mzML and analysisXML]

  15. Details of mzML

  16. Details of mzML: run

  17. Details of mzML: cvParam and userParam
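
In mzML, a cvParam annotates an element with a controlled vocabulary term identified by its accession, while a userParam carries a free-form name and value. The following is only a minimal sketch of reading both with Python's standard library. The embedded fragment is hand-written for illustration rather than taken from a real file, and the accessions shown should be checked against the PSI-MS controlled vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace used by mzML documents.
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hand-written illustrative fragment (an assumption, not real instrument output).
SNIPPET = """\
<spectrum xmlns="http://psi.hupo.org/ms/mzml" id="scan=19" index="0" defaultArrayLength="0">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
  <userParam name="operator note" value="test injection" type="xsd:string"/>
</spectrum>
"""


def read_params(xml_text):
    """Return (cv_params, user_params) found directly under the root element."""
    root = ET.fromstring(xml_text)
    cv_params = {
        p.get("accession"): (p.get("name"), p.get("value", ""))
        for p in root.findall("mz:cvParam", MZML_NS)
    }
    user_params = {
        p.get("name"): p.get("value", "")
        for p in root.findall("mz:userParam", MZML_NS)
    }
    return cv_params, user_params


cv_params, user_params = read_params(SNIPPET)
print(cv_params)
print(user_params)
```

Keying the controlled terms by accession rather than by name reflects the reason for using a CV at all: the accession stays stable even if the human-readable name is revised.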

  18. Details of mzML: spectrum

  19. Details of mzML: chromatogram

  20. PSI – Mass Spectrometry Data Interchange: analysisXML (Protein / Peptide Identifications)
      • Will become a common format for mass spectrometry search engine output
      • Provides support for multi-step analyses
      • Merges previous efforts of HUPO-PSI with ISB

  21. Controlled Vocabularies
      • All interchange standards map to external CVs
      • CVs used to keep standards flexible and up to date – XML frozen for as long as possible
      • CVs assist in keeping curation consistent and database searching effective
      • All CVs maintained in OBO format and published on the Open Biomedical Ontologies website (http://www.obofoundry.org/)
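
Since the CVs are maintained in OBO format, a small reader shows how little is needed to consume them. The sketch below handles only the `id`, `name` and `is_a` tags of `[Term]` stanzas and ignores everything else. The `EX:` identifiers and term names are made up for illustration and do not come from any PSI vocabulary.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas from OBO flat-file text into a dict keyed by term id.

    Simplified: only id, name and is_a are kept; other stanza types are skipped.
    """
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if not line or line.startswith("!") or current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" ! ")[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


# Hypothetical OBO content for illustration only.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: instrument parameter

[Term]
id: EX:0000002
name: scan parameter
is_a: EX:0000001 ! instrument parameter

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(SAMPLE_OBO)
print(terms)
```

Because new terms arrive as new stanzas in the vocabulary file, software written this way can pick them up without any change to the XML schema, which is the flexibility the slide refers to.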

  22. Available PSI Controlled Vocabularies
      • PSI-MS: Mass spectrometry data
      • MI: Molecular Interactions
      • PSI-MOD: Protein modifications (PTMs)
      • sepCV: Sample processing and separations controlled vocabulary
      • PI: “Proteomics Informatics” CV (accompanies analysisXML)
      The four in bold are current and available from the OBO Foundry & the Ontology Lookup Service: http://obofoundry.org/ and http://www.ebi.ac.uk/ols

  23. PRIDE: The Proteomics Identifications Database

  24. The origin: availability versus accessibility
      Proteomics data is only made available as arbitrarily formatted PDF tables, carrying important limitations:
      • Source data (mass spectra) are not made available
      • No peer review validation possible
      • Very little raw material for testing innovative in silico techniques is available
      • Automated (re-)processing of the identifications is impossible

  25. Science Supported by PRIDE
      • Sample generation: origin of sample; hypothesis, organism, environment, preparation, paper citations
      • Sample processing, gel informatics: gels (1D/2D), columns, ‘chips’, other methods; images, gel type and ranges, band/spot coordinates, quantitation; stationary and mobile phases, flow rate, temperature, fractionation
      • Mass Spectrometry (‘mzData’): machine type, ion source, voltages
      • Mass Spectrometry Informatics: peak lists, database name + version, partial sequence, search parameters, search hits, accession numbers, quantitation
      • Data dissemination and comparison (PRIDE): peak lists, protein and peptide identifications, post-translational modifications

  26. Data In PRIDE
      Current Statistics:
      • 831,764 Protein Identifications
      • 4,947,353 Peptide Identifications (479,014 unique)
      • 7,409,854 Mass spectra
      Large Public Datasets:
      • HUPO Plasma Proteome Project
      • HUPO Brain Proteome Project (including mass spectra)
      • HUPO Liver Proteome Project (including mass spectra)
      • Human Cerebrospinal Fluid (U Washington School of Medicine)
      • Cellzome data set

  27. PRIDE Overview
      [Architecture diagram. Labels: Data Submission (Proteome Harvest Excel data submission spreadsheet; direct XML submission using the PRIDE Core API; human curation, i.e. creation of XML in house); Presentation (web); DAS (Distributed Annotation Service); Data Exchange (mzData XML: peak lists (MS), instrumentation, sample; PRIDE XML: identifications of proteins, peptides, PTMs); Core (API & persistence); Apache Licence, Version 2.0; data ownership remains with submitter (84% public, 16% private).]

  28. A simplified schema of the PRIDE data store
      [Schema diagram: Project; Experiment; Protocol (ordered steps); <<mzData>> Sample (species, tissue, disease state, cellular component, developmental stage); <<mzData>> Instrumentation & Associated Software; <<mzData>> Mass Spectra; Protein Identifications; Peptide Identifications; Protein Modifications (PTMs). Plus a group-based access control system with reviewer access.]

  29. THE LOOK OF PRIDE

  30. PRIDE web interface – overview

  31. PRIDE web interface – experiment and protein

  32. PRIDE web interface – mass spectra

  33. PRIDE web interface – project comparison

  34. PRIDE BioMart: A Leap Forward in Query Capability

  35. BioMart (http://www.biomart.org)
      A query-oriented data management system, developed by the EBI and CSHL.
      Powered by BioMart software: Central Server, Ensembl, HapMap, Dictybase, UniProt, Reactome, Array Express, Wormbase, Gramene, GermOnLine, DroSpeGe, PRIDE

  36. BioMart and PRIDE
      Perform powerful and fast queries across large, complex data sets:
      • Specify simple or complex filters involving multiple attributes of the data
      • Specify precisely which attributes or ‘columns’ of data are included in the output
      • Specify the format of the output, including: HTML table (with links), Excel spreadsheet, tab-delimited file, comma separated format

  37. Typical BioMart Usage
      Step 1 (Dataset): Choose your dataset
      Step 2 (Filters): Restrict your query
      Step 3 (Attributes): Specify what information you want to include in the output
      Step 4 (Results): Preview (including a simple count) and output or download the results in your chosen format.
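
The four steps map naturally onto the XML query document that BioMart's web service interface accepts. The sketch below only builds such a document and makes no network call. The element and attribute layout follows the general BioMart query convention as best recalled and should be verified against the BioMart documentation. The dataset, filter and attribute names are hypothetical placeholders, not the real PRIDE mart identifiers.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes, formatter="TSV"):
    """Build a BioMart-style XML query string.

    dataset    -- Step 1: which dataset to query
    filters    -- Step 2: dict of filter name -> value restricting the query
    attributes -- Step 3: list of output columns
    formatter  -- Step 4: output format requested from the server
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "1",
        "uniqueRows": "1",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are hypothetical and for illustration only.
query_xml = build_biomart_query(
    dataset="pride",
    filters={"species_filter": "Homo sapiens"},
    attributes=["experiment_ac", "protein_accession"],
)
print(query_xml)
```

Keeping filters and attributes as separate arguments mirrors the web interface: what restricts the rows is chosen independently of which columns are returned.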

  39. PRIDE BioMart – Dataset Page

  41. PRIDE BioMart – Defining a Complex Filter

  43. PRIDE BioMart – Selecting Output Fields

  45. PRIDE BioMart – Retrieving Results

  46. PRIDE BioMart – Output to Microsoft Excel

  47. The Ontology Lookup Service: Intelligent Query for PRIDE and Beyond…

  48. Ontologies – more than just a list of terms
      • A vocabulary of terms (names for concepts): use stable identifiers for each concept
      • Definitions: authoritative and unambiguous meaning for each concept and the context in which it should be used
      • Defined logical relationships between terms: more complexity than a simple hierarchy. Child terms can be related to more than one parent and parent terms can have multiple children. Relationships themselves carry a significance.
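
The point about multiple parents and typed relationships can be made concrete with a tiny graph. In this sketch each term maps to a list of (relationship, parent) pairs, and a traversal collects every ancestor regardless of relationship type. The identifiers, term names and relationships are invented for illustration and are not taken from any real ontology.

```python
# Hypothetical mini-ontology: term id -> list of (relationship, parent id).
RELATIONS = {
    "EX:0001": [],                                            # anatomical entity (root)
    "EX:0002": [("is_a", "EX:0001")],                         # organ
    "EX:0003": [("is_a", "EX:0001")],                         # digestive system
    "EX:0004": [("is_a", "EX:0002"), ("part_of", "EX:0003")], # liver: two parents
}


def ancestors(term, relations):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in relations.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("EX:0004", RELATIONS)))
```

A search for the hypothetical "digestive system" term could use this to also return data annotated with "liver", which a flat list of terms could not support.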

  49. What is OLS? (http://www.ebi.ac.uk/ontology-lookup/)
      • A unified, single point of query for over 54 ontologies (updated daily) and upwards of 530,000 terms
      • A tool that offers online and programmatic access to query ontologies about: term names, synonyms, relationships, annotations, cross-references
      • Reusable code components to integrate such functionality in other projects

  50. The Use of Controlled Vocabularies and Ontologies in PRIDE
      Controlled vocabularies / ontologies (OBO ontologies) are required to define the search space:
      • Species: Newt / NCBI Taxonomy ID
      • Tissue / organ / cell type: BRENDA Tissue ontology, Cell Type ontology
      • Sub-cellular component: GO
      • Disease: Human Disease Ontology (DOID)
      • Genotype: GO
      • Sample Processing: PSI Ontology
      • Mass Spectrometry: PSI-MS Ontology
      • Protein Modifications: PSI-MOD Ontology
      • Terms that fit nowhere else!? - PRIDE CV
