
mzTab


breena

Presentation Transcript


  1. mzTab: Proposal for a Simple Data Format for Proteomics Results

  2. Current Situation
     • The necessity of standard data formats has become generally accepted
     • Proteomics techniques are constantly evolving
     • Proposed standard formats had to become very complex to adequately capture proteomics data
       • mzIdentML for identification data
       • mzQuantML for quantitative data
     • Effective use of these data formats requires sophisticated bioinformatics knowledge
     • Many researchers are still used to using MS Excel to “look” at their data

  3. Communication of Proteomics Results
     • Proteomics resources need a simple, efficient mechanism to exchange basic proteomics results
     • Collaboration with colleagues from other scientific fields is increasingly important
     • Proteomics results need to be shared with researchers outside of proteomics
     • Proteomics data need to be made easily accessible

  4. Potential Current Problems
     • Currently proposed standard formats are difficult to use without the Java APIs
     • “Complete” standard formats are too complex and too big for quickly sharing the essential results
     • Quick scripts (e.g. in Perl) for specific research questions are not easily possible
       • A large amount of potential innovation could be lost
     • Reading files requires special software
     • Further processing of the data (e.g. with statistical tools) is not easily possible
     • No standard tools are available to read / write mz*ML files
     • Custom-built software is required for many use cases otherwise fulfilled by “Excel & friends”

  5. mzTab – Aim
     • To provide a simple and efficient way of exchanging proteomics data
       • Which protein / peptide was identified in a given experimental setting
     • Easy to update and maintain
     • Easy to use by the proteomics community, systems biologists as well as providers of knowledge bases

  6. mzTab – Target Audience
     • Proteomics repositories (e.g. PRIDE, PeptideAtlas)
     • Knowledge base resources (e.g. UniProt, HPRD)
     • Researchers outside of proteomics
     • Researchers analyzing proteomics data with limited bioinformatics knowledge / support

  7. mzTab – Proposed Concept
     • A tab-delimited file format
     • Goals
       • Content should be “readable” using MS Excel
       • Should contain the minimal information needed for proteomics repositories / knowledge bases to exchange data
       • Data should be easily accessible using e.g. scripting languages
       • One file should be able to contain multiple experiments / proteins from different resources
     • Aim: To represent the result of a query to e.g. PRIDE using this format
       • Provide a simplistic summary of proteomics results
       • Every entry contains a reference to the source data (in mzIdentML / mzQuantML format)

  8. mzTab – Proposed Concept
     • What the format does NOT aim to do:
       • Replace mzIdentML or mzQuantML
       • Contain the complete data of a proteomics experiment
       • Provide detailed evidence for the data
       • Allow a researcher to recreate the process which led to the results
       • Conform to reporting requirements (MIAPE, journal guidelines, etc.)
       • In short: be complete in any way

  9. mzTab – Possible Format Specification
     • Three sections
       • (Optional) Metadata section
       • (Required) Protein section
       • (Optional) Peptide section
     • Can report proteomics data at different levels
       • Single experiments
       • Multiple (possibly linked) experiments
       • Data generated as the result of a query (possibly to multiple resources)
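
     As an illustration of how little code the three-section layout needs, here is a minimal Python sketch that splits a file into its sections. It assumes the `----name` / `----END` markers shown on the example slides; the function names (`split_sections`, `require_protein_section`) and the shortened sample table are hypothetical and not part of the proposal.

```python
# Sketch only: assumes each section opens with "----<name>" and closes with
# "----END", as in the example slides. Function names are hypothetical.

def split_sections(text):
    """Return {section_name: [content lines]} for a draft mzTab document."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped == "----END":
            current = None  # close the open section
        elif stripped.startswith("----"):
            current = stripped[4:].strip().lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)  # keep tabs intact
    return sections


def require_protein_section(sections):
    """The protein section is the only one the slides mark as required."""
    if "proteins" not in sections:
        raise ValueError("missing required protein section")


# Shortened, made-up sample: only two of the protein columns are shown.
SAMPLE = (
    "----metadata\n"
    "PRIDE_16649-species: [NEWT, 10090, Mouse,]\n"
    "----END\n"
    "----proteins\n"
    "Accession\treliability\n"
    "P12345\t4\n"
    "----END\n"
)

sections = split_sections(SAMPLE)
require_protein_section(sections)
```

     Keeping the section content as raw lines leaves each table free to be handed to any tab-delimited reader afterwards.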

  10. mzTab – Metadata Section
      ----metadata
      PRIDE_16649-title: The Synaptic Proteome during Development and Plasticity of the Mouse Visual Cortex
      PRIDE_16649-species: [NEWT, 10090, Mouse,]
      PRIDE_16649-tissue: [EFO, EFO:0000916, visual cortex,]
      PRIDE_16649-instrument[1]-type: [MS, MS:1000287, TOF-MS,]
      PRIDE_16649-search_engine: [MS, MS:1001207, Mascot, ]
      PRIDE_16649-contact[1]-name: August B Smit
      PRIDE_16649-contact[1]-email: guus.smit@cncr.vu.nl
      PRIDE_16649-url: http://www.ebi.ac.uk/pride/q.do?accession=16649
      ----END
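
      The metadata lines above follow a `unit-field: value` pattern, which a short script can read. The Python sketch below is one possible reading of that example, not a definitive parser. It assumes the unit identifier (e.g. `PRIDE_16649`) ends at the first hyphen, and that bracketed values hold four comma-separated parts; the names `cv`, `accession`, `name` and `value` are guesses from the example, and `CvParam`, `parse_value` and `parse_metadata` are hypothetical names.

```python
from typing import NamedTuple


class CvParam(NamedTuple):
    """Bracketed value; the field names are assumptions based on the example."""
    cv: str
    accession: str
    name: str
    value: str


def parse_value(raw):
    """Return a CvParam for '[a, b, c, d]' values, else the stripped string."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        parts = [part.strip() for part in raw[1:-1].split(",")]
        if len(parts) == 4:
            return CvParam(*parts)
    return raw


def parse_metadata(lines):
    """Group 'unit-field: value' lines into {unit: {field: value}}."""
    metadata = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a key/value line
        unit, _, field = key.strip().partition("-")
        metadata.setdefault(unit, {})[field] = parse_value(value)
    return metadata


LINES = [
    "PRIDE_16649-species: [NEWT, 10090, Mouse,]",
    "PRIDE_16649-instrument[1]-type: [MS, MS:1000287, TOF-MS,]",
    "PRIDE_16649-contact[1]-name: August B Smit",
    "PRIDE_16649-url: http://www.ebi.ac.uk/pride/q.do?accession=16649",
]
meta = parse_metadata(LINES)
```

      Splitting at the first colon only keeps URLs and accessions such as `MS:1000287` intact in the value.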

  11. mzTab – Protein Section
      ----proteins
      Accession … reliability peptides … ambiguity_members
      P12345 4 2 P12346,P123457 …
      ----END
      • A table holding the basic identification information
      • Suggestions of how to include
        • quantitative data
        • multiple search engine scores
        • ambiguous modification positions

  12. mzTab – Peptide Table
      ----peptides
      sequence accession unit unique … reliability …
      DIIL O00160 PRIDE_3381 false 5 …
      VESVDL O00160 PRIDE_3381 true 4 …
      ----END
      • A table holding the basic peptide information
      • Suggestions of how to include
        • quantitative data
        • multiple search engine scores
        • ambiguous modification positions
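
      Because the tables are tab-delimited, a standard CSV reader is enough to query them. The Python sketch below reads a peptide table and picks out the peptides flagged as unique. The sample rows reuse the values from the slide but include only the five columns that are named there; the elided columns are left out, and `read_table` is a hypothetical helper name.

```python
import csv


def read_table(lines):
    """Parse tab-delimited lines (header first) into a list of dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


# Values taken from the slide; columns elided there ("…") are omitted here.
PEPTIDE_LINES = [
    "sequence\taccession\tunit\tunique\treliability",
    "DIIL\tO00160\tPRIDE_3381\tfalse\t5",
    "VESVDL\tO00160\tPRIDE_3381\ttrue\t4",
]

rows = read_table(PEPTIDE_LINES)

# All values arrive as strings, so the flag is compared as text.
unique_peptides = [row["sequence"] for row in rows if row["unique"] == "true"]
```

      The same helper would work for the protein table, since both sections share the header-plus-rows layout.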
